data-pipelines

Here are 259 public repositories matching this topic...

apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Updated May 10, 2025
Python

pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

python rust streaming real-time kafka etl machine-learning-algorithms stream-processing data-analytics dataflow data-processing data-pipelines batch-processing pathway iot-analytics etl-framework time-series-analysis

Updated May 10, 2025
Python

apache / dolphinscheduler

Apache DolphinScheduler is a modern data orchestration platform for agile, low-code creation of high-performance workflows.

workflow airflow job-scheduler orchestration cloud-native task-scheduler data-pipelines azkaban workflow-orchestration workflow-schedule powerful-data-pipelines

Updated May 7, 2025
Java

dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.

python metadata workflow data-science etl analytics scheduler orchestration data-engineering data-integration data-pipelines workflow-automation mlops dagster data-orchestrator

Updated May 10, 2025
Python

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Updated May 8, 2025
HTML

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated May 9, 2025
Python

infinyon / fluvio

🦀 Event stream processing for developers to collect and transform data in motion to power responsive, data-intensive applications.

rust distributed-systems streaming real-time serverless webassembly data-flow stream-processing data-analytics data-integration cloud-native data-pipelines stateful streaming-data stream-processing-engine event-driven-architecture streaming-analytics streaming-data-processing streaming-data-pipelines

Updated May 10, 2025
Rust

orchest / orchest

Build data pipelines, the easy way 🛠️

python docker kubernetes data-science machine-learning airflow cloud deployment jupyter etl ide pipelines self-hosted jupyterlab notebooks data-pipelines dag etl-pipeline orchest

Updated Jun 6, 2023
TypeScript

Preswald is a WASM packager for Python-based interactive data apps: bundle complete, complex data workflows, particularly visualizations, into single files that run entirely in-browser, using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.

python open-source data ai analytics vscode data-visualization gpt copilot data-pipelines schema-management data-infrastructure analytics-engineering data-sdk llm data-applications

Updated May 9, 2025
Python

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated May 8, 2025
HTML

meltano / meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.