Designed and implemented a SQLite-based data pipeline that automates ingestion of airline delay data, tracks ingestion state, and produces analytics-ready SQL views for downstream visualization tools such as Tableau and other BI platforms. The project repository is available here.
This project demonstrates end-to-end database design and automation using SQL and Python. The pipeline ingests raw airline delay files into a structured SQLite database, tracks file-level ingestion state, and generates reusable SQL views optimized for analysis and visualization.
The primary goal of this project is to showcase practical SQL skills, including database schema design, automated data ingestion, and transformation of raw data into visualization-ready tables that can be consumed directly by business intelligence tools.
A relational SQLite database was designed with separate tables for raw airline delay records and file ingestion tracking. The ingestion pipeline automatically detects new data files placed in a designated data/raw directory and appends their contents to the main airlines table.
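A minimal sketch of such a two-table design using Python's built-in sqlite3 module (the table and column names here are illustrative, not necessarily the project's actual ones):

```python
import sqlite3

# Hypothetical schema sketch -- the project's real table and column
# names may differ.
conn = sqlite3.connect("airline_delays.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS airlines (
    carrier      TEXT,     -- airline code
    flight_date  TEXT,     -- ISO date of the flight
    delay_min    INTEGER,  -- arrival delay in minutes
    source_file  TEXT      -- raw file the row came from
);

CREATE TABLE IF NOT EXISTS ingestion_log (
    file_name   TEXT PRIMARY KEY,  -- one row per ingested file
    ingested_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")
conn.commit()
```

Keeping the ingestion log in the same database as the data means a single file captures both the records and the pipeline's state.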
An ingestion log table ensures idempotency by preventing duplicate inserts, while SQL views are used to pre-aggregate and reshape the data into analysis-friendly formats suitable for visualization tools such as Tableau.
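The idempotency check can be as simple as consulting the log before loading each file. A hedged sketch, assuming CSV inputs and hypothetical table and column names:

```python
import csv
import sqlite3
from pathlib import Path

def ingest_new_files(conn: sqlite3.Connection, raw_dir: Path) -> list:
    """Append rows from files not yet recorded in ingestion_log.

    Column names (carrier, flight_date, delay_min) are assumptions
    for illustration, not the project's actual schema.
    """
    ingested = []
    for path in sorted(raw_dir.glob("*.csv")):
        # Skip files already logged -- this check makes reruns idempotent.
        seen = conn.execute(
            "SELECT 1 FROM ingestion_log WHERE file_name = ?", (path.name,)
        ).fetchone()
        if seen:
            continue
        with path.open(newline="") as f:
            rows = [(r["carrier"], r["flight_date"], int(r["delay_min"]), path.name)
                    for r in csv.DictReader(f)]
        conn.executemany("INSERT INTO airlines VALUES (?, ?, ?, ?)", rows)
        conn.execute("INSERT INTO ingestion_log (file_name) VALUES (?)", (path.name,))
        conn.commit()  # commit per file so a crash never half-ingests
        ingested.append(path.name)
    return ingested
```

Committing once per file keeps the log and the data in step: a file is either fully loaded and logged, or untouched.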
The pipeline enables reliable, repeatable ingestion of airline delay data and produces consistent, visualization-ready datasets without requiring manual data cleaning or transformation.
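A view of this kind can be a simple pre-aggregation that a BI tool reads directly instead of the raw rows. A self-contained sketch with illustrative names and sample data (the project's actual views may differ):

```python
import sqlite3

# Toy in-memory database standing in for the real airlines table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airlines (carrier TEXT, flight_date TEXT, delay_min INTEGER)")
conn.executemany("INSERT INTO airlines VALUES (?, ?, ?)",
                 [("AA", "2024-01-01", 30),
                  ("AA", "2024-01-02", 10),
                  ("DL", "2024-01-01", 5)])

# Hypothetical pre-aggregated view: one row per carrier, ready for Tableau.
conn.execute("""
CREATE VIEW v_delay_by_carrier AS
SELECT carrier,
       COUNT(*)                 AS flights,
       ROUND(AVG(delay_min), 1) AS avg_delay_min
FROM airlines
GROUP BY carrier
""")

print(conn.execute("SELECT * FROM v_delay_by_carrier ORDER BY carrier").fetchall())
# -> [('AA', 2, 20.0), ('DL', 1, 5.0)]
```

Because the aggregation lives in the view, the visualization layer never needs to repeat it.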
This project illustrates how thoughtful database design and SQL-based transformations can simplify downstream analytics workflows. By pushing data cleaning and aggregation into the database layer, visualization tools can operate on stable, well-defined tables and views. The resulting data product can then be consumed by any visualization software, such as Tableau or Power BI. A live example is published on Tableau Public here.
The use of an ingestion tracking table reflects real-world data engineering practices, ensuring that the pipeline can be safely rerun as new data becomes available.
This architecture is applicable to any recurring data ingestion problem, including operational metrics, financial reporting, and ETL pipelines where new data files arrive incrementally and must be integrated into an existing database for analysis.
Future enhancements could include migrating to a production-grade database (e.g., PostgreSQL), adding data validation checks, supporting incremental updates, and integrating automated reporting or dashboard refresh workflows.
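One of these enhancements, data validation, could be bolted onto the ingestion step as a pre-insert check. A minimal sketch with assumed rules and column names (none of which are taken from the project itself):

```python
def validate_row(row: dict) -> list:
    """Return a list of problems for one raw record.

    The rules and column names here are hypothetical examples of the
    kind of checks the pipeline could add.
    """
    problems = []
    if not row.get("carrier"):
        problems.append("missing carrier code")
    try:
        delay = int(row.get("delay_min", ""))
        # Reject delays outside a plausible range (early by >1h or late by >24h).
        if delay < -60 or delay > 24 * 60:
            problems.append(f"implausible delay: {delay} min")
    except ValueError:
        problems.append("delay_min is not an integer")
    return problems
```

Rows that fail validation could be diverted to a quarantine table rather than dropped, preserving an audit trail.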
The project can be extended to support additional data sources, more complex SQL transformations, and tighter integration with visualization platforms. The modular design allows the ingestion and transformation logic to be reused for similar database-driven analytics projects.