awesome-data-engineering feat: transform awesome-data-engineering into definitive 2024-2025 resource

Major improvements:

README Transformation:

Reorganized by data lifecycle (ingestion → storage → transformation → orchestration → processing → quality → governance → activation → visualization)
Fixed all broken markdown syntax (removed spaces in link formatting)
Added modern data stack tools (2020-2025):
- Data Ingestion: Airbyte, Meltano, dlt, Redpanda
- Data Transformation: dbt, SQLMesh, Polars
- Orchestration: Dagster, Prefect, Kestra, Mage
- Data Lakes: Apache Iceberg, Delta Lake, Apache Hudi, XTable
- Lakehouse: Unity Catalog, Apache Polaris, Nessie
- Data Quality: Great Expectations, Soda, elementary-data
- Data Observability: Monte Carlo, OpenMetadata
- Data Catalogs: DataHub, OpenMetadata, Amundsen
- Reverse ETL: Census, Hightouch, Grouparoo
- Semantic Layer: Cube, dbt Semantic Layer
- Embedded Analytics: DuckDB, MotherDuck
Added new critical categories:
- Data Quality & Observability
- Data Discovery & Governance
- Reverse ETL
- Cloud Data Warehouses (separated from general storage)
- Data Lakes & Lakehouses (with table formats)
- Semantic Layer / Metrics Layer
Enhanced all descriptions to be action-oriented and clear
Improved visual hierarchy with proper heading structure
Updated cloud data warehouses section (Snowflake, BigQuery, Databricks SQL, etc.)
Added modern serialization formats (Arrow, MessagePack, FlatBuffers)
Expanded time-series databases (TimescaleDB, QuestDB, VictoriaMetrics)
Updated streaming section with modern tools (RisingWave, ksqlDB, Materialize)
Added dashboarding frameworks (Streamlit, Dash, Gradio, Panel)
Refreshed infrastructure section with modern IaC and monitoring tools
Added table of contents with proper anchor links
Removed outdated or deprecated tools
Added "Last updated" timestamp
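
For reviewers who want to check the link-formatting fix above, here is a minimal sketch of how such a cleanup could be scripted. It assumes the breakage was whitespace between the closing bracket and the opening parenthesis (`[text] (url)`), which the description does not spell out; the `fix_link_spacing` helper and the regex are illustrative, not the script actually used in this PR.

```python
import re

# Assumption: a "broken" link has spaces or tabs between "]" and "(".
# Only http(s) targets are matched, so prose such as "[note] (see above)"
# is left untouched.
BROKEN_LINK = re.compile(r"\[([^\]]+)\][ \t]+\((https?://[^\s)]+)\)")


def fix_link_spacing(markdown: str) -> str:
    """Collapse '[text] (url)' into '[text](url)' (hypothetical helper)."""
    return BROKEN_LINK.sub(r"[\1](\2)", markdown)


if __name__ == "__main__":
    sample = "- [Airbyte] (https://airbyte.com) - Data integration platform."
    print(fix_link_spacing(sample))
```

Restricting the match to URLs trades completeness for safety: relative links with the same defect would need a separate pass.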

Contributing Guidelines Enhancement:

Established a clear philosophy of curation over comprehensiveness
Defined quality standards for tool inclusion
Added format requirements with good/bad examples
Created detailed submission guidelines
Specified what to include vs. what to exclude
Outlined PR process and quality review criteria
Added guidance on updating existing entries

Impact: This transforms the list from a dated collection into the definitive, well-curated resource for data engineers in 2024-2025. Every tool is production-ready, actively maintained, and represents current best practices.

Nov 16 '25 07:11 duyet

Hi @duyet, thank you for the contribution. I like where you want to take this repo, but it does feel like a large jump. There are also conflicts that need to be resolved before I can merge. If you could explain more about your intentions and desired end result, I think we can get to a point where this makes the repo better. If this MR was auto-generated and you don't have a stake in it, then I will likely take some of these ideas and implement them manually.

Nov 30 '25 13:11 vordimous

@igorbarinov Do you have any opinions here?

Nov 30 '25 21:11 vordimous