big-data topic
starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
carbondata
High-performance data store solution
opendata.cern.ch
Source code for the CERN Open Data portal
drill
Apache Drill is a distributed MPP query layer for self-describing data.
OnlineStats.jl
⚡ Single-pass algorithms for statistics
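To illustrate what "single-pass" means here, the following is a minimal sketch of one classic single-pass algorithm, Welford's method for a running mean and variance. It is written in Python and does not use or mirror the OnlineStats.jl API; the class name `RunningStats` is hypothetical. It assumes numeric values arrive one at a time and that each value may be seen only once, so the statistics are updated in constant memory without storing the data.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical running-statistics accumulator (Welford's method)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Incorporate one new observation in O(1) time and memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from both the old and the updated mean,
        # which is numerically more stable than accumulating sum(x**2).
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        # Sample variance (n - 1 denominator); undefined for fewer than 2 values.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

    def std(self) -> float:
        return math.sqrt(self.variance())


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.n, stats.mean, stats.variance())
```

Each value is touched exactly once, which is the property that makes such algorithms suitable for streaming or out-of-core data.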
awesome-data-catalogs
📙 Awesome Data Catalogs and Observability Platforms.
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
SparkLearning
A comprehensive Spark guide collated from multiple sources that can be referred to learn more about Spark or as an interview refresher.
Spark-with-Python
Fundamentals of Spark with Python (using PySpark), with code examples
kafka-streams
Equivalent of kafka-streams 🐙 for Node.js ✨🐢🚀✨