big-data topic
SynapseML
Simple and Distributed Machine Learning
spark
Apache Spark - A unified analytics engine for large-scale data processing
koalas
Koalas: pandas API on Apache Spark
awesome-scalability
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
ClickHouse
ClickHouse® is a real-time analytics DBMS
trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
pachyderm
Data-Centric Pipelines and Data Versioning
catboost
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computa...