big-data topic

Repositories tagged big-data

SynapseML

5.0k stars, 819 forks

Simple and Distributed Machine Learning

spark

38.6k stars, 28.0k forks

Apache Spark - A unified analytics engine for large-scale data processing

koalas

3.3k stars, 354 forks

Koalas: pandas API on Apache Spark

awesome-scalability

64.1k stars, 6.5k forks

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

vespa

5.4k stars, 577 forks

AI + Data, online. https://vespa.ai

beam

7.6k stars, 4.2k forks

Apache Beam is a unified programming model for Batch and Streaming data processing.

ClickHouse

36.1k stars, 6.7k forks

ClickHouse® is a real-time analytics DBMS

trino

9.7k stars, 2.8k forks

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

pachyderm

6.1k stars, 567 forks

Data-Centric Pipelines and Data Versioning

catboost

7.8k stars, 1.2k forks

A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computa...