big-data topic

List big-data repositories

SynapseML (5.0k stars, 819 forks)

Simple and Distributed Machine Learning

spark (38.6k stars, 28.0k forks)

Apache Spark - A unified analytics engine for large-scale data processing

koalas (3.3k stars, 354 forks)

Koalas: pandas API on Apache Spark

awesome-scalability (64.1k stars, 6.5k forks)

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

vespa (5.4k stars, 577 forks)

AI + Data, online. https://vespa.ai

beam (7.6k stars, 4.2k forks)

Apache Beam is a unified programming model for Batch and Streaming data processing.

ClickHouse (36.1k stars, 6.7k forks)

ClickHouse® is a real-time analytics DBMS

trino (9.7k stars, 2.8k forks)

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

pachyderm (6.1k stars, 567 forks)

Data-Centric Pipelines and Data Versioning

catboost (7.8k stars, 1.2k forks)

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computa...