big-data topic

List big-data repositories

starrocks

8.7k
Stars
1.8k
Forks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.

carbondata

1.4k
Stars
702
Forks

High-performance data store solution

opendata.cern.ch

642
Stars
141
Forks

Source code for the CERN Open Data portal

drill

1.9k
Stars
979
Forks

Apache Drill is a distributed MPP query layer for self-describing data

OnlineStats.jl

822
Stars
61
Forks

⚡ Single-pass algorithms for statistics

awesome-data-catalogs

604
Stars
48
Forks

📙 Awesome Data Catalogs and Observability Platforms.

aut

133
Stars
33
Forks

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

SparkLearning

620
Stars
66
Forks

A comprehensive Spark guide, collated from multiple sources, that can be used to learn more about Spark or as an interview refresher.

Spark-with-Python

324
Stars
259
Forks

Fundamentals of Spark with Python (using PySpark), with code examples

kafka-streams

825
Stars
111
Forks

Equivalent to kafka-streams 🐙 for Node.js ✨🐢🚀✨