big-data topic

List big-data repositories

metorikku

576 stars, 151 forks

A simplified, lightweight ETL Framework based on Apache Spark

poseidon

2.0k stars, 428 forks

A search engine which can hold 100 trillion lines of log data.

geni

278 stars, 28 forks

A Clojure dataframe library that runs on Spark

DataflowJavaSDK

855 stars, 326 forks

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

succinct

278 stars, 76 forks

Enabling queries on compressed data.

calcite

4.4k stars, 2.3k forks

Apache Calcite

keyvi

236 stars, 42 forks

Keyvi - the key value index. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

keyvi

178 stars, 38 forks

Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

fili

172 stars, 96 forks

Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL Databases.

spark-movie-lens

814 stars, 400 forks

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset