Data-Infra-Projects
List of some interesting projects
This is an attempt to list out all the interesting projects.
It is intended for anyone who is designing modern large-scale architectures and needs to choose tools/technologies/frameworks. The purpose is to help in making those choices with resources like comparisons, use cases, features, maturity, or really anything that helps in making an informed decision.
Abstractions
Distributed Coordination
These are implementations/libraries that help in writing distributed applications that require some form of coordination.
Infrastructure Management
Comparisons
File Systems
Distributed Databases
Infrastructure Logging/Monitoring
Infrastructure Helpers
MultiCloud/CrossCloud utilities
Virtualization
Virtualization++
Generalized Data Processing
Comparisons
- Tez vs Dryad
- Hadoop vs Spark - Too many differences, no good link.
Large-scale Distributed ML
pub-sub / messaging
Data Ingest
Data change management
Graph Storing and/or Processing
SQL Engines
Stream Processing
Security
Performance Analysis
Workflow engines/DAG-executors/Pipelines
Comparisons
Configuration Management
Service Discovery
Comparison
Testing
Visualization
- White Elephant
- Ambrose
- Lipstick
- Hue - Hadoop Web UI
- Inviso
- Timberlake
Libraries
- Zoie
- Norbert - Cluster manager and networking layer built on top of ZooKeeper
- Okapi - Large-scale ML & graph analytics on Giraph
- Scalding - A Scala API for Cascading
- SummingBird - Streaming MapReduce with Scalding and Storm
- Curator - Set of Java libraries that make using Apache ZooKeeper much easier
- Turbine - Low latency high throughput aggregator for real time streams
- DataFu - Collection of libraries for MapReduce
- Twill (previously known as Weave) - Library for writing YARN applications