Apache Spark topic

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

List Apache Spark repositories

data-science-ipython-notebooks

26.6k
Stars
7.7k
Forks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...

dev-setup

6.1k
Stars
1.1k
Forks

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduc...

pyspark-cheatsheet

355
Stars
120
Forks

🐍 Quick reference guide to common patterns & functions in PySpark.

prosto

90
Stars
4
Forks

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Movies-Analytics-in-Spark-and-Scala

90
Stars
52
Forks

Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.

big_data_architect_skills

458
Stars
170
Forks

Skills a big data architect should master

airbnb-spark-thrift

43
Stars
16
Forks

A library for loading Thrift data into Spark SQL

ammonite-spark

115
Stars
16
Forks

Run Spark calculations from Ammonite

DigitRecognizer

29
Stars
24
Forks

Java Convolutional Neural Network example for Handwritten Digit Recognition