Apache Spark topic

Apache Spark is an open-source, general-purpose framework for distributed cluster computing. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

List Apache Spark repositories

data-science-ipython-notebooks

26.6k stars, 7.7k forks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...

donnemartin

aws

big-data

caffe

data-science

dev-setup

6.1k stars, 1.1k forks

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduc...

donnemartin

android-development

aws

bash

cli

pyspark-cheatsheet

355 stars, 120 forks

🐍 Quick reference guide to common patterns & functions in PySpark.

kevinschaich

cheat

cheatsheet

cheatsheets

data

prosto

Prosto is a data processing toolkit that relies heavily on functions and operations on functions, offering an alternative to map-reduce and join-groupby.

asavinov

business-intelligence

data-preparation

data-preprocessing

data-processing