Apache Spark topic
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
dev-setup
macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduc...
pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
prosto
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations over functions, as an alternative to map-reduce and join-groupby.
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
big_data_architect_skills
Skills a big data architect should master.
airbnb-spark-thrift
A library for loading Thrift data into Spark SQL.
ammonite-spark
Run Spark calculations from Ammonite.
DigitRecognizer
A Java convolutional neural network example for handwritten digit recognition.