
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

"... This book will be a great resource for
both readers looking to implement existing
algorithms in a scalable fashion and readers
who are developing new, custom algorithms
using Spark. ..."

Dr. Matei Zaharia
Original Creator of Apache Spark

FOREWORD by Dr. Matei Zaharia

Author: Mahmoud Parsian

Goal of this book: to enable readers to write efficient and simple PySpark code for data algorithms using Spark



GitHub Chapter Solutions


Software:

Spark                Python         Scala        Java
Apache Spark 3.2.0   Python 3.7.2   Scala 2.13   Java 8

Table of Contents

Chapter          Title
Bonus Chapters   • Tutorials: RDDs and DataFrames
                 • UDF, Partitioning, TF-IDF, Correlation, K-mers, anagrams, ...
Chapter 1        Introduction to Data Algorithms
Chapter 2        Transformations in Action
Chapter 3        Mapper Transformations
Chapter 4        Reductions in Spark
Chapter 5        Partitioning Data
Chapter 6        Graph Algorithms
Chapter 7        Interacting with External Data Sources
Chapter 8        Ranking Algorithms
Chapter 9        Fundamental Data Design Patterns
Chapter 10       Common Data Design Patterns
Chapter 11       Join Design Patterns
Chapter 12       Feature Engineering in PySpark
