data-algorithms-with-spark
                                
                                
                                
                        O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Data Algorithms with Spark by Mahmoud Parsian
"... This book will be a great resource for both readers looking to implement existing algorithms in a scalable fashion and readers who are developing new, custom algorithms using Spark. ..."
Dr. Matei Zaharia, Original Creator of Apache Spark
Foreword by Dr. Matei Zaharia (Original Creator of Apache Spark)
Author: Mahmoud Parsian
Goal of this book: to enable readers to write efficient, simple PySpark code for data algorithms
- This new O'Reilly book is the successor to Data Algorithms (also published by O'Reilly)
- This book uses PySpark (much simpler and more readable)
- @OReillyMedia: Data Algorithms with Spark, by @mahmoudparsian
Author Contact: [Email] [Mahmoud Parsian @LinkedIn] [Mahmoud Parsian @GitHub]
GitHub Chapter Solutions
- This GitHub repository will host all source code and scripts for Data Algorithms with Spark
- Chapter solutions are provided in PySpark and Scala
  - PySpark solutions are provided by Mahmoud Parsian
  - Scala solutions are provided by Deepak Kumar and Biman Mandal
 
 
Software:
| Spark | Python | Scala | Java | 
|---|---|---|---|
| Apache Spark 3.2.0 | Python 3.7.2 | Scala 2.13 | Java 8 | 
Table of Contents
| Chapter | Title | 
|---|---|
| Bonus Chapters | |
| Chapter 1 | Introduction to Data Algorithms | 
| Chapter 2 | Transformations in Action | 
| Chapter 3 | Mapper Transformations | 
| Chapter 4 | Reductions in Spark | 
| Chapter 5 | Partitioning Data | 
| Chapter 6 | Graph Algorithms | 
| Chapter 7 | Interacting with External Data Sources | 
| Chapter 8 | Ranking Algorithms | 
| Chapter 9 | Fundamental Data Design Patterns | 
| Chapter 10 | Common Data Design Patterns | 
| Chapter 11 | Join Design Patterns | 
| Chapter 12 | Feature Engineering in PySpark |