intro-to-colab-pyspark-emr
A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs,...
Colab and PySpark
Everything PySpark.
Once you complete this notebook, you should be able to write PySpark programs efficiently. The best way to use it is to go through the examples and then try them yourself on Colab. At the end there are a few hands-on questions that you can use to evaluate yourself. The objectives of the notebook are to:
- Give a proper understanding of the different PySpark functions available.
- Give a short introduction to Google Colab, the platform on which this notebook is written.
I have also made an HTML version of this notebook, which you can access here.