Dask Workshop
These materials provide a brief hands-on introduction to Dask, a parallel computing system. They are intended to be delivered in a 90-minute session and cover the following topics:
- Parallelize existing code with dask.delayed
- Set up the dask.distributed system on your local laptop
- Use dask.dataframe on time series data
These topics are far from comprehensive, but were chosen to give a flavor of what can be done with Dask.
These materials are presented as Jupyter notebooks, which should be available within this directory.
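As a taste of the first topic, here is a minimal sketch of parallelizing ordinary Python functions with dask.delayed. The functions `inc` and `add` are hypothetical stand-ins and are not taken from the workshop notebooks; the notebooks themselves may approach this differently.

```python
from dask import delayed


def inc(x):
    # Hypothetical stand-in for a slow function
    return x + 1


def add(x, y):
    # Hypothetical stand-in for a function that combines results
    return x + y


# Wrapping a function with delayed records the call in a task graph
# instead of running it immediately.
a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)

# Nothing has run yet; compute() executes the graph. The two inc calls
# do not depend on each other, so the scheduler is free to run them in parallel.
result = total.compute()
print(result)
```

With functions this small, the scheduling overhead outweighs any gain. The pattern is meant for calls that each take a noticeable amount of time.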
To get started, download this repository:
git clone https://github.com/mrocklin/dask-workshop
Create a conda environment with the following commands:
conda create -n dask-workshop -c conda-forge python=3 dask distributed jupyter bokeh feather-format python-graphviz matplotlib tornado=4.4
source activate dask-workshop
pip install pandas_datareader
Then start a Jupyter notebook server and begin with the first notebook:
jupyter notebook
Note: feather-format is not available in Python 2 on Windows.
Note: tornado 4.5 and bokeh 0.12.5 have known compatibility issues.
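If you are unsure which versions ended up in your environment, a short check like the one below may help. It is only a sketch: the helper name `installed_versions` is hypothetical, and it assumes Python 3.8 or later so that `importlib.metadata` is available in the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    # Map each package name to its installed version string,
    # or to None when the package is not installed.
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["tornado", "bokeh", "dask", "distributed"])
for name, ver in versions.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Run it inside the activated dask-workshop environment and compare the reported tornado and bokeh versions against the note above.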
After Finishing
This tutorial covered dask.dataframe and dask.delayed for simple tabular computations. This is a common and important case, but it is only one of many applications for which Dask is used. If you are interested in arrays, machine learning, asynchronous computations, and so on, you may wish to read the documentation further:
- Main Documentation
- Examples
- Distributed scheduler (also most of the asynchronous docs)
If you want to try Dask on a cluster on Amazon or Google hardware then you might try one of the following projects: