python-spark-tutorial
### Issue: The `ndash` characters in `word_count.txt` cause an error when following the "Run your first Spark Job" tutorial. There are only two occurrences of this character here: "`from 1913–74.`" and...
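For anyone hitting this before it is fixed, here is a minimal workaround sketch. It assumes the error is a Python 2 Unicode error when the words are split or printed, that `sc` is the tutorial's SparkContext, and that the input file lives at `in/word_count.text` (the exact path and failure mode may differ):

```python
# Workaround sketch: normalize the en dash (U+2013) before splitting, and
# encode explicitly when printing so non-ASCII words do not raise under Python 2.
lines = sc.textFile("in/word_count.text")
words = lines.flatMap(lambda line: line.replace(u"\u2013", u"-").split(" "))
wordCounts = words.countByValue()
for word, count in wordCounts.items():
    print("{} : {}".format(word.encode("utf-8"), count))
```

Alternatively, editing the input file to replace the two en dashes by hand sidesteps the problem entirely.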
Imports from `commons.Utils` in the first two exercises won't work for me without this. The error was: `ValueError: Attempted relative import in non-package`
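For reference, a minimal sketch of one way around that error, assuming the scripts are run directly from their subdirectory and that `commons/Utils.py` sits at the project root (the exact import line is an assumption based on the exercises):

```python
import os
import sys

# Put the project root (one level above this script) on sys.path so that an
# absolute import like `from commons.Utils import Utils` resolves when the
# script is executed directly instead of as part of a package.
sys.path.insert(0, os.path.join(os.path.dirname(os.path.realpath(__file__)), ".."))

from commons.Utils import Utils
```

Running the script as a module with `python -m` from the project root is another common way to avoid relative-import errors.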
An Ansible script could be provided to provision a Vagrant VM (virtual machine) defined in a `Vagrantfile`, so a dedicated VM could be bootstrapped for local development. I can submit a PR if needed.
These tutorials could be improved with some tips/examples/workarounds on how to integrate Python code from data scientists who are used to libraries like pandas and NumPy. There is no example...
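As an illustration of the kind of example that could be added, here is a minimal sketch of round-tripping data between pandas and Spark, assuming PySpark with a SparkSession is available; the app name, DataFrame contents, and column names are placeholders:

```python
import pandas as pd
from pyspark.sql import SparkSession

# Illustrative SparkSession; the app name is a placeholder.
spark = SparkSession.builder.appName("pandasInterop").getOrCreate()

# A small pandas DataFrame with made-up data.
pdf = pd.DataFrame({"word": ["spark", "python"], "count": [3, 5]})

sdf = spark.createDataFrame(pdf)  # pandas DataFrame -> Spark DataFrame
sdf.show()

back = sdf.toPandas()             # Spark DataFrame -> pandas (collects to the driver)
print(back.describe())
```

Note that `toPandas()` collects the whole DataFrame onto the driver, so it is only appropriate for results small enough to fit in driver memory.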
I am running these scripts on Debian Stretch.

# Path issues

I had to `import os`, then add this:

```python
projRootPath = os.path.dirname(os.path.realpath(__file__)) + "/../"  # project root, one level above this script
lines = sc.textFile(projRootPath + "in/word_count.text")             # resolve the input file relative to the project root
```
...