python-spark-tutorial
### Issue: The `ndash` characters in `word_count.txt` cause an error when following the "Run your first Spark Job" tutorial. There are only two occurrences of this character here: "`from 1913–74.`" and...
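For anyone hitting this before it is fixed, here is a minimal workaround sketch. It assumes the error is a Python 2 Unicode error when the words are split or printed, that `sc` is the tutorial's SparkContext, and that the input file lives at `in/word_count.text` (the exact path and failure mode may differ):

```python
# Workaround sketch: normalize the en dash (U+2013) before splitting, and
# encode explicitly when printing so non-ASCII words do not raise under Python 2.
lines = sc.textFile("in/word_count.text")
words = lines.flatMap(lambda line: line.replace(u"\u2013", u"-").split(" "))
wordCounts = words.countByValue()
for word, count in wordCounts.items():
    print("{} : {}".format(word.encode("utf-8"), count))
```

Alternatively, editing the input file to replace the two en dashes by hand sidesteps the problem entirely.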
Imports from `commons.Utils` in the first two exercises won't work for me without this. The error was: `ValueError: Attempted relative import in non-package`
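For reference, a minimal sketch of one way around that error, assuming the scripts are run directly from their subdirectory and that `commons/Utils.py` sits at the project root (the exact import line is an assumption based on the exercises):

```python
import os
import sys

# Put the project root (one level above this script) on sys.path so that an
# absolute import like `from commons.Utils import Utils` resolves when the
# script is executed directly instead of as part of a package.
sys.path.insert(0, os.path.join(os.path.dirname(os.path.realpath(__file__)), ".."))

from commons.Utils import Utils
```

Running the script as a module with `python -m` from the project root is another common way to avoid relative-import errors.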
An Ansible script could be provided to provision a Vagrant VM (virtual machine) defined in a `Vagrantfile`, so a dedicated VM could be bootstrapped for local development. I can submit a PR if needed.
These tutorials could be improved with some tips/examples/workarounds on how to integrate Python code from data scientists who are used to libraries like pandas and NumPy. There is no example...
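As an illustration of the kind of example that could be added, here is a minimal sketch of round-tripping data between pandas and Spark, assuming PySpark with a SparkSession is available; the app name, DataFrame contents, and column names are placeholders:

```python
import pandas as pd
from pyspark.sql import SparkSession

# Illustrative SparkSession; the app name is a placeholder.
spark = SparkSession.builder.appName("pandasInterop").getOrCreate()

# A small pandas DataFrame with made-up data.
pdf = pd.DataFrame({"word": ["spark", "python"], "count": [3, 5]})

sdf = spark.createDataFrame(pdf)  # pandas DataFrame -> Spark DataFrame
sdf.show()

back = sdf.toPandas()             # Spark DataFrame -> pandas (collects to the driver)
print(back.describe())
```

Note that `toPandas()` collects the whole DataFrame onto the driver, so it is only appropriate for results small enough to fit in driver memory.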
I am running these scripts on Debian Stretch.

# Path issues

I had to `import os`, then add this:

```python
projRootPath = os.path.dirname(os.path.realpath(__file__)) + "/../"  # project root, one level above this script
lines = sc.textFile(projRootPath + "in/word_count.text")             # resolve the input file relative to the project root
```
...