
The script WordCount.py in the rdd directory has a couple of issues

Open tappoz opened this issue 7 years ago • 0 comments

I am running these scripts on Debian Stretch.

Path issues

I had to import os, then add this:

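    import os

    # Resolve the project root relative to this script, not the current directory
    projRootPath = os.path.join(os.path.dirname(os.path.realpath(__file__)), "..")
    lines = sc.textFile(os.path.join(projRootPath, "in", "word_count.text"))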

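Otherwise I cannot run the script from an arbitrary working directory: as far as I can tell, the relative input path only resolves when the script is launched from the project root.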
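
The same idea can be wrapped in a small helper so that every script builds its input paths the same way. This is only a sketch: `resolve_input` is a hypothetical name that is not part of the repository, and it assumes the script sits exactly one directory below the project root (as `rdd/WordCount.py` does).

```python
import os


def resolve_input(script_file, *parts):
    """Build a path relative to the project root.

    Assumption: the script lives one directory below the project root,
    e.g. <root>/rdd/WordCount.py. `resolve_input` is a hypothetical helper.
    """
    # Directory that contains the script, with symlinks resolved
    script_dir = os.path.dirname(os.path.realpath(script_file))
    # Go up one level and collapse the ".." component
    project_root = os.path.normpath(os.path.join(script_dir, os.pardir))
    return os.path.join(project_root, *parts)


# Usage inside a script (sc is the existing SparkContext):
#   lines = sc.textFile(resolve_input(__file__, "in", "word_count.text"))
```

Normalising the path removes the `..` component, which keeps the path readable if it shows up in Spark logs or error messages.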

Python version

The script does not work with Python 2. To run it with Python 3 I had to set this environment variable:

    export PYSPARK_PYTHON=python3

This information could be added to a README file.
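
Besides documenting it, the script could fail early with a readable message when it is started under Python 2. The guard below is a sketch of that idea, not something the repository contains: `require_python3` is a hypothetical name, and the hint in the error message simply repeats the environment variable mentioned above.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Raise a readable error when not running under Python 3.

    Hypothetical helper; version_info is a parameter only so the
    check can be exercised without switching interpreters.
    """
    if version_info[0] < 3:
        raise RuntimeError(
            "This script needs Python 3; try: export PYSPARK_PYTHON=python3"
        )
    return True


# Usage: call require_python3() at the top of the script,
# before the SparkContext is created.
```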

I can provide a PR if that would help.

tappoz, Jun 09 '18 10:06