telemetry-analysis-service
Add an option to use Python 3.x on clusters and scheduled jobs
Currently the default Python on the clusters is Python 2.7. I believe a long-term goal should be to move to Python 3.x, but we may have to wait for Amazon. I'm opening this issue for discussion and to track progress against.
The latest Amazon EMR base Linux AMI (2017.03) installs Python 2.7 and 3.4. The Amazon EMR docs state:
Python Defaults
Python 3.4 is now installed by default, but Python 2.7 remains the system default. You may configure Python 3.4 as the system default using either a bootstrap action; you can use the configuration API to set PYSPARK_PYTHON export to /usr/bin/python3.4 in the spark-env classification to affect the Python version used by PySpark.
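For reference, here is a rough sketch of what an opt-in could look like on our side. It builds the `spark-env` / `export` configuration described above and only emits it when a cluster asks for Python 3, so everything else keeps the 2.7 default. The helper name `cluster_configurations` and the `use_python3` flag are hypothetical. The interpreter path comes from the docs quoted above and assumes the 2017.03 AMI layout.

```python
import json

# Path taken from the EMR docs quoted above; assumes the 2017.03 AMI layout.
PYTHON3_PATH = "/usr/bin/python3.4"


def cluster_configurations(use_python3=False, python_path=PYTHON3_PATH):
    """Return an EMR `Configurations` list for a new cluster.

    Hypothetical helper: with use_python3=False no override is emitted,
    so PySpark keeps the AMI's system default (Python 2.7).
    """
    if not use_python3:
        return []
    return [
        {
            "Classification": "spark-env",
            # The nested "export" classification is how EMR sets
            # environment variables (export KEY=VALUE) in spark-env.sh.
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Print the JSON that would be supplied when launching a Python 3 cluster.
    print(json.dumps(cluster_configurations(use_python3=True), indent=2))
```

The resulting list could be passed as the `Configurations` argument to boto3's EMR `run_job_flow`, or saved as JSON and given to `aws emr create-cluster --configurations file://...`. Making it opt-in per cluster or job is what would allow a gradual transition rather than a one-shot migration.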
Should this instead be "Add an option to use Python 3.x"? Migrating all jobs seems like a big ask. With an option, we can have a gradual transition to 3.x, eventually making it the default for new clusters, and finally sunsetting any Python 2.7 use.
Amazon might make it to Python 3.x before we do: moztelemetry and mozetl are still using Python 2.7. There is no real incentive to move off it, since migrations are a pretty rough process.
Adding Python 3.x as a notebook kernel choice in Jupyter and Zeppelin would be a nice addition.
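On the Jupyter side, a kernel choice is just a `kernel.json` file in a `kernels/<name>/` directory that Jupyter scans. A minimal sketch of writing one for Python 3, assuming a recent `ipykernel` (which provides the `ipykernel_launcher` module) is installed for the `/usr/bin/python3.4` interpreter; the helper names are hypothetical, and Zeppelin would need its own interpreter setting instead:

```python
import json
import os
import tempfile


def python3_kernel_spec(python_path="/usr/bin/python3.4"):
    """Build a Jupyter kernel.json dict for a Python 3 kernel (hypothetical helper)."""
    return {
        # Jupyter substitutes {connection_file} when it launches the kernel.
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": "Python 3",
        "language": "python",
    }


def write_kernel_spec(kernels_dir, name="python3"):
    """Write kernel.json into <kernels_dir>/<name>/ and return its path."""
    spec_dir = os.path.join(kernels_dir, name)
    os.makedirs(spec_dir, exist_ok=True)
    path = os.path.join(spec_dir, "kernel.json")
    with open(path, "w") as f:
        json.dump(python3_kernel_spec(), f, indent=2)
    return path


if __name__ == "__main__":
    # Demo: write into a throwaway directory instead of a real Jupyter data dir.
    with tempfile.TemporaryDirectory() as d:
        print(write_kernel_spec(d))
```

In practice, running `python3 -m ipykernel install` on the cluster does the equivalent and is probably simpler for a bootstrap step.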
Just noting here that, according to this announcement, Python 2 isn't supported by the current release of the Python kernel, IPython 6:
http://blog.jupyter.org/2017/04/19/release-of-ipython-6-0/