telemetry-analysis-service

Add an option to use Python 3.x on clusters and scheduled jobs

Open · robhudson opened this issue 7 years ago · 3 comments

Currently the default Python on the clusters is Python 2.7. I believe a long-term goal should be to move to Python 3.x, but we may have to wait for Amazon. I'm opening this issue for discussion and to track progress.

The latest Amazon EMR base Linux AMI (2017.03) installs Python 2.7 and 3.4. The Amazon EMR docs state:

Python Defaults

Python 3.4 is now installed by default, but Python 2.7 remains the system default. You may configure Python 3.4 as the system default using a bootstrap action; alternatively, you can use the configuration API to set the PYSPARK_PYTHON export to /usr/bin/python3.4 in the spark-env classification to affect the Python version used by PySpark.
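For the configuration-API route described in that note, a minimal sketch of the configuration object might look like the following. It assumes the standard EMR layout, where a nested "export" classification inside "spark-env" sets environment variables for Spark; the helper name `spark_env_python3_config` is hypothetical. It only changes the interpreter PySpark uses, not the system default, and it does not cover the bootstrap-action approach.

```python
import json

# Interpreter path taken from the EMR note quoted above (assumption: the
# AMI ships Python 3.4 at this location).
PYSPARK_PYTHON_PATH = "/usr/bin/python3.4"


def spark_env_python3_config(python_path=PYSPARK_PYTHON_PATH):
    """Build a hypothetical EMR configuration list exporting PYSPARK_PYTHON.

    The nested "export" classification inside "spark-env" is where EMR
    configurations place environment variables for Spark processes.
    """
    return [
        {
            "Classification": "spark-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Emit JSON suitable for a configurations file given to
    # `aws emr create-cluster --configurations file://...`.
    print(json.dumps(spark_env_python3_config(), indent=2))
```

The same list could, as an assumption about how clusters are launched here, be passed as the `Configurations` argument of boto3's EMR `run_job_flow`, which would let a per-cluster option choose between Python 2.7 and 3.x.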

robhudson · Jul 05 '17 18:07

Should this instead be "Add an option to use Python 3.x"? Migrating all jobs seems like a big ask. An option would allow a gradual transition to 3.x: first making it available, then making it the default for new clusters, and finally sunsetting any Python 2.7 use.

fbertsch · Jul 05 '17 19:07

Amazon might make it to Python 3.x before we do; moztelemetry and mozetl still use Python 2.7. There is no real incentive to move off it, since migrations are a pretty rough process.

Adding Python 3.x as a notebook kernel choice in Jupyter and Zeppelin would be a nice addition.

acmiyaguchi · Jul 05 '17 19:07

Just noting that, according to this announcement, Python 2 isn't supported by IPython 6, the current release of the Python kernel:

http://blog.jupyter.org/2017/04/19/release-of-ipython-6-0/

wcbeard · Sep 06 '17 21:09