ml
reticulate python rstudio
Can we get rocker/ml and rocker/ml-gpu to work with reticulate on RStudio?
library(reticulate)
py_config()
python:         /usr/local/bin/python
libpython:      /usr/lib/python3.5/config-3.5m-x86_64-linux-gnu/libpython3.5.so
pythonhome:     /usr:/usr
version:        3.5.3 (default, Sep 27 2018, 17:25:39)  [GCC 6.3.0 20170516]
numpy:          /usr/local/lib/python3.5/dist-packages/numpy
numpy_version:  1.16.3

python versions found:
 /usr/local/bin/python
 /usr/bin/python
 /usr/bin/python3
py_install('pandas')
Error: Prerequisites for installing Python packages not available.
Please install the following Python packages before proceeding: pip, virtualenv
@ryangarner Thanks for opening this issue; yeah, it would be nice if py_install worked out of the box. Note that reticulate is in fact already installed, as is pip3, but the Python packages are installed system-wide using pip3 directly rather than into a virtualenv, which isn't much help for a user trying to install additional packages from R.
@noamross would love your thoughts on how best to go about this. In particular, as you know, Debian/Ubuntu use separate namespaces for python (2.7) and python3, and I haven't figured out how to get the reticulate functions in R to use the python3 versions for everything. We have both Python versions installed on the image (actually RStudio pulls in both versions now), so while we could do something like symlinking with ln -s /usr/bin/pip3 /usr/local/bin/pip and ln -s /usr/bin/python3 /usr/local/bin/python, I'm not sure that's a good idea, and I'm not entirely sure what to do so that reticulate will find python3-virtualenv instead of python-virtualenv after installing it.
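Which interpreter a tool picks up when it asks for plain python or pip is simply whatever comes first on PATH (py_config() above, for instance, reports /usr/local/bin/python as version 3.5.3). As a rough diagnostic, here is a minimal sketch, not part of reticulate or the rocker images (the helper names are hypothetical), that reports where each name resolves and what its --version says; pip's version line also names the Python it belongs to.

```python
import shutil
import subprocess

# Unversioned and versioned names that Debian/Ubuntu keep in separate namespaces.
NAMES = ("python", "python3", "pip", "pip3")


def resolve_names(names=NAMES):
    """Map each command name to the first matching executable on PATH (or None)."""
    return {name: shutil.which(name) for name in names}


def version_of(executable):
    """Return the '--version' output of an executable, or None if it is missing."""
    if executable is None:
        return None
    result = subprocess.run([executable, "--version"], capture_output=True, text=True)
    # Python 2 prints its version to stderr; Python 3 and pip print to stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    for name, path in resolve_names().items():
        print(f"{name:8} -> {path}  ({version_of(path)})")
```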
@choldgraf could probably set me straight on the best way to go about the python virtualenv setup here.
hmm - is the main question "how are environments set up with virtualenv in Python?" - e.g., is this a file paths problem?
Thanks Chris, I guess this is really two questions:
Q1. What's the best way to set up a Python3 environment for Docker images?
As you know, Ubuntu/Debian distros expect users to explicitly request python3; plain python and pip both mean Python 2. The default behavior of reticulate is to look for python and pip binaries, i.e. to use Python 2. Presumably this can be changed in the reticulate config (e.g. use_python("/usr/bin/python3")), but I don't think that updates the paths used for pip installs. Alternatively, we could go the symlink route. Note that the official tensorflow Dockerfiles make this configurable via build args, but also symlink python3 to /usr/local/bin/python so that it works without the 3, though I'm not sure why they chose to do so.
Q2. What's the best choice for managing Python environments in our context: pip, virtualenv, or conda? (And how do we get those working with python3 instead of python2 on Debian?)
reticulate is happy to use any of these options. Currently we're just going pure pip, but then users cannot install additional packages without root. I suspect we should set things up to use virtualenv, though this raises a series of additional questions: (a) How do you get reticulate to use python3 when creating a virtualenv? (b) What's the best choice of home path for the virtualenv (e.g. we would at least like the same Python env to be available to root and non-root users)? (c) Is virtualenv the best choice at all here? (e.g. Nick tells me we'd get better tensorflow performance using conda with Intel MKL instead.)
sorry for the slow response - I'm actually not a super expert on Python paths, so I may not be the best person to ask, but my understanding is:
The simplest option for generic data science workflows that might not involve Python packages is to use miniconda to handle environments, along with the conda-forge channel for Anaconda. The other option is to use virtualenv with the system Python and pip. It's much more lightweight, though it can be non-trivial to install certain kinds of packages (e.g. mapping packages like fiona that require non-Python dependencies). You might get some inspiration from the base repo2docker template here: https://github.com/jupyter/repo2docker/blob/master/repo2docker/buildpacks/base.py#L14
I don't believe you must use pip with root privileges. Couldn't you install using the --user flag? That installs packages into a per-user directory instead of the system-wide location that needs root.
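To see where pip install --user would actually put things for a given interpreter, Python's standard site module exposes the per-user base and site-packages directories. A minimal sketch, assuming a POSIX layout (the helper name is hypothetical):

```python
import site


def user_install_locations():
    """Report where 'pip install --user' places files for this interpreter."""
    base = site.getuserbase()  # typically ~/.local on Linux
    return {
        "user_base": base,
        "site_packages": site.getusersitepackages(),
        # Console scripts installed with --user land here (POSIX layout), so this
        # directory has to be on PATH for other tools to find them.
        "scripts": f"{base}/bin",
        # False inside a virtualenv or when Python runs with -s; None if disabled.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in user_install_locations().items():
        print(f"{key}: {value}")
```

One consequence worth keeping in mind: --user installs are per account, so packages installed this way by one user are not visible to root or to other users.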
@yuvipanda might have some ideas for the best path forward here as well!
ps: for the MKL stuff, that might be the case... I've had differing results using MKL vs. other BLAS implementations for linear algebra; I think it depends a lot on the specific computation you're running.
If you're already using system python, my recommendation is:
- Early on in the Dockerfile, create a virtualenv (as root):
  python3 -m venv /opt/venv
- Change the ownership of /opt/venv to your regular user, so they can install packages into it without extra effort:
  chown -R rstudio:rstudio /opt/venv
- Modify PATH to include the 'bin' directory inside the virtualenv. This makes python, pip, etc. default to the python inside the virtualenv, and hence to python3:
  ENV PATH=/opt/venv/bin:${PATH}
- Install whatever base packages you want into this virtualenv (as your normal user):
  python3 -m pip install --no-cache-dir <packages>
  or
  python3 -m pip install --no-cache-dir -r requirements.txt
  The --no-cache-dir flag helps reduce the size of your Docker image. Note that this must be done as your normal user; accidentally doing it as root will cause issues.
This should work for 99% of use cases. The main reason to move away from it is if you need a Python version other than the one your system provides. If you need a newer version of Python, my recommendation is to use miniconda to get just Python, but still use a virtualenv for everything else.
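The recipe above is meant to be written as RUN/ENV lines in a Dockerfile. Purely to make steps 1 and 3 concrete and checkable, here is a rough Python equivalent using the standard venv module. The /opt/venv path comes from the recipe; the helper names and the choice to skip bootstrapping pip are assumptions for illustration, and the chown and package-install steps are left out because they need root and the image's rstudio user.

```python
import os
import shutil
import tempfile
import venv


def create_shared_venv(target):
    """Step 1: create the environment, like 'python3 -m venv /opt/venv'."""
    # symlinks mirrors the CLI default on POSIX; with_pip=False skips
    # bootstrapping pip so this sketch runs offline (the real recipe keeps pip).
    venv.create(target, symlinks=(os.name != "nt"), with_pip=False)
    return os.path.join(target, "bin")  # POSIX layout ('Scripts' on Windows)


def path_with_venv(bin_dir, current_path=None):
    """Step 3: prepend the venv's bin/, like 'ENV PATH=/opt/venv/bin:${PATH}'."""
    if current_path is None:
        current_path = os.environ.get("PATH", "")
    return bin_dir + os.pathsep + current_path if current_path else bin_dir


if __name__ == "__main__":
    # A temporary directory stands in for /opt/venv, which needs root to create.
    target = os.path.join(tempfile.mkdtemp(), "venv")
    new_path = path_with_venv(create_shared_venv(target))
    # With the venv first on PATH, plain 'python' resolves to the venv's interpreter.
    print(shutil.which("python", path=new_path))
```

Because venv always creates python, python3 and python3.X entries in bin/, putting that directory first is what makes plain python mean python3 for anything started with the new PATH.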
This is my quick-fix Dockerfile to get reticulate to work properly. Hope this helps!
FROM rocker/ml-gpu
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install curl -y
RUN curl -O https://bootstrap.pypa.io/get-pip.py
RUN python get-pip.py
RUN apt-get install python-virtualenv -y
RUN pip install virtualenv --upgrade
@ryangarner thanks. Yup, doing apt-get install python-virtualenv will, I believe, install the Python 2 version, as I commented above. I think you could condense your version into
RUN apt-get update && apt-get -y install python-virtualenv python-pip
(note that in general you want apt-get update and apt-get install on the same line in Dockerfiles, and to avoid upgrade, so that things play nicely with caching).
If you wanted to stick with the Python 3 versions (TensorFlow plans to deprecate Python 2 in the next year anyway), you'd do
RUN apt-get update && apt-get -y install python3-virtualenv python3-pip
but then reticulate won't find pip or virtualenv.
I quite like @yuvipanda's proposed workflow above, so I'll take a stab at that. In particular, it sounds like step 3 will make python == python3? Yuvi, is there any risk of that messing up other things that are using python2?
@cboettig I made #21
@cboettig it shouldn't mess anything up, since it's only for things that run with the specific PATH set (so things started by the user in this container). This is also how mybinder.org runs (python refers to python3 there), so I think it's ok!
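The point that the PATH change only affects processes started with that PATH can be seen in a small sketch: a child process launched with a modified PATH picks up a different python, while the parent process is untouched. To keep it self-contained, a hypothetical shell script stands in for /opt/venv/bin/python (POSIX only; helper names are illustrative).

```python
import os
import shutil
import subprocess
import tempfile


def run_with_path_prefix(bin_dir, command):
    """Run `command` in a child process whose PATH starts with `bin_dir`.

    Only the child's environment is changed; os.environ in this process keeps
    its original PATH, so programs started elsewhere are unaffected.
    """
    env = dict(os.environ)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    result = subprocess.run(command, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def make_fake_python(bin_dir):
    """Create a stand-in 'python' script (hypothetical) that just identifies itself."""
    path = os.path.join(bin_dir, "python")
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\necho venv-python\n")
    os.chmod(path, 0o755)
    return path


if __name__ == "__main__":
    fake_bin = tempfile.mkdtemp()
    make_fake_python(fake_bin)
    print(run_with_path_prefix(fake_bin, ["python"]))  # the child sees the stand-in
    print(shutil.which("python"))  # this process still resolves its usual python
```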
@ryangarner if you use
reticulate::virtualenv_install("/opt/venv", "pandas")
things should work as expected. You may also want to set reticulate::use_virtualenv("/opt/venv").
Not sure what is up with py_install(), since it should basically be calling use_virtualenv under the hood, but somehow its error handler is checking for, and failing to find, the virtualenv first. Still investigating...
linking https://github.com/rstudio/reticulate/issues/496 as related.
Thanks Yuvi! Digging a bit more, this seems to be a problem in the reticulate source code inside py_install(), which assumes the binaries are in ("/usr/bin", "/usr/local/bin", path.expand("~/.local/bin")) rather than looking them up on PATH. I've opened a separate issue here: https://github.com/rstudio/reticulate/issues/499#issuecomment-491643997
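The difference between the two lookup strategies can be illustrated with a small sketch. This is not reticulate's actual R code, and the function names are hypothetical. A search restricted to a fixed list of directories, like the one described above, misses a pip that lives in /opt/venv/bin, whereas a PATH-based lookup finds it.

```python
import os
import shutil

# The directories that, per the comment above, py_install() checks (POSIX layout).
FIXED_DIRS = ("/usr/bin", "/usr/local/bin", os.path.expanduser("~/.local/bin"))


def find_in_fixed_dirs(name, dirs=FIXED_DIRS):
    """Return the first executable called `name` in a hard-coded directory list."""
    for d in dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def find_on_path(name, path=None):
    """Return the first executable called `name` on PATH (or on a given PATH string)."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    # With the venv recipe, pip lives in /opt/venv/bin: on PATH, but not in FIXED_DIRS.
    venv_path = "/opt/venv/bin" + os.pathsep + os.environ.get("PATH", "")
    print("fixed dirs:", find_in_fixed_dirs("pip"))
    print("PATH:      ", find_on_path("pip", path=venv_path))
```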