Conda issues

Open ndevenish opened this issue 5 years ago • 7 comments

Mentioned in https://github.com/cctbx/cctbx_project/commit/6a8503a98b2e251be0b31806a0e5e20751b10055#commitcomment-32583542 that I'd had some problems using conda. Decided to post in a separate issue.

I also had a quick poke over the weekend and noticed a few more issues:

  • Conda doesn't stop installing if you hit Ctrl-C; in many cases it seems to continue after bootstrap has posted a traceback. This probably leaves the install in an unknown state.
  • Passing --python3 still installs python 2.7
  • Passing --wxpython4 still installs wxPython3 on linux (but 4.0 is default on OSX??!?)
  • HDF5 and plugins are installed on a base non-dials builder - this probably isn't what you want?

i.e. switching to conda still has lots of regressions vs bootstrap.
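
On the Ctrl-C point, a minimal sketch of how a wrapper could stop its child process when interrupted. This is not how bootstrap currently launches conda; `run_interruptible` and `grace_seconds` are hypothetical names, and the code assumes Python 3.

```python
import subprocess
import sys


def run_interruptible(command, grace_seconds=10):
    """Run `command`, stopping the child if the user presses Ctrl-C.

    Returns the child's exit code, or 130 if the run was interrupted.
    """
    process = subprocess.Popen(command)
    try:
        return process.wait()
    except KeyboardInterrupt:
        # Ask the child to stop, then force it if it does not exit in time.
        process.terminate()
        try:
            process.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            process.kill()
            process.wait()
        return 130  # exit code shells conventionally report for SIGINT
```

A caller would pass the conda command list to `run_interruptible` and treat any non-zero return value as "the install did not finish".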

The traceback I had earlier was a fixable issue, but it left the folder in a broken state. In any case, the tracebacks on failure are probably unacceptable for display to users; bootstrap should handle the install failing.

Using:

FROM ubuntu:latest
RUN apt-get update && apt-get install -y git build-essential python

WORKDIR /opt/
RUN git clone https://github.com/cctbx/cctbx_project.git modules/cctbx_project
RUN cp modules/cctbx_project/libtbx/auto_build/bootstrap.py .

# This command fails
RUN python bootstrap.py base --use-conda

Initial install fails badly because of missing gl drivers:

ERROR conda.core.link:_execute(507): An error occurred while installing package 'conda-forge::pyopengl-3.1.0-py27_0'.
LinkError: post-link script failed for package conda-forge::pyopengl-3.1.0-py27_0

<more unhelpful conda stuff>

Traceback (most recent call last):
  File "modules/cctbx_project/libtbx/auto_build/install_conda.py", line 717, in <module>
Location of conda installation not provided
Proceeding with a fresh installation
Downloading https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Downloaded file to /opt/Miniconda3-latest-Linux-x86_64.sh
Installing miniconda to /opt/miniconda3
Base conda installation:
  /opt/miniconda3
Installing cctbx environment with:
  /opt/modules/cctbx_project/libtbx/auto_build/conda_envs/cctbx_py27_linux-64.txt
    run()
  File "modules/cctbx_project/libtbx/auto_build/install_conda.py", line 713, in run
    copy=namespace.copy, offline=namespace.offline)
  File "modules/cctbx_project/libtbx/auto_build/install_conda.py", line 584, in create_environment
    output = check_output(command_list, env=self.env)
  File "/opt/modules/cctbx_project/libtbx/auto_build/installer_utils.py", line 61, in check_output
    raise RuntimeError("Call to '%s' failed with exit code %d" % (popenargs, retcode))
RuntimeError: Call to '(['/opt/miniconda3/bin/conda', 'create', '--prefix', '/opt/conda_base', '--file', '/opt/modules/cctbx_project/libtbx/auto_build/conda_envs/cctbx_py27_linux-64.txt'],)' failed with exit code 1
Performing actions: base
Installing base packages using:
  python modules/cctbx_project/libtbx/auto_build/install_conda.py --builder=cctbx --install_conda

  removing .pyc files in /opt/modules, walk? True
  removed 0 files
===== Running in .: base
Process failed with return code 1

After fixing this with RUN apt-get install -y libgl1-mesa-dev and trying to continue, both of these commands fail, i.e. the install is corrupted and can only be fixed manually:

# python bootstrap.py base --use-conda
    Performing actions: base
    Installing base packages using:
    python modules/cctbx_project/libtbx/auto_build/install_conda.py --builder=cctbx

    removing .pyc files in /opt/modules, walk? True
    removed 3 files
    ===== Running in .: base
    Using default conda installation
    Base conda installation:
    /opt/miniconda3
    Installing cctbx environment with:
    /opt/modules/cctbx_project/libtbx/auto_build/conda_envs/cctbx_py27_linux-64.txt

    CondaValueError: prefix already exists: /opt/conda_base

    Traceback (most recent call last):
    File "modules/cctbx_project/libtbx/auto_build/install_conda.py", line 717, in <module>
        run()
    File "modules/cctbx_project/libtbx/auto_build/install_conda.py", line 713, in run
        copy=namespace.copy, offline=namespace.offline)
    File "modules/cctbx_project/libtbx/auto_build/install_conda.py", line 584, in create_environment
        output = check_output(command_list, env=self.env)
    File "/opt/modules/cctbx_project/libtbx/auto_build/installer_utils.py", line 61, in check_output
        raise RuntimeError("Call to '%s' failed with exit code %d" % (popenargs, retcode))
    RuntimeError: Call to '(['/opt/miniconda3/bin/conda', 'create', '--prefix', '/opt/conda_base', '--file', '/opt/modules/cctbx_project/libtbx/auto_build/conda_envs/cctbx_py27_linux-64.txt'],)' failed with exit code 1
    Process failed with return code 1

# python bootstrap.py base --use-conda=conda_base
    Performing actions: base

    removing .pyc files in /opt/modules, walk? True
    removed 1 files

    Bootstrap success: base

The second counts as a failure because it's not a correct installation. It was at this point that I gave up on what was supposed to be a quick evaluation.

ndevenish, Mar 11 '19 11:03

Hi, the first three points are good. We want CTRL-C to work, and we need --python3 and --wxpython4 for development reasons.

Regarding "HDF5 and plugins are installed on a base non-dials builder": that is true, but it is also true for install_base_packages. HDF5 and its plugins are always built. See:

https://github.com/cctbx/cctbx_project/blob/master/libtbx/auto_build/install_base_packages.py#L296

We want HDF5 and its plugins for the cctbx builder, but not the cctbx-lite builder. The goal for conda is to have separate manifests for each builder that are generated from the conda metapackages. That way we can split up dependencies by builder.

Here's how this will work. Metapackages have a kind of inheritance. See: https://github.com/phenix-project/phenix_dependencies/blob/master/meta.yaml#L5-L8 Here, the phenix_dependencies conda metapackage, which will be installable with conda install phenix_dependencies, also includes cctbx_dependencies. This allows us to build up lists of dependencies, and conda will resolve duplication and conflicts. The phenix_dependencies metapackage will inherit from dials_dependencies, phaser_dependencies, etc., once they exist.
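
The inheritance idea can be sketched as a toy model. The metapackage names below follow the ones mentioned above, but the package lists are purely illustrative assumptions, `flatten` is a hypothetical helper, and real conda resolution also handles versions and conflicts, which this sketch ignores.

```python
# Hypothetical metapackage definitions: each lists its direct requirements,
# which may name other metapackages. Package lists are illustrative only.
METAPACKAGES = {
    "cctbx_dependencies": ["numpy", "hdf5"],
    "dials_dependencies": ["cctbx_dependencies", "scipy"],
    "phenix_dependencies": ["cctbx_dependencies", "dials_dependencies"],
}


def flatten(name, metapackages, _seen=None):
    """Return the set of concrete packages pulled in by metapackage `name`."""
    seen = set() if _seen is None else _seen
    packages = set()
    if name in seen:  # already expanded; avoids repeats and cycles
        return packages
    seen.add(name)
    for requirement in metapackages.get(name, []):
        if requirement in metapackages:
            # An included metapackage: expand it recursively.
            packages |= flatten(requirement, metapackages, seen)
        else:
            packages.add(requirement)
    return packages
```

Calling `sorted(flatten("phenix_dependencies", METAPACKAGES))` yields each concrete package once, even though `cctbx_dependencies` is reachable by two routes.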

Now, once the metapackages are in place, manifests for each builder can be created and put into their respective program directories. Some of this scaffolding is in place: https://github.com/cctbx/cctbx_project/blob/master/libtbx/auto_build/install_conda.py#L144-L160

Therefore cctbx-lite, a putative metapackage with an associated manifest, which doesn't configure dxtbx, would not have hdf5.

@bkpoon is working on documentation for all this :)

phyy-nx, Mar 13 '19 16:03

Regarding the two failures you listed: the first failure is because of a missing system dependency, the GL drivers, which you rightly fixed. The second set of failures is because you didn't clean the folder after a failure. Delete conda_base and try again, just as if you were using install_base_packages, where you'd also need to delete base and try again.

@bkpoon mentioned he'll be adding functionality to update a conda_base folder using the package manifests when you run bootstrap base. That's great, as we'll be in a much better place: you'll no longer have to delete base when the dependencies change, which will make running bootstrap more reliable.
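
One way the create-or-update choice could look, as a sketch only and not the planned implementation. `environment_command` is a hypothetical helper, and it assumes that any existing directory at the prefix is a usable environment, which the failure reported above shows is not always true.

```python
import os


def environment_command(conda_exe, prefix, manifest):
    """Build the conda command that creates or updates `prefix`.

    Assumption: an existing directory at `prefix` is a usable environment
    that can be updated in place; a real implementation would verify this.
    """
    action = "install" if os.path.isdir(prefix) else "create"
    return [conda_exe, action, "--prefix", prefix, "--file", manifest]
```

With this, rerunning after a dependency change would issue conda install against the existing conda_base instead of failing with "prefix already exists".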

phyy-nx, Mar 13 '19 16:03

Ah, I thought cctbx deliberately excluded HDF5 because it wasn't used unless you had XFEL activated. It's good that some flexibility is anticipated in the dependency system; I guess this is also how build and runtime dependencies will be split? (You mentioned installing compilers in the past, which we obviously don't want to distribute.)

I know I had a missing dependency, but the error message doesn't say that, and there's no way to work out what the problem is unless you are familiar with packaging systems. I generally consider giving a stack trace to a user a "problem", especially without any suggestion of how to probe the failure further. (The conda error says to run with -v, but that applies to a different command from the one the user just ran.) Leaving the folder in a broken state that can't be resumed from is also a problem that users shouldn't have to deal with.

I know some of this was a problem before (I've lost count of the number of times I've had to help a user resume a broken partial base install), but a complete rewrite should probably aim to be more flexible and easier to use. We should avoid making error messages more opaque.

ndevenish, Mar 13 '19 17:03

Let me rephrase that: personally I consider showing a stack trace to the user to be our fault, and the user shouldn't be expected to recover from it. I encourage everyone over here to think the same, and as far as I know there is agreement in principle on this point.

This doesn't mean infinite resources should be spent on removing them, but if it's possible to e.g. check the return value of subprocess calls (or catch the RuntimeError if you are deliberately calling a version of the runners that throws them), then you should.
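
A minimal sketch of what checking the return value could look like, assuming Python 3. `run_step` and its `hint` parameter are hypothetical and are not part of bootstrap or installer_utils.

```python
import subprocess
import sys


def run_step(command, hint):
    """Run `command`; on failure print a short message instead of a traceback.

    Returns True on success and False otherwise.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as error:  # e.g. the executable does not exist
        print("Could not start %s: %s" % (command[0], error), file=sys.stderr)
        return False
    if result.returncode != 0:
        # Show only the tail of the tool's own output, then a next step.
        print("Command failed with exit code %d:" % result.returncode, file=sys.stderr)
        print("  " + " ".join(command), file=sys.stderr)
        for line in result.stdout.strip().splitlines()[-5:]:
            print("  | " + line, file=sys.stderr)
        print(hint, file=sys.stderr)
        return False
    return True
```

The `hint` is where a recovery step would go, for example telling the user to delete conda_base and rerun bootstrap base.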

ndevenish, Mar 13 '19 17:03

I like the idea of dials_dependencies as an intermediate stepping stone towards a true dials conda package.

Minor nitpick: hdf5 is, as far as I am aware, not a dials dependency but a dxtbx dependency (and a bit of rstbx and cbflib_adaptbx; not sure why, or how much of that is real). So once we eventually rewrite dxtbx to be a truly modular system using entry points, the dependency would move down to a dxtbx-hdf5 package and, with that, out of the main dials/dxtbx land altogether. But for now this is fine and seems sensible.

Anthchirp, Mar 13 '19 18:03

--python3 and --wxpython4 still not working.

ndevenish, May 08 '19 12:05