easybuild-easyblocks icon indicating copy to clipboard operation
easybuild-easyblocks copied to clipboard

Always set `$EBPYTHONPREFIXES` instead of `PYTHONPATH`

Open Flamefire opened this issue 2 years ago • 9 comments

A short time ago I did a major replacement of setting $EBPYTHONPREFIXES instead of PYTHONPATH on the modules on our HPC cluster because the latter does not play nice with virtualenvs: The $PYTHONPATH packages take precedence over the virtualenv making it impossible for users to update/downgrade packages (in their virtualenv by explicitly issuing the requests via e.g. pip).

Hence we should always set $EBPYTHONPREFIXES (in PythonPackage/PythonBundle) or at least provide an option to set this for each site.

Flamefire avatar Feb 10 '23 13:02 Flamefire

i personally think it is a good thing that users cannot upgrade/downgrade packages that are already loaded with modules:

  • for reproducibility
  • for performance

also, if you don't want to use the module provided version, why load the module in the first place?

so i think if this is implemented it should indeed be optional

smoors avatar Mar 03 '23 17:03 smoors

i personally think it is a good thing that users cannot upgrade/downgrade packages that are already loaded with modules:

The point is: They can. It just silently does not work. See below

if you don't want to use the module provided version, why load the module in the first place?

Because the module might contain a dependency which is not the (primary) reason for using that module, and that might need changing.

Example: TensorFlow requires a lot of additional Python packages. E.g. for 2.11 we have tensorboard-plugin-profile==2.11.1

After loading the relevant module(s) the user creates a virtualenv and installs additional packages with pip into that as required. What is already part of the module(s) will be skipped from installation --> All good so far.

Now suppose that they release a version 2.11.2 of the package with a fix we didn't anticipate when creating the easyconfig but is for some reason important for users.

  • If they do pip install tensorboard-plugin-profile nothing will happen as that package is already installed.
  • pip install --upgrade tensorboard-plugin-profile==2.11.2 will download and install that package into the virtualenv where it will be simply ignored due to our use of $PYTHONPATH in the module file. So the user might rightfully think that the new version is used while it is not possibly wasting a lot of compute resources until an issue occurs rendering the results useless.

A similar thing happens, when a user installs some additional python package into the virtualenv which has a dependency on a Python package contained (as a dependency/extension) in a module. But the users python package requires a different version. pip will resolve the version requirements and show that it installs a different version into the virtualenv. But again although it looks like all is good, it is not: The package in the virtualenv will be ignored.

This has actually caused issues for us in the past which led me to "fixing" all module files. So this is not theoretically.

so i think if this is implemented it should indeed be optional

Probably, but the default should be to use $EBPYTHONPREFIXES as this is/was actually an unannounced, breaking change, which I'd call a bug, caused by us switching from the Python2/3 multi-dep approach (which used $EBPYTHONPREFIXES ) to suffixed ECs (which use $PYTHONPATH) and the implication are so subtly wrong that I guess that is the safer approach.

Coming back to

if you don't want to use the module provided version, why load the module in the first place?

Another viewpoint is: We shouldn't restrict users when we don't have to. If a user actively installs a different version of a package there might be reasons for doing so which I think are hard to anticipate in general. I mean it is nothing that can happen implicitly: The user has to actively do that.

Flamefire avatar Mar 04 '23 13:03 Flamefire

i see your point now, thanks for explaining.

this is/was actually an unannounced, breaking change

true, but adopting the multi_deps approach was a breaking change too, so you could argue it was reverted to the original approach :)

smoors avatar Mar 04 '23 16:03 smoors

I think this disagreement also came up in Julia https://github.com/easybuilders/easybuild-easyconfigs/issues/17455 where @lexming mentioned "The issue is that this approach allows user installs to overwrite what is loaded with modules from EasyBuild (not desirable)."

I'm with @Flamefire here, and it was actually the major second reason for us to use $EBPYTHONPREFIXES, as PYTHONPATH overrides everything (including OS python, which may even be 2.7!). We had the issue that a user had modules loaded, then install a virtual env but because PYTHONPATH was set found the virtual env didn't work as intended.

One could actually use EBPYTHONPREFIXES everywhere and still have the (configurable, if we agree to disagree?) effect of overriding user installs.

See this script that is installed (sitecustomize.py)

# sitecustomize.py script installed by EasyBuild,
# to pick up Python packages installed with `--prefix` into folders listed in $%(EBPYTHONPREFIXES)s

import os
import site
import sys

# print debug messages when $%(EBPYTHONPREFIXES)s_DEBUG is defined
debug = os.getenv('%(EBPYTHONPREFIXES)s_DEBUG')

# use prefixes from $EBPYTHONPREFIXES, so they have lower priority than
# virtualenv-installed packages, unlike $PYTHONPATH

ebpythonprefixes = os.getenv('%(EBPYTHONPREFIXES)s')

if ebpythonprefixes:
    postfix = os.path.join('lib', 'python' + '.'.join(map(str, sys.version_info[:2])), 'site-packages')
    if debug:
        print("[%(EBPYTHONPREFIXES)s] postfix subdirectory to consider in installation directories: %%s" %% postfix)

    potential_sys_prefixes = (getattr(sys, attr, None) for attr in ("real_prefix", "base_prefix", "prefix"))
    sys_prefix = next(p for p in potential_sys_prefixes if p)
    base_paths = [p for p in sys.path if p.startswith(sys_prefix)]

    for prefix in ebpythonprefixes.split(os.pathsep):
        if debug:
            print("[%(EBPYTHONPREFIXES)s] prefix: %%s" %% prefix)
        sitedir = os.path.join(prefix, postfix)
        if os.path.isdir(sitedir):
            if debug:
                print("[%(EBPYTHONPREFIXES)s] adding site dir: %%s" %% sitedir)
            site.addsitedir(sitedir)

    # Move base python paths to the end of sys.path so modules can override packages from the core Python module
    sys.path = [p for p in sys.path if p not in base_paths] + base_paths

if you change base_paths = [p for p in sys.path if p.startswith(sys_prefix)] to base_paths = sys.path[:] it'll override everything. Then you'll also need to parse PYTHONPATH to give that the absolute priority though, e.g.:

base_paths = [p for p in sys.path if p.startswith(sys_prefix)]
python_path = [p for p in sys.path if p in os.getenv("PYTHONPATH", "").split(os.pathsep)]
... #... site.addsitedir(...).
sys.path = python_path + [p for p in sys.path if p not in python_path+base_path] + base_paths

which switches then from PYTHONPATH - virtualenv paths - EBPYTHONPREFIXES - system to PYTHONPATH - EBPYTHONPREFIXES - virtualenv paths - system

bartoldeman avatar Mar 16 '23 15:03 bartoldeman

I think we should do this, is this perhaps something we could squeeze into EB5 since it would be a pretty major change?

ocaisa avatar Feb 22 '24 08:02 ocaisa

I think we should do this, is this perhaps something we could squeeze into EB5 since it would be a pretty major change?

This could even be guided by an option. Might be hard to get always correct when custom easyblocks are involved though. Currently we have some code in a hook that replaces PYTHONPATH by EBPYTHONPREFIXES

Flamefire avatar Feb 22 '24 08:02 Flamefire

I'm in favor of this. Given that I used this with multideps and never say any issues, I think it should be safe to default to this.

Might be hard to get always correct when custom easyblocks are involved though.

Even if we don't cover those cases it's still worth a fix for "just" doing this in the PythonBundle and PythonPackage's, as those are most likely the stuff that the user wants to shadow with their venv anyway.

Micket avatar Feb 22 '24 15:02 Micket

I brought this up on the 5.0 meeting today

  • many of us were in favor of this (at least in that meeting)
  • the need for this seems to be on the rise (more venv use, more PythonBundles to potentially need to shadow)
  • this is similar to the behavior that we already have in R and Julia, so making it consistent is also good.
  • has a nice side benefit of totally not messing up your OS python3.6 since python bundles wouldn't be picked up.
  • definitely want to have a framework option for this (maybe --prefer-ebpythonprefix-over-pythonpath, --prefer-ebpythonprefix, --avoid-pythonpath ?).
  • the only thing that makes this a 5.0 feature is whether this gets behavior is default. Opt-in there is no objections to adding this for any release.

I'm not sure how we want to tackle this in practice though. Approach A: Adding the option in framework, and then customizing our easyblocks to prefer EBPYTHONPREFIX + updating our easyconfigs that happen to hardcode PYTHONPATH today. The (rare) corner cases like, e.g. SRA-Toolkit: https://github.com/easybuilders/easybuild-easyconfigs/blob/1485d4b19ecb2e0d72a7ef27c9b64521a391265a/easybuild/easyconfigs/s/SRA-Toolkit/SRA-Toolkit-3.0.10-gompi-2023a.eb#L71-L74 which puts

modextrapaths = {
    'CLASSPATH': 'jar/ngs-java.jar',
    'PYTHONPATH': 'ngs-python',
}

we either just ignore if the use is intentional, or possibly fix so that they do install into the correct sub-directory.

Approach B: Some form of built-in hook that rewrites the PYTHONPATH's into corresponding EBPYTHONPREFIX's. This would still need to at least avoid some exceptions i think, like system level Spark packages does

modextrapaths = {'PYTHONPATH': 'python'}

and of course, we have to avoid it in all cases where the target path doesn't follow the expected site-packages directory structure.

I'm leaning option towards A.

Edit: Corrections, I thought about it some more, and I think it requires a hook to deal with just the easyconfigs that specify their own PYTHONPATH + having it optionally converted to EBPYTHONPREFIX so, option B is the only alternative.

Micket avatar Feb 29 '24 19:02 Micket

I began working on this in https://github.com/easybuilders/easybuild-framework/pull/4496

Micket avatar Apr 03 '24 22:04 Micket