physionet-build icon indicating copy to clipboard operation
physionet-build copied to clipboard

Avoid in-place pip upgrade

Open bemoody opened this issue 2 years ago • 8 comments

Currently packages are installed and upgraded in /physionet/python-env/physionet by invoking /physionet/python-env/physionet/bin/pip3 install .... (See deploy/update.)

Some of the many problems with this:

  • packages removed from requirements.txt don't get removed
  • no guarantee in general that upgrading will give reproducible results
  • no reliable way to roll back to a previous version
  • pip is implemented in python and therefore installed packages can screw up pip
  • installation can fail for a million reasons, in which case we'd prefer the django server to continue using the old working libraries rather than the new partially-installed ones

Instead it would be preferable to install packages from scratch every time (the way we do every time we run the test suite.)

Some potential obstacles to doing that:

  • if requirements.txt isn't changed, we don't want to waste time installing stuff every time we deploy to the server
  • if pypi.org is down, we still want to be able to deploy changes (assuming those changes don't change requirements.txt)
  • a virtualenv prefix can never be moved or renamed

A possible solution:

  • Generate a hash of the requirements.txt file, e.g. 45454a9fdba6595291c93265e1b6c2cc.
  • If /physionet/python-env/45454a9fdba6595291c93265e1b6c2cc already exists, then simply create a symlink /physionet/python-env/physionet -> 45454a9fdba6595291c93265e1b6c2cc.
  • Otherwise, create a new virtualenv prefix there and install the packages from requirements.txt, then create the symlink.
  • (Some magical process to automatically delete prefixes that are no longer used.)

bemoody avatar Mar 07 '23 18:03 bemoody

If /physionet/python-env/45454a9fdba6595291c93265e1b6c2cc already exists, then simply create a symlink /physionet/python-env/physionet -> 45454a9fdba6595291c93265e1b6c2cc.

might work but has some odd effects, just as renaming does.

Basically, in this case, scripts installed by virtualenv or pip (such as pip itself, django-admin, or the activate file) will hardcode the original path (/physionet/python-env/45454a9fdba6595291c93265e1b6c2cc).

But if python3 is invoked as /physionet/python-env/physionet/bin/python3, then /physionet/python-env/physionet is what gets stuck into sys.path. I think thhe same is true of the python interpreter invoked by uwsgi (see /etc/uwsgi/vassals/physionet_uwsgi.ini)

Not to say this is necessarily a bad idea, it would be an improvement in some ways, but it also comes with pitfalls and doesn't provide the hermeticity that you might expect.

bemoody avatar Mar 21 '23 19:03 bemoody

Also note that .pyc files have the original path baked into them for some reason and I have no idea why (it isn't reflected in __file__ for example...)

bemoody avatar Mar 21 '23 19:03 bemoody

A way to avoid this inconsistency (and make /physionet/python-env/physionet/bin/python3 hermetic) would be to do this

virtualenv -ppython3 --quiet --clear --no-download "$new_venv_dir"
mv -f "$new_venv_dir/bin/python" "$new_venv_dir/bin/python.real"
echo '#!/bin/sh' > "$new_venv_dir/bin/python"
echo 'exec "$(readlink -f "$0").real" "$@"' >> "$new_venv_dir/bin/python"
chmod a+x "$new_venv_dir/bin/python"

Don't know if it affects uwsgi, that still needs testing.

Realistically even if not fully hermetic this feels safer than upgrading in place. But is it wise to diverge from standard virtualenv behavior?

bemoody avatar Apr 13 '23 15:04 bemoody

echo 'exec "$(readlink -f "$0").real" "$@"' >> "$new_venv_dir/bin/python"

or, y'know, hardcode the path like everybody else

bemoody avatar Apr 13 '23 15:04 bemoody

Part of me thinks that the safest approach is just to reinstall everything! Time isn't really an issue for us, I don't think? If the packages are cached (which I assume they are?), then I imagine installation would be pretty quick anyway.

tompollard avatar Apr 14 '23 14:04 tompollard

Part of me thinks that the safest approach is just to reinstall everything!

Yep. The obvious way to do it would be like this, right?

rm -rf /physionet/python-env/physionet
virtualenv /physionet/python-env/physionet
/physionet/python-env/physionet/bin/pip3 install -r requirements.txt

The main problem with that is that uwsgi can and will restart its worker processes at any time. (This is intentional on the part of uwsgi and helps avoid memory leaks and memory fragmentation; we could reduce the restart frequency but probably don't want to disable it completely.)

(This is also part of the reason that test-upgrade.sh re-runs the old server test suite after installing the new requirements.txt.)

This means that while you're in the middle of running the above commands, uwsgi can come along and launch a new Python interpreter that expects to find its libraries installed in /physionet/python-env/physionet, which would then crash if something is missing.

There are more subtle problems with the above approach, e.g. it might fail if the requirements.txt wasn't tested on a matching Python version, or packages might be added/removed on pypi. Also, some Python libraries don't import all their dependencies immediately, and packages might load files (e.g. templates) from the Python installation path at any time.

So, the issue that I'm trying to resolve is whether it's possible to install the new packages and then instantaneously replace the old installation tree with the new one, without needing to touch uwsgi or its configuration.

Preferably we would also make the installation "hermetic" by which I mean that even after this replacement takes place, old Python processes keep using the old installation tree rather than the new one.

bemoody avatar Apr 26 '23 17:04 bemoody

Makes sense, thanks for explaining! I'm keeping an eye on https://github.com/MIT-LCP/physionet-build/pull/1939 and will review when it's ready for review.

tompollard avatar Apr 26 '23 17:04 tompollard

https://github.com/MIT-LCP/physionet-build/issues/1921#issuecomment-1507188116

Don't know if it affects uwsgi, that still needs testing.

Replacing the python binary does not appear to have any effect on uwsgi. uwsgi still uses the non-hermetic path.

bemoody avatar Oct 25 '23 15:10 bemoody