Create robust Python modules
Modern Linux and Python distributions no longer allow packages to be installed globally, and modifying the `PYTHONPATH` variable is no longer recommended. This PR attempts to update the installation of our Python tooling accordingly.
Python Modules
Two main Python modules are currently implemented:

- `gdtk` (now contained in the `gdtk-py` directory)
- `nenzf1d`
The relevant makefile recipes now create wheels (see PEP 491), which are moved to `$INSTALL_DIR/share/python-packages`. A user can point their package manager to search this directory, for instance:
`PIP_FIND_LINKS=$INSTALL_DIR/share/python-packages`, or `pip config set global.find-links $INSTALL_DIR/share/python-packages`.
This allows the user to simply run `pip install gdtk` in their project.
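A typical user-side workflow (a sketch only, assuming `$INSTALL_DIR` is already set in the shell) might look like:

```sh
# Point pip at the local wheel directory; either of these is sufficient:
export PIP_FIND_LINKS="$INSTALL_DIR/share/python-packages"
# ...or persist the setting in pip's configuration:
pip config set global.find-links "$INSTALL_DIR/share/python-packages"

# Then install the module into the current environment as usual:
pip install gdtk
```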
Python Scripts
The commonly used Python scripts (from what I can observe) have been converted to install within virtual environments (`.venv` directories within `$INSTALL_DIR`), which allows isolated installs of the dependencies (e.g. `numpy`, `scipy`). The requirements for each set of scripts are detailed in a `pyproject.toml` file (see spec), with a conservative minimum Python version of 3.8.
The existing makefiles now create the virtual environment, install the Python "module" into it, and symlink the resulting binaries to `$INSTALL_DIR/bin`, as per their previous location.
The end result for the user is unchanged.
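For a single hypothetical tool (`some-tool` below; the exact directory layout and names are up to each makefile recipe), a minimal sketch of that flow is:

```sh
# Create an isolated environment under the install prefix (layout is illustrative)
python3 -m venv "$INSTALL_DIR/.venv"

# Install the tool and its dependencies, as declared in its pyproject.toml
"$INSTALL_DIR/.venv/bin/pip" install /path/to/some-tool

# Symlink the entry point so it is available on $PATH alongside the other GDTk tooling
ln -sf "$INSTALL_DIR/.venv/bin/some-tool" "$INSTALL_DIR/bin/some-tool"
```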
Examples
A `pyproject.toml` is also created in the root of `examples/`, to ensure all relevant tools are installed for testing. The `pytest` configuration for `lmr` is moved to a `pytest.ini` file, to avoid conflicts.
Requirements
In order to work on HPC systems, this is designed to use backwards-compatible tools. With an internet connection, only `python3` with `pip` is required.
I've converted this to a draft while we experiment with the `lmr` Python program(s) in isolation.
A couple of notes on what I've found while adding these updates:

- `pip` will attempt to keep a cache of packages it has found, so creating multiple separate `.venv`s for separate tooling will not perform unnecessary downloads from PyPI.
- All tools can currently be installed into a single `.venv` without conflicts, as no tools currently have minimum versions for dependencies.
The updated makefile targets now look for an activated virtual environment when installing. This is defined either by `VIRTUAL_ENV` for most Python package managers, or by `CONDA_PREFIX` for (ana)conda variants. The directory can also be specified by the user when calling `make`.
This will hopefully cover the use-case on remote HPC machines with low/limited internet connectivity. If the user has installed Python packages globally using `apt install python3-*` or similar, then creating a virtual environment using `python3 -m venv <directory> --system-site-packages` will allow the loading of these system-wide packages.
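For example, on a cluster where the scientific stack comes from system packages, something along these lines (the paths and `make` target are illustrative) should work:

```sh
# Re-use system-wide packages (e.g. from apt) inside the new virtual environment
python3 -m venv "$HOME/gdtk-venv" --system-site-packages
source "$HOME/gdtk-venv/bin/activate"

# The makefile targets should detect the activated environment and install into it
make install
```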
If no virtual environment is active or manually specified, then the default action is to create a new virtual environment at `$INSTALL_DIR/.venv`, and install all tooling there.
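In shell terms, the selection logic amounts to something like the following (only `VIRTUAL_ENV` and `CONDA_PREFIX` come from this PR; the rest is an illustrative sketch):

```sh
# Prefer an activated environment; otherwise fall back to one under the install prefix
VENV_DIR="${VIRTUAL_ENV:-${CONDA_PREFIX:-$INSTALL_DIR/.venv}}"

# Create the fall-back environment if it does not exist yet
if [ ! -d "$VENV_DIR" ]; then
    python3 -m venv "$VENV_DIR"
fi
```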
Remaining Questions
- Should the default/fall-back creation of a `.venv` enable the `--system-site-packages` option? I believe that the default `.venv` shouldn't enable this, to mirror the expected behaviour of Python: virtual environments are expected to be isolated from the system.
- After installing, should the user be required to "activate" the virtual environment to run the installed Python tools? This might cause some headaches with sub-shells also needing to activate it. Currently, the makefile recipe will symlink the resulting Python tool to the `$INSTALL_DIR/bin` directory. This alternative allows the installed tool to be available on `$PATH` along with other GDTk tooling, however it may cause duplication within the `$PATH` if the environment is also activated.