
Create robust python modules

Alex-Muirhead opened this issue 2 months ago · 1 comment

Modern Linux and Python distributions no longer allow packages to be installed globally into the system Python, and modifying the PYTHONPATH variable is no longer recommended. This PR updates the installation of our Python tooling accordingly.

Python Modules

Two main python modules are currently implemented:

  1. gdtk (now contained in the gdtk-py directory)
  2. nenzf1d

The relevant makefile recipes now create wheels (see PEP 491), which are moved to $INSTALL_DIR/share/python-packages. A user can point their package manager to search this directory, for example via

  1. PIP_FIND_LINKS=$INSTALL_DIR/share/python-packages, or
  2. pip config set global.find-links $INSTALL_DIR/share/python-packages

This allows the user to simply run pip install gdtk in their project.
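As a concrete sketch, using the two configuration options listed above and the wheel directory produced by the install, a user's workflow might look like:

```sh
# Point pip at the local wheel directory produced by the GDTK build
export PIP_FIND_LINKS=$INSTALL_DIR/share/python-packages
# ...or make the setting persistent instead:
pip config set global.find-links $INSTALL_DIR/share/python-packages

# Install the gdtk module into the current environment or project
pip install gdtk
```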

Python Scripts

The commonly used Python scripts (from what I can observe) have been converted to install within virtual environments (.venv directories within the INSTALL_DIR), which allows isolated installs of their dependencies (e.g. numpy, scipy). The requirements for each set of scripts are detailed in a pyproject.toml file (see spec), with a conservative minimum Python version of 3.8.

The existing makefiles now create the virtual environment, install the Python "module" into it, and symlink the resulting binaries into $INSTALL_DIR/bin, matching their previous location (see the sketch below).

The end result for the user is unchanged.
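In effect, each recipe does something along these lines; the directory layout, the tool name some-tool, and $SOURCE_DIR are illustrative placeholders rather than the exact paths used by the makefiles:

```sh
# Create an isolated environment inside the install tree
python3 -m venv $INSTALL_DIR/some-tool/.venv

# Install the scripts and their pyproject.toml dependencies into that environment
$INSTALL_DIR/some-tool/.venv/bin/pip install $SOURCE_DIR/some-tool

# Expose the entry point alongside the other GDTk binaries
ln -sf $INSTALL_DIR/some-tool/.venv/bin/some-tool $INSTALL_DIR/bin/some-tool
```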

Examples

A pyproject.toml is also created in the root of examples/, to ensure all relevant tools are installed for testing. The pytest configuration for lmr is moved to a pytest.ini file, to avoid conflicts.

Requirements

In order to work on HPC systems, this setup is designed around backwards-compatible tools. With an internet connection, only python3 with pip is required.
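A quick way to check those prerequisites on a fresh system:

```sh
python3 --version         # 3.8 or newer
python3 -m pip --version  # pip must be available; python3 -m ensurepip --upgrade can bootstrap it
```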

Alex-Muirhead · Oct 22 '25

I've converted this to a draft while we experiment with the lmr python program/s in isolation.

A couple of notes on what I've found while adding these updates:

  1. pip keeps a cache of the packages it has downloaded, so creating multiple separate .venvs for separate tooling does not trigger unnecessary downloads from PyPI.
  2. All tools can currently be installed into a single .venv without conflicts, as none of the tools currently pin minimum versions of their dependencies.

The updated makefile targets now look for an activated virtual environment when installing. This is indicated by the VIRTUAL_ENV variable for most Python package managers, or CONDA_PREFIX for (ana)conda variants. The environment directory can also be specified explicitly by the user when calling make.
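For example (the make variable used to pass an explicit environment directory is illustrative; check the makefiles for the actual name):

```sh
# Option A: activate an environment beforehand; make detects it via VIRTUAL_ENV
source $HOME/gdtk-venv/bin/activate
make install

# Option B: a conda environment is detected via CONDA_PREFIX
conda activate gdtk
make install

# Option C: pass the environment directory explicitly on the command line
make install VENV_DIR=$HOME/gdtk-venv
```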

This will hopefully cover the use case of remote HPC machines with low or limited internet connectivity. If the user has installed Python packages globally using apt install python3-* or similar, then creating a virtual environment with python3 -m venv <directory> --system-site-packages will allow those system-wide packages to be loaded.
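For example, on a machine where numpy and scipy come from the system package manager:

```sh
# Packages installed via apt (e.g. python3-numpy) remain visible inside the new environment
python3 -m venv $HOME/gdtk-venv --system-site-packages
source $HOME/gdtk-venv/bin/activate
make install
```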

If no virtual environment is active or manually specified, then the default action is to create a new virtual environment at $INSTALL_DIR/.venv and install all tooling there.
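Manually, that fall-back is roughly equivalent to the following (package names shown purely for illustration):

```sh
# Create the default environment inside the install tree
python3 -m venv $INSTALL_DIR/.venv

# Install the tooling from the local wheel directory into it
$INSTALL_DIR/.venv/bin/pip install --find-links $INSTALL_DIR/share/python-packages gdtk nenzf1d
```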

Remaining Questions

  • Should the default/fall-back creation of a .venv enable the --system-site-packages option? I believe the default .venv shouldn't enable this, to mirror the expected behaviour of Python: virtual environments are expected to be isolated from the system.

  • After installing, should the user be required to "activate" the virtual environment to run the installed Python tools? This might cause some headaches with sub-shells also needing to activate it. Currently, the makefile recipe symlinks the resulting Python tool into the $INSTALL_DIR/bin directory. This alternative makes the installed tool available on $PATH along with the other GDTk tooling, but may cause duplication within $PATH if the environment is also activated.

Alex-Muirhead · Oct 24 '25