easybuild-easyblocks
determine dependencies of Python package to allow parallel installation of Python packages installed as extensions
The (experimental) support for installing extensions in parallel that was added in EasyBuild v4.5.0 (see https://docs.easybuild.io/en/latest/Installing_extensions_in_parallel.html) currently only works for R extensions.
To extend this to Python packages as well, we need a way to determine the dependencies of a given Python package, ideally relying only on the source tarball for that package. This should be implemented in the `required_deps` method of the `PythonPackage` easyblock that is used to install Python packages as extensions.
We can't easily do this by just checking the package metadata like we do for R extensions (see `required_deps` in the `RPackage` easyblock), because we would have to check and parse `setup.py`, `requirements.txt`, `pyproject.toml`, etc.
Other options:
- build a wheel first, then use `pip` to check for required dependencies
  - this requires that the build dependencies are already installed, so doesn't seem very helpful...
- asking `pip` what the required dependencies are
  - can this be done by only using the source tarball?
  - maybe `pip` >= 20.3 supports this? (see here for more info)
- checking how other tools like `johnnydep` do this
cc @mboisson, @ocaisa
I have looked through different options:
- If I understand the code correctly, `johnnydep` uses a `.whl` file to create the dependency tree, which, as with your first option, isn't very helpful.
- We could do some basic parsing of the usual files that list dependencies (`setup.py`, `requirements.txt`, ...) to get a heuristic. It won't work for all packages, but it is still something.
- I also found an online database called libraries.io that monitors a large number of open-source packages and their dependencies. We could pull the dependencies from there, but relying on a third-party online database seems fishy and somewhat unreliable to me.
- I am currently looking into the `importlib` library to check whether it could help with extracting direct dependencies solely from the source tarball.
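The "basic parsing" idea from the list above could look roughly like the sketch below. It only handles a top-level `requirements.txt` inside the source tarball. The function names `parse_requirements_text` and `heuristic_deps_from_sdist` are made up for illustration and are not part of EasyBuild. It does not handle `setup.py` or `pyproject.toml`, and version constraints are simply dropped.

```python
import re
import tarfile

# Leading distribution name of a requirement line (letters, digits, '.', '_', '-').
NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirements_text(text):
    """Return package names listed in requirements.txt-style text (heuristic only)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and options such as -r / -e
            continue
        if "://" in line.split("@", 1)[0]:  # skip bare URLs such as git+https://...
            continue
        match = NAME_RE.match(line)
        if match:
            names.append(match.group(0).lower())
    return names


def heuristic_deps_from_sdist(tarball_path):
    """Parse <top-level dir>/requirements.txt from a source tarball, if it exists."""
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            if member.isfile() and len(parts) == 2 and parts[1] == "requirements.txt":
                raw = tar.extractfile(member).read()
                return parse_requirements_text(raw.decode("utf-8", errors="replace"))
    return []  # nothing found: the caller has to fall back to another strategy
```

An empty result does not mean the package has no dependencies, only that this particular file was missing or empty, so a heuristic like this would need a fallback.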
This may be useful: https://github.com/pypa/pip/issues/11292#issuecomment-1193131221
The new `--report` option, coupled with `--dry-run`, might be useful here too: https://pip.pypa.io/en/stable/reference/installation-report/
An example of the output of `--report -` (with `--dry-run`):
$ pip install scipy --no-index --report - --dry-run
Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
Processing /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic/scipy-1.9.3+computecanada-cp310-cp310-linux_x86_64.whl
Processing /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic/numpy-1.23.0+computecanada-cp310-cp310-linux_x86_64.whl
WARNING: --report is currently an experimental option. The output format may change in a future release without prior warning.
{
"version": "0",
"pip_version": "22.3.1",
"install": [
{
"download_info": {
"url": "file:///cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic/scipy-1.9.3%2Bcomputecanada-cp310-cp310-linux_x86_64.whl",
"archive_info": {
"hash": "sha256=e4b3570165450eaa41ffbadfe1bd1e49960a95d35fe5ddc2b31d3563579e9942"
}
},
"is_direct": false,
"requested": true,
"metadata": {
"metadata_version": "2.1",
"name": "scipy",
"version": "1.9.3+computecanada",
"summary": "Fundamental algorithms for scientific computing in Python",
"description_content_type": "text/x-rst",
"home_page": "https://scipy.org/",
"maintainer_email": "SciPy Developers <[email protected]>",
"classifier": [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Science/Research",
"Intended Audience :: Developers",
"License :: OSI Approved :: BSD License",
"Programming Language :: C",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Topic :: Software Development :: Libraries",
"Topic :: Scientific/Engineering",
"Operating System :: Microsoft :: Windows",
"Operating System :: POSIX :: Linux",
"Operating System :: POSIX",
"Operating System :: Unix",
"Operating System :: MacOS"
],
"requires_dist": [
"numpy<1.26.0,>=1.21",
"pytest; extra == \"test\"",
"pytest-cov; extra == \"test\"",
"pytest-xdist; extra == \"test\"",
"asv; extra == \"test\"",
"mpmath; extra == \"test\"",
"gmpy2; extra == \"test\"",
"threadpoolctl; extra == \"test\"",
"scikit-umfpack; extra == \"test\"",
"sphinx!=4.1.0; extra == \"doc\"",
"pydata-sphinx-theme==0.9.0; extra == \"doc\"",
"sphinx-panels>=0.5.2; extra == \"doc\"",
"matplotlib>2; extra == \"doc\"",
"numpydoc; extra == \"doc\"",
"sphinx-tabs; extra == \"doc\"",
"mypy; extra == \"dev\"",
"typing_extensions; extra == \"dev\"",
"pycodestyle; extra == \"dev\"",
"flake8; extra == \"dev\""
],
"requires_python": ">=3.8",
"project_url": [
"Homepage, https://scipy.org/",
"Documentation, https://docs.scipy.org/doc/scipy/",
"Source, https://github.com/scipy/scipy",
"Download, https://github.com/scipy/scipy/releases",
"Tracker, https://github.com/scipy/scipy/issues"
],
"provides_extra": [
"test",
"doc",
"dev"
],
"description": "Copyright (c) 2001-2002 Enthought, Inc. 2003-2022, SciPy Developers.\n All rights reserved.\n \n Redistribution and use in source and binary forms, with or without\n modification, are permitted provided that the following conditions\n are met:\n \n 1. Redistributions of source code must retain the above copyright\n notice, this list of conditions and the following disclaimer.\n \n 2. Redistributions in binary form must reproduce the above\n copyright notice, this list of conditions and the following\n disclaimer in the documentation and/or other materials provided\n with the distribution.\n \n 3. Neither the name of the copyright holder nor the names of its\n contributors may be used to endorse or promote products derived\n from this software without specific prior written permission.\n \n THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\n LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR\n A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT\n OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\n SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\n LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,\n DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY\n THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE."
}
},
{
"download_info": {
"url": "file:///cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic/numpy-1.23.0%2Bcomputecanada-cp310-cp310-linux_x86_64.whl",
"archive_info": {
"hash": "sha256=19aaf828a06c2f0bda2240e6dca4b70cb888aad41d83dc4dcfb87e53cca70346"
}
},
"is_direct": false,
"requested": false,
"metadata": {
"metadata_version": "2.1",
"name": "numpy",
"version": "1.23.0+computecanada",
"platform": [
"Windows",
"Linux",
"Solaris",
"Mac OS-X",
"Unix"
],
"summary": "NumPy is the fundamental package for array computing with Python.",
"home_page": "https://www.numpy.org",
"download_url": "https://pypi.python.org/pypi/numpy",
"author": "Travis E. Oliphant et al.",
"maintainer": "NumPy Developers",
"maintainer_email": "[email protected]",
"license": "BSD",
"classifier": [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Science/Research",
"Intended Audience :: Developers",
"License :: OSI Approved :: BSD License",
"Programming Language :: C",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: Implementation :: CPython",
"Topic :: Software Development",
"Topic :: Scientific/Engineering",
"Typing :: Typed",
"Operating System :: Microsoft :: Windows",
"Operating System :: POSIX",
"Operating System :: Unix",
"Operating System :: MacOS"
],
"requires_python": ">=3.8",
"project_url": [
"Bug Tracker, https://github.com/numpy/numpy/issues",
"Documentation, https://numpy.org/doc/1.23",
"Source Code, https://github.com/numpy/numpy"
],
"description": "It provides:\n\n- a powerful N-dimensional array object\n- sophisticated (broadcasting) functions\n- tools for integrating C/C++ and Fortran code\n- useful linear algebra, Fourier transform, and random number capabilities\n- and much more\n\nBesides its obvious scientific uses, NumPy can also be used as an efficient\nmulti-dimensional container of generic data. Arbitrary data-types can be\ndefined. This allows NumPy to seamlessly and speedily integrate with a wide\nvariety of databases.\n\nAll NumPy wheels distributed on PyPI are BSD licensed.\n\nNumPy requires ``pytest`` and ``hypothesis``. Tests can then be run after\ninstallation with::\n\n python -c 'import numpy; numpy.test()'\n\n\n\n"
}
}
],
"environment": {
"implementation_name": "cpython",
"implementation_version": "3.10.2",
"os_name": "posix",
"platform_machine": "x86_64",
"platform_release": "3.10.0-1160.71.1.el7.x86_64",
"platform_system": "Linux",
"platform_version": "#1 SMP Tue Jun 28 15:37:28 UTC 2022",
"python_full_version": "3.10.2",
"platform_python_implementation": "CPython",
"python_version": "3.10",
"sys_platform": "linux"
}
}
Would install numpy-1.23.0+computecanada scipy-1.9.3+computecanada
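To turn a report like the one above into a dependency mapping, something along these lines might work. This is a minimal sketch: the function name `required_deps_from_report` is hypothetical, it assumes the report has already been loaded into a dict (for example with `json.loads`), and it only skips requirements tied to an `extra`. Other environment markers are not evaluated, and the report format is flagged as experimental in the warning above, so the keys may change.

```python
def required_deps_from_report(report):
    """Map each package in a pip installation report to its unconditional requirements."""
    deps = {}
    for item in report.get("install", []):
        metadata = item["metadata"]
        required = []
        # "requires_dist" is absent when a package declares no dependencies (numpy above).
        for req in metadata.get("requires_dist", []):
            spec, _, marker = req.partition(";")
            if "extra" in marker:  # only needed for optional extras such as "test" or "doc"
                continue
            required.append(spec.strip())
        deps[metadata["name"]] = required
    return deps


# Trimmed-down version of the report shown above, kept to the fields that are used.
sample_report = {
    "install": [
        {"metadata": {"name": "scipy",
                      "requires_dist": ["numpy<1.26.0,>=1.21",
                                        "pytest; extra == \"test\"",
                                        "sphinx!=4.1.0; extra == \"doc\""]}},
        {"metadata": {"name": "numpy"}},
    ]
}

print(required_deps_from_report(sample_report))
# {'scipy': ['numpy<1.26.0,>=1.21'], 'numpy': []}
```

Note that in the example output above the JSON is mixed with pip's own log lines on the terminal, so the JSON would have to be isolated (or written to a file via `--report <path>`) before parsing.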