easybuild-easyblocks

Determine the dependencies of a Python package to allow parallel installation of Python packages installed as extensions

Open: boegel opened this issue 3 years ago • 4 comments

The (experimental) support for installing extensions in parallel that was added in EasyBuild v4.5.0 (cfr. https://docs.easybuild.io/en/latest/Installing_extensions_in_parallel.html) currently only works for R extensions.

To extend this to Python packages as well, we need a way to determine the dependencies of a given Python package, ideally relying only on the source tarball of that package. This should be implemented in the required_deps method of the PythonPackage easyblock, which is used to install Python packages as extensions.

We can't easily do this by just checking the package metadata like we do for R extensions (see required_deps in the RPackage easyblock), because we would have to check and parse setup.py, requirements.txt, pyproject.toml, etc.
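To make the difficulty concrete, here is a minimal sketch that only checks which of these files a source tarball ships. It assumes the usual `<name>-<version>/` top-level layout of an sdist; the function name `dependency_sources` and the list of candidate files are illustrative, not part of EasyBuild. Every file it finds would still need its own parser, and `setup.py` can in general only be evaluated by executing it.

```python
import tarfile

# Files that commonly carry dependency information in a Python sdist
# (illustrative, non-exhaustive list).
CANDIDATE_FILES = ("setup.py", "setup.cfg", "pyproject.toml",
                   "requirements.txt", "PKG-INFO")


def dependency_sources(tarball_path):
    """Return the candidate files present at the top level of an sdist.

    Assumes the usual sdist layout <name>-<version>/<file>; files in
    deeper subdirectories are ignored.
    """
    found = set()
    with tarfile.open(tarball_path) as tar:  # auto-detects compression
        for member in tar.getmembers():
            parts = member.name.split("/")
            # Only look one directory deep, i.e. <name>-<version>/<file>
            if len(parts) == 2 and parts[1] in CANDIDATE_FILES and member.isfile():
                found.add(parts[1])
    return sorted(found)
```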

Other options:

  • build a wheel first, then use pip to check for required dependencies
    • this requires the build dependencies to be installed already, so it doesn't seem very helpful...
  • asking pip what the required dependencies are
    • can this be done using only the source tarball?
    • maybe pip >= 20.3 supports this? (see here for more info)
  • checking how other tools like johnnydep do this
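On the question of using only the source tarball: sdists usually contain a top-level `PKG-INFO` file with core metadata, which may include `Requires-Dist` lines. A minimal sketch, assuming that layout (the helper name `requires_dist_from_sdist` is hypothetical): under PEP 643 (metadata version 2.2 and newer), a `Requires-Dist` field that is not marked as `Dynamic` can be trusted, while for older metadata versions a missing `Requires-Dist` proves nothing, so the sketch returns `None` ("unknown") in those cases. Even when entries are present in older metadata, they may have been computed on the machine that built the sdist.

```python
import tarfile
from email.parser import HeaderParser


def requires_dist_from_sdist(tarball_path):
    """Return the Requires-Dist entries from an sdist's top-level PKG-INFO.

    Returns None ("unknown") if there is no PKG-INFO, if Requires-Dist is
    marked as Dynamic (PEP 643), or if older metadata (< 2.2) lists none.
    """
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            if len(parts) == 2 and parts[1] == "PKG-INFO" and member.isfile():
                raw = tar.extractfile(member).read().decode("utf-8")
                break
        else:
            return None  # no top-level PKG-INFO found

    msg = HeaderParser().parsestr(raw)  # PKG-INFO uses email-header format
    dynamic = {d.lower() for d in (msg.get_all("Dynamic") or [])}
    if "requires-dist" in dynamic:
        return None  # value is only known after building

    version = tuple(int(x) for x in msg.get("Metadata-Version", "1.0").split("."))
    reqs = msg.get_all("Requires-Dist") or []
    if version < (2, 2) and not reqs:
        return None  # older metadata: absence of Requires-Dist proves nothing
    return reqs
```

A fallback chain in `required_deps` could try something like this first, and only resort to parsing other files or calling pip when it returns `None`.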

cc @mboisson, @ocaisa

boegel (Nov 26 '21 07:11)

I have looked into several options:

  • If I understand the code correctly, johnnydep uses a .whl file to build the dependency tree, which, according to your first option, isn't very helpful.
  • We could do some basic parsing of the usual files that list dependencies (setup.py, requirements.txt, ...) to get a heuristic out of it. It won't work for all packages, but it's something.
  • I also found an online database called libraries.io that tracks a large number of open-source packages and their dependencies. We could pull the dependencies from there, but relying on a third-party online database seems questionable and somewhat unreliable to me.
  • I'm currently looking into the importlib library to see whether it could help extract direct dependencies solely from the source tarball.
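A minimal sketch of the basic-parsing heuristic from the second point, for `requirements.txt` only. `parse_requirements_txt` is a hypothetical helper: it keeps only lines that start with a PEP 508 project name, skips comments, pip options (`-r`, `-e`, `--index-url`, ...) and URL/VCS references, and normalizes names the way PEP 503 does. Regarding `importlib`: `importlib.metadata.requires()` reads the metadata of an already-installed distribution, so on its own it does not seem to help with an unpacked source tarball.

```python
import re

# PEP 508 project names start and end with a letter or digit and may
# contain ".", "_" and "-" in between.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def parse_requirements_txt(text):
    """Heuristically extract normalized project names from requirements.txt.

    Does not follow included files (-r) and ignores URL/VCS requirements.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or "://" in line:
            continue  # blank line, pip option, or URL reference
        match = _NAME_RE.match(line)
        if match:
            # PEP 503 normalization: runs of "-", "_", "." become "-"
            names.append(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names
```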

ItIsI-Orient (Nov 28 '22 13:11)

This may be useful: https://github.com/pypa/pip/issues/11292#issuecomment-1193131221

mboisson (Nov 28 '22 15:11)

The new --report option, combined with --dry-run, might also be useful here: https://pip.pypa.io/en/stable/reference/installation-report/
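A minimal sketch of how such a pip call could be wrapped, assuming pip 22.2 or newer (where `--dry-run` and `--report` were introduced); the helper names `pip_report_cmd` and `pip_report` are hypothetical. `--no-deps` restricts the report to the package itself (its own `requires_dist` metadata is still included), `--ignore-installed` keeps it in the report even if it is already installed, and `--quiet` keeps other output off stdout so it can be parsed as JSON. With `--no-index`, `--no-build-isolation` is assumed to be needed, since pip would otherwise try to install the build requirements into an isolated environment from the disabled index. That means the build backend must already be available, the same caveat as the build-a-wheel option, although for many backends pip only needs to prepare metadata rather than build a full wheel.

```python
import json
import subprocess
import sys


def pip_report_cmd(source, no_index=True):
    """Build a pip command that resolves `source` without installing it."""
    cmd = [sys.executable, "-m", "pip", "install",
           "--dry-run",            # resolve only, install nothing
           "--quiet",              # keep stdout clean for the JSON report
           "--report", "-",        # write the installation report to stdout
           "--no-deps",            # only report the package itself
           "--ignore-installed",   # report it even if already installed
           "--no-build-isolation"] # use the already-installed build backend
    if no_index:
        cmd.append("--no-index")
    cmd.append(source)
    return cmd


def pip_report(source):
    """Run pip on `source` (e.g. an sdist path) and return the parsed report."""
    result = subprocess.run(pip_report_cmd(source), check=True,
                            capture_output=True, text=True)
    return json.loads(result.stdout)
```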

mboisson (Nov 28 '22 15:11)

An example of the output of --report - (with --dry-run):

$ pip install scipy --no-index --report - --dry-run
Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
Processing /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic/scipy-1.9.3+computecanada-cp310-cp310-linux_x86_64.whl
Processing /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic/numpy-1.23.0+computecanada-cp310-cp310-linux_x86_64.whl
WARNING: --report is currently an experimental option. The output format may change in a future release without prior warning.
{
  "version": "0",
  "pip_version": "22.3.1",
  "install": [
    {
      "download_info": {
        "url": "file:///cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic/scipy-1.9.3%2Bcomputecanada-cp310-cp310-linux_x86_64.whl",
        "archive_info": {
          "hash": "sha256=e4b3570165450eaa41ffbadfe1bd1e49960a95d35fe5ddc2b31d3563579e9942"
        }
      },
      "is_direct": false,
      "requested": true,
      "metadata": {
        "metadata_version": "2.1",
        "name": "scipy",
        "version": "1.9.3+computecanada",
        "summary": "Fundamental algorithms for scientific computing in Python",
        "description_content_type": "text/x-rst",
        "home_page": "https://scipy.org/",
        "maintainer_email": "SciPy Developers <[email protected]>",
        "classifier": [
          "Development Status :: 5 - Production/Stable",
          "Intended Audience :: Science/Research",
          "Intended Audience :: Developers",
          "License :: OSI Approved :: BSD License",
          "Programming Language :: C",
          "Programming Language :: Python",
          "Programming Language :: Python :: 3",
          "Programming Language :: Python :: 3.8",
          "Programming Language :: Python :: 3.9",
          "Programming Language :: Python :: 3.10",
          "Topic :: Software Development :: Libraries",
          "Topic :: Scientific/Engineering",
          "Operating System :: Microsoft :: Windows",
          "Operating System :: POSIX :: Linux",
          "Operating System :: POSIX",
          "Operating System :: Unix",
          "Operating System :: MacOS"
        ],
        "requires_dist": [
          "numpy<1.26.0,>=1.21",
          "pytest; extra == \"test\"",
          "pytest-cov; extra == \"test\"",
          "pytest-xdist; extra == \"test\"",
          "asv; extra == \"test\"",
          "mpmath; extra == \"test\"",
          "gmpy2; extra == \"test\"",
          "threadpoolctl; extra == \"test\"",
          "scikit-umfpack; extra == \"test\"",
          "sphinx!=4.1.0; extra == \"doc\"",
          "pydata-sphinx-theme==0.9.0; extra == \"doc\"",
          "sphinx-panels>=0.5.2; extra == \"doc\"",
          "matplotlib>2; extra == \"doc\"",
          "numpydoc; extra == \"doc\"",
          "sphinx-tabs; extra == \"doc\"",
          "mypy; extra == \"dev\"",
          "typing_extensions; extra == \"dev\"",
          "pycodestyle; extra == \"dev\"",
          "flake8; extra == \"dev\""
        ],
        "requires_python": ">=3.8",
        "project_url": [
          "Homepage, https://scipy.org/",
          "Documentation, https://docs.scipy.org/doc/scipy/",
          "Source, https://github.com/scipy/scipy",
          "Download, https://github.com/scipy/scipy/releases",
          "Tracker, https://github.com/scipy/scipy/issues"
        ],
        "provides_extra": [
          "test",
          "doc",
          "dev"
        ],
        "description": "Copyright (c) 2001-2002 Enthought, Inc. 2003-2022, SciPy Developers.\n        All rights reserved.\n        \n        Redistribution and use in source and binary forms, with or without\n        modification, are permitted provided that the following conditions\n        are met:\n        \n        1. Redistributions of source code must retain the above copyright\n           notice, this list of conditions and the following disclaimer.\n        \n        2. Redistributions in binary form must reproduce the above\n           copyright notice, this list of conditions and the following\n           disclaimer in the documentation and/or other materials provided\n           with the distribution.\n        \n        3. Neither the name of the copyright holder nor the names of its\n           contributors may be used to endorse or promote products derived\n           from this software without specific prior written permission.\n        \n        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n        \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\n        LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR\n        A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT\n        OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\n        SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\n        LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,\n        DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY\n        THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n        (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\n        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE."
      }
    },
    {
      "download_info": {
        "url": "file:///cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo/generic/numpy-1.23.0%2Bcomputecanada-cp310-cp310-linux_x86_64.whl",
        "archive_info": {
          "hash": "sha256=19aaf828a06c2f0bda2240e6dca4b70cb888aad41d83dc4dcfb87e53cca70346"
        }
      },
      "is_direct": false,
      "requested": false,
      "metadata": {
        "metadata_version": "2.1",
        "name": "numpy",
        "version": "1.23.0+computecanada",
        "platform": [
          "Windows",
          "Linux",
          "Solaris",
          "Mac OS-X",
          "Unix"
        ],
        "summary": "NumPy is the fundamental package for array computing with Python.",
        "home_page": "https://www.numpy.org",
        "download_url": "https://pypi.python.org/pypi/numpy",
        "author": "Travis E. Oliphant et al.",
        "maintainer": "NumPy Developers",
        "maintainer_email": "[email protected]",
        "license": "BSD",
        "classifier": [
          "Development Status :: 5 - Production/Stable",
          "Intended Audience :: Science/Research",
          "Intended Audience :: Developers",
          "License :: OSI Approved :: BSD License",
          "Programming Language :: C",
          "Programming Language :: Python",
          "Programming Language :: Python :: 3",
          "Programming Language :: Python :: 3.8",
          "Programming Language :: Python :: 3.9",
          "Programming Language :: Python :: 3.10",
          "Programming Language :: Python :: 3 :: Only",
          "Programming Language :: Python :: Implementation :: CPython",
          "Topic :: Software Development",
          "Topic :: Scientific/Engineering",
          "Typing :: Typed",
          "Operating System :: Microsoft :: Windows",
          "Operating System :: POSIX",
          "Operating System :: Unix",
          "Operating System :: MacOS"
        ],
        "requires_python": ">=3.8",
        "project_url": [
          "Bug Tracker, https://github.com/numpy/numpy/issues",
          "Documentation, https://numpy.org/doc/1.23",
          "Source Code, https://github.com/numpy/numpy"
        ],
        "description": "It provides:\n\n- a powerful N-dimensional array object\n- sophisticated (broadcasting) functions\n- tools for integrating C/C++ and Fortran code\n- useful linear algebra, Fourier transform, and random number capabilities\n- and much more\n\nBesides its obvious scientific uses, NumPy can also be used as an efficient\nmulti-dimensional container of generic data. Arbitrary data-types can be\ndefined. This allows NumPy to seamlessly and speedily integrate with a wide\nvariety of databases.\n\nAll NumPy wheels distributed on PyPI are BSD licensed.\n\nNumPy requires ``pytest`` and ``hypothesis``.  Tests can then be run after\ninstallation with::\n\n    python -c 'import numpy; numpy.test()'\n\n\n\n"
      }
    }
  ],
  "environment": {
    "implementation_name": "cpython",
    "implementation_version": "3.10.2",
    "os_name": "posix",
    "platform_machine": "x86_64",
    "platform_release": "3.10.0-1160.71.1.el7.x86_64",
    "platform_system": "Linux",
    "platform_version": "#1 SMP Tue Jun 28 15:37:28 UTC 2022",
    "python_full_version": "3.10.2",
    "platform_python_implementation": "CPython",
    "python_version": "3.10",
    "sys_platform": "linux"
  }
}
Would install numpy-1.23.0+computecanada scipy-1.9.3+computecanada
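Given a report like the one above, the direct, non-optional dependencies of the requested package could be read from its `requires_dist` entries. A minimal sketch, assuming the report has already been parsed with `json.loads`; `direct_deps_from_report` is a hypothetical helper, and it uses a simple regex in place of a full PEP 508 parser (such as the one in the `packaging` library). Requirements whose marker tests `extra` (the test/doc/dev entries for scipy above) are skipped; other markers are not evaluated, which is a simplification.

```python
import re

# Leading PEP 508 project name of a requirement string.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")
_EXTRA_RE = re.compile(r"\bextra\s*==")


def direct_deps_from_report(report, name):
    """Return non-optional dependency names of `name` from a pip report.

    Returns None if `name` does not appear in the report's "install" list.
    """
    for item in report.get("install", []):
        meta = item["metadata"]
        if meta["name"].lower() != name.lower():
            continue
        deps = []
        for req in meta.get("requires_dist", []):
            marker = req.split(";", 1)[1] if ";" in req else ""
            if _EXTRA_RE.search(marker):
                continue  # only needed for an optional extra
            match = _NAME_RE.match(req)
            if match:
                deps.append(match.group(1))
        return deps
    return None
```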

mboisson (Nov 28 '22 15:11)