RFC: replace setuptools with meson-python for more stable and predictable builds and installations

Open ilumsden opened this issue 1 year ago • 2 comments

We've had quite a few issues with building and installing Hatchet. Although some (e.g., #125) are out of our control, many of the issues we've encountered (e.g., issues with editable installs) stem from our use of setuptools.

In general, setuptools is great for pure Python packages. However, as soon as you start having any non-Python code, setuptools becomes really painful to use and error-prone. This is mostly due to the fact that setuptools is built on top of and inspired by the very old distutils package.

These issues will likely only get worse in the future, especially with distutils (finally) being removed in Python 3.12. Since setuptools depends on distutils, this is a big problem. The team behind setuptools has already vendored its own copy of distutils, and they plan on eventually replacing it with their own code. However, the setuptools team is notoriously slow in making changes, so this major change will likely take years. In the meantime, setuptools will be operating on a vendored, dead library that will only receive as much support as the setuptools team can give it while working on setuptools itself. This almost guarantees that the number of odd, difficult-to-fix bugs in setuptools will increase in the coming years.

Given both our issues with setuptools and its uncertain future with the death of distutils, I feel like it's a great time to look into alternatives for building Hatchet (and later Thicket). I looked through several options (e.g., flit, hatchling, and poetry), but in the end, I decided to follow the trend started by NumPy and SciPy and use meson-python. This package is a PEP 517-compliant build backend that allows Python packages to be built with the Meson build system. Meson is a multi-language build system written in Python that supports Ninja-based compilation of code. It is essentially a CMake competitor with a more restrictive and opinionated, but easier-to-use, design.

Speaking of Meson being easy to use, in the course of a couple hours, I was able to completely replace setuptools with meson-python in Hatchet and confirm that it is working correctly. For comparison, the initial implementation of our setuptools integration took several weeks to complete.

With all this said, this PR is meant to be a Request for Comment (RFC) on the idea of switching from setuptools to meson-python in Hatchet (and eventually Thicket). Use this PR as a centralized place for all discussion, ideas, thoughts, opinions, etc. about this proposed switch to meson-python and build systems for Hatchet and Thicket in general.

In the rest of this initial PR comment, I'm going to explain how support for meson-python in Hatchet works, how this support impacts developers, and what the downsides to this approach are.

How meson-python works in Hatchet

Support for meson-python can be broken down into three parts:

  1. Providing metadata and dependency info with pyproject.toml
  2. Defining the build and binary distribution creation processes with meson.build
  3. Defining rules for source distribution creation with .gitattributes

Providing metadata and dependency info with pyproject.toml

Like all PEP 517-compliant build systems, the integration of meson-python starts with pyproject.toml. Like setuptools, meson-python must be specified under the build-system table as both a build dependency and the build backend. Currently, that TOML code looks like this:

[build-system]
requires = ["meson-python", "Cython"]
build-backend = "mesonpy"

These three lines tell PEP 517-compliant installers/builders (e.g., pip) that (1) meson-python and Cython must be installed to build Hatchet and (2) the "build backend" that will do all the heavy lifting is "mesonpy" (the backend implemented by meson-python). In other words, these lines enable the use of meson-python.

Unlike setuptools, all metadata and dependency configuration for meson-python is also done in pyproject.toml. Additionally, meson-python supports all possible package metadata that can be tracked by indexes like PyPI (i.e., the index behind pip). Currently, this TOML code looks like:

[project]
name = "llnl-hatchet"
description = "A Python library for analyzing hierarchical performance data"
dynamic = ["version"]
readme = "./README.md"
license = { file="LICENSE" }
classifiers = [
  "Development Status :: 5 - Production/Stable",
  "License :: OSI Approved :: MIT License",
]
authors = [
  { name = "Abhinav Bhatele", email = "[email protected]" },
  { name = "Stephanie Brink", email = "[email protected]" },
  { name = "Todd Gamblin", email = "[email protected]" }
]
maintainers = [
  { name = "Olga Pearce", email = "[email protected]" },
  { name = "Ian Lumsden", email = "[email protected]" },
  { name = "Connor Scully-Allison", email = "[email protected]" },
  { name = "Dewi Yokelson", email = "[email protected]" },
  { name = "Michael McKinsey", email = "[email protected]" }
]
requires-python = ">= 3.7"
dependencies = [
  "pydot",
  "PyYAML",
  "matplotlib",
  "numpy",
  "pandas",
  "textX < 3.0.0; python_version < '3.6'",
  "textX >= 3.0.0; python_version >= '3.6'",
  "multiprocess",
  "caliper-reader",
]

[project.urls]
source_code = "https://github.com/llnl/hatchet"
documentation = "https://llnl-hatchet.readthedocs.io/en/latest/"

Defining the build and binary distribution creation processes with meson.build

After creating and populating pyproject.toml, all that's left to integrate meson-python is to set up Meson like you would for any other project. For reference, Meson's documentation is extremely detailed and useful for this process.

Like CMake, Meson expects there to be a special file in each source directory. In Meson, those special files are called meson.build. Hatchet currently provides a meson.build file for all source directories containing the main Python or Cython source code. The only directories not containing meson.build are:

  • hatchet/external/roundtrip
  • Subdirectories of hatchet/vis
  • hatchet/tests

For hatchet/external/roundtrip and subdirectories of hatchet/vis, I don't include meson.build because these directories will be installed in their entirety into sdists or wheels. As a result, I can just install the directories themselves instead of the individual files. For hatchet/tests, I don't include meson.build because we don't want the tests to be installed into sdists and wheels.

Each meson.build file contains the relevant Meson code to install and/or build the source files in its directory. Each meson.build file then invokes subdir as needed to navigate into subdirectories. These meson.build files can be grouped into 4 categories.

The first category consists of the top level meson.build file. This file (shown below) configures the Meson project, locates Python and sets up the relevant Meson objects, and confirms that the version in version.py matches the Meson project version.

# Setup the Meson Project
project('llnl-hatchet', 'cython',
    version: '2024.1.1'
)

# Get the Meson Python object, a dependency object to Python for extension modules,
# and the path to the top-level install directory
py = import('python').find_installation(pure: false)
py_dep = py.dependency()
py_top_install_dir = py.get_install_dir()

# Verify that the version from version.py matches the Meson project version
# check: false because the return code is inspected by hand below
version_run = run_command(
    py.path(), meson.current_source_dir() / 'hatchet' / 'util' / 'print_version.py',
    check: false
)
if version_run.returncode() != 0
    error('The __version__ variable in Hatchet cannot be determined')
endif
version_from_py = version_run.stdout().strip()
if not version_from_py.version_compare(meson.project_version())
    error('The __version__ variable in Hatchet does not match the Meson project version')
endif

# Enter the 'hatchet' subdirectory
subdir('hatchet')
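
The print_version.py script called by run_command is not shown in this comment. As a rough sketch of what such a script could do, the code below pulls the version string out of the text of a version.py file without importing the package. It assumes that version.py defines __version__ as a plain string literal; the function name extract_version and the regular expression are illustrative and not taken from Hatchet.

```python
import re

# Assumption: version.py contains a line such as  __version__ = "2024.1.1"
_VERSION_RE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def extract_version(source):
    """Return the version string found in the text of a version.py file."""
    match = _VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# A real script would read version.py from disk, print the result to stdout,
# and exit with a nonzero status on failure so that Meson's returncode()
# check can catch it. Here the file content is inlined for illustration.
sample = '__version__ = "2024.1.1"\n'
print(extract_version(sample))  # 2024.1.1
```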

The second category of meson.build files consists of files in directories containing only pure Python code (and subdirectories). These files simply install a list of .py files, as shown below. Note that, in Meson, variables are not scoped, so each variable, such as the list of .py files, needs to have a different name.

# Specify the pure Python files for this directory
hatchet_query_python_sources = [
    '__init__.py',
    'compat.py',
    'compound.py',
    'engine.py',
    'errors.py',
    'object_dialect.py',
    'query.py',
    'string_dialect.py'
]

# Install the specified pure Python files into
# <INSTALL_PREFIX>/hatchet/query
py.install_sources(
    hatchet_query_python_sources,
    pure: false,
    subdir: 'hatchet' / 'query'
)

The third category of meson.build files consists of files in directories containing both pure Python code and other code that needs to be installed alongside the Python code without being built. An example of this is the hatchet/external directory. In this case, the pure Python files are installed as normal (see the second category example), and the non-Python files are installed by calling Meson's install functions with the install_dir argument set. An example is shown below:

# Specify the pure Python files for this directory
hatchet_external_python_sources = [
    '__init__.py',
    'console.py'
]

# Install the specified pure Python files into 
# <INSTALL_PREFIX>/hatchet/external
py.install_sources(
    hatchet_external_python_sources,
    pure: false,
    subdir: 'hatchet' / 'external'
)

# Install roundtrip as-is into
# <INSTALL_PREFIX>/hatchet/external
install_subdir(
    'roundtrip',
    install_dir: py_top_install_dir / 'hatchet' / 'external'
)

The fourth and final category of meson.build files consists of files in directories containing Python extensions that need to be compiled. An example of this is the hatchet/cython_modules directory. Thankfully, Meson makes building and installing these extensions trivial with the py.extension_module method. An example of this category is shown below:

# Specify the names of the Cython extension modules, excluding file extensions
cython_module_names = [
    'graphframe_modules',
    'reader_modules'
]

# Loop over the Cython modules and build/install them to
# <INSTALL_PREFIX>/hatchet/cython_modules/libs
foreach mod_name : cython_module_names
    py.extension_module(
        mod_name,
        mod_name + '.pyx',
        dependencies: py_dep,
        subdir: 'hatchet' / 'cython_modules' / 'libs',
        install: true
    )
endforeach

Defining rules for source distribution creation with .gitattributes

The only other aspect of this integration of meson-python to keep in mind is the .gitattributes file. To create sdists, meson-python invokes the meson dist command on the repo, which itself uses git archive. Because git archive is used, meson-python will, by default, collect all files committed to version control. Files and directories can be excluded from the sdist by adding them to .gitattributes with the export-ignore attribute.
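
As a rough illustration of the export-ignore mechanism, the sketch below embeds a hypothetical .gitattributes (Hatchet's actual entries may differ) and previews which paths it would keep out of an sdist. The matching uses Python's fnmatch, which only approximates Git's pattern rules; for the authoritative answer on a real checkout, git check-attr export-ignore -- <path> can be used instead.

```python
from fnmatch import fnmatchcase

# Hypothetical .gitattributes content, for illustration only.
GITATTRIBUTES = """\
# keep tests and CI configuration out of source distributions
hatchet/tests/** export-ignore
.github/** export-ignore
.gitattributes export-ignore
"""


def export_ignored_patterns(text):
    """Collect the path patterns that carry the export-ignore attribute."""
    patterns = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if "export-ignore" in fields[1:]:
            patterns.append(fields[0])
    return patterns


def is_excluded(path, patterns):
    """Approximate check: fnmatch is simpler than Git's own matching."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


patterns = export_ignored_patterns(GITATTRIBUTES)
for path in ["hatchet/tests/test_query.py", "hatchet/graphframe.py"]:
    print(path, "excluded" if is_excluded(path, patterns) else "included")
```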

And that's it! In summary, the integration of meson-python consists of:

  • pyproject.toml for specifying package metadata, dependencies, and configuration
  • Various meson.build files for building software and specifying the rules for making binary distributions (i.e., wheels)
  • .gitattributes to prevent files from being added to source distributions

How support for meson-python impacts developers

Using meson-python does require developers to think a bit more about what is getting distributed to users, but it's not very hard. Essentially, developers just need to ask themselves the following questions:

  1. Should my code be distributed to users at all?
    • If no (e.g., for anything under the tests directory), check .gitattributes, and make sure your code falls under one of the entries in that file. If it doesn't, add an entry
    • If yes, move on to question 2
  2. Does my code need to be compiled?
    • If yes, add a call to py.extension_module (or any other relevant Meson code to compile your code) in the meson.build for your code's directory (see category 4 above)
    • If no, move on to question 3
  3. Is my code Python or some other type of non-compiled source (e.g., JavaScript)?
    • If Python code, add files to the list in meson.build that gets passed to py.install_sources
    • If non-compiled source, add a call to one of Meson's install functions (e.g., install_subdir for directory, install_data for single files) and set the install_dir argument appropriately
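
The three questions above can also be written down as a tiny decision function. This is only a restatement of the checklist in Python; the function name and its parameters are invented for this sketch, and nothing like it exists in Hatchet.

```python
def distribution_action(distributed, compiled=False, is_python=True):
    """Map the answers to the three questions onto the required change."""
    # Question 1: should the code be distributed to users at all?
    if not distributed:
        return "make sure .gitattributes has a matching export-ignore entry"
    # Question 2: does the code need to be compiled?
    if compiled:
        return "add a py.extension_module call to the directory's meson.build"
    # Question 3: Python, or some other non-compiled source?
    if is_python:
        return "add the file to the list passed to py.install_sources"
    return "call install_subdir or install_data with install_dir set"


# A new test file, a new Cython module, a new .py file, and a JavaScript asset
print(distribution_action(distributed=False))
print(distribution_action(distributed=True, compiled=True))
print(distribution_action(distributed=True))
print(distribution_action(distributed=True, is_python=False))
```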

Downsides to using meson-python

No tool provides only benefits, so it's useful to understand the downsides of using meson-python. As I see it, there are 3 downsides:

  1. The use of meson-python introduces more build dependencies to Hatchet. As a result, we will be more dependent on package managers doing the right thing in terms of installing build dependencies. In 90+% of cases, it's fair to assume that package managers will properly install build dependencies. However, there are some cases (e.g., pip when provided the --no-build-isolation flag) where that doesn't happen. These corner cases will be more problematic with meson-python than with setuptools (albeit not by much).
  2. The use of meson-python adds a little bit of complexity to the development process. As described above, this complexity is minimal, but it is still there.
  3. The use of meson-python makes editable installs a little weird. As explained in the meson-python docs, editable installs with meson-python will actually recompile extensions (e.g., Cython code) on the fly when changed. To do this, meson-python requires that all build dependencies exist at both build time and run time. As a result, editable installs must be built without build isolation, and the build dependencies must be installed by hand. To mitigate this issue, there is a script in Hatchet called install_editable.sh that will do this for you.
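
For reference, the manual steps behind an editable install can be sketched as two pip invocations: one that installs the build dependencies by hand, and one that performs the editable install without build isolation. The code below only builds and prints the commands. It is not the content of install_editable.sh, which is not shown here; the function name is made up, and including ninja in the list is an assumption (Meson needs it to run the build, and it may not already be on the system).

```python
import sys

# Build requirements from [build-system].requires, plus ninja (an assumption,
# see the note above).
BUILD_REQUIRES = ["meson-python", "Cython", "ninja"]


def editable_install_commands(python=sys.executable, project_dir="."):
    """Return the pip commands for an editable install without build isolation."""
    pip_install = [python, "-m", "pip", "install"]
    return [
        # Step 1: install the build dependencies by hand
        pip_install + BUILD_REQUIRES,
        # Step 2: editable install that reuses the current environment
        pip_install + ["--no-build-isolation", "--editable", project_dir],
    ]


for command in editable_install_commands():
    print(" ".join(command))
```

The commands could be run with subprocess.run(command, check=True) from the root of the repository.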

Personally, I don't think these issues are that major, and I believe the benefits of using meson-python (i.e., more control over what gets installed and flexibility to easily build and install any extension module we want) outweigh these minor downsides.

ilumsden, May 23 '24 21:05

Should install.sh be deleted as well with setup.py?

michaelmckinsey1, May 24 '24 18:05

> Should install.sh be deleted as well with setup.py?

Good catch. Fixed

ilumsden, May 24 '24 19:05