RFC: replace setuptools with meson-python for more stable and predictable builds and installations
We've had quite a few issues with building and installing Hatchet. Although some (e.g., #125) are out of our control, many of the issues we've encountered (e.g., issues with editable installs) stem from our use of setuptools.
In general, setuptools is great for pure Python packages. However, as soon as you have any non-Python code, setuptools becomes painful to use and error-prone. This is mostly because setuptools is built on top of, and inspired by, the very old distutils package.
These issues will likely only get worse in the future, especially with distutils (finally) being removed in Python 3.12. Since setuptools depends on distutils, this is a big problem. The team behind setuptools has already vendored their own copy of distutils, and they plan on eventually replacing it with their own code. However, the setuptools team is notoriously slow in making changes, so this major change will likely take years. In the meantime, setuptools will be operating on a vendored, dead library that will only receive as much support as the setuptools team can give it while working on setuptools itself. This almost guarantees that the number of odd, difficult-to-fix bugs in setuptools will increase in the coming years.
Given both our issues with setuptools and its uncertain future after the removal of distutils, I feel like it's a great time to look into alternatives for building Hatchet (and later Thicket). I looked through several options (e.g., flit, hatchling, and poetry), but in the end, I decided to follow the trend started by NumPy and SciPy and use meson-python. This package is a PEP 517-compliant build backend that allows Python packages to be built with the Meson build system. Meson is a multi-language build system written in Python that supports Ninja-based compilation of code. It is essentially a CMake competitor with a more restrictive and opinionated, but easier to use, design.
Speaking of Meson being easy to use, in the course of a couple hours, I was able to completely replace setuptools with meson-python in Hatchet and confirm that it is working correctly. For comparison, the initial implementation of our setuptools integration took several weeks to complete.
With all this said, this PR is meant to be a Request for Comment (RFC) on the idea of switching from setuptools to meson-python in Hatchet (and eventually Thicket). Use this PR as a centralized place for all discussion, ideas, thoughts, opinions, etc. about this proposed switch to meson-python and build systems for Hatchet and Thicket in general.
In the rest of this initial PR comment, I'm going to explain how support for meson-python in Hatchet works, how this support impacts developers, and what the downsides to this approach are.
How meson-python works in Hatchet
Support for meson-python can be broken down into three parts:
- Providing metadata and dependency info with pyproject.toml
- Defining the build and binary distribution creation processes with meson.build
- Defining rules for source distribution creation with .gitattributes
Providing metadata and dependency info with pyproject.toml
Like all PEP 517-compliant build systems, the integration of meson-python starts with pyproject.toml. Like setuptools, meson-python must be specified under the build-system table as both a build dependency and the build backend. Currently, that TOML code looks like this:
[build-system]
requires = ["meson-python", "Cython"]
build-backend = "mesonpy"
These three lines tell PEP 517-compliant installers/builders (e.g., pip) that (1) meson-python and Cython must be installed to build Hatchet and (2) the "build backend" that will do all the heavy lifting is "mesonpy" (the backend implemented by meson-python). In other words, these lines enable the use of meson-python.
Unlike setuptools, all metadata and dependency configuration for meson-python is also done in pyproject.toml. Additionally, meson-python supports all possible package metadata that can be tracked by indexes like PyPI (i.e., the index behind pip). Currently, this TOML code looks like:
[project]
name = "llnl-hatchet"
description = "A Python library for analyzing hierarchical performance data"
dynamic = ["version"]
readme = "./README.md"
license = { file="LICENSE" }
classifiers = [
    "Development Status :: 5 - Production/Stable",
    "License :: OSI Approved :: MIT License",
]
authors = [
    { name = "Abhinav Bhatele", email = "[email protected]" },
    { name = "Stephanie Brink", email = "[email protected]" },
    { name = "Todd Gamblin", email = "[email protected]" }
]
maintainers = [
    { name = "Olga Pearce", email = "[email protected]" },
    { name = "Ian Lumsden", email = "[email protected]" },
    { name = "Connor Scully-Allison", email = "[email protected]" },
    { name = "Dewi Yokelson", email = "[email protected]" },
    { name = "Michael McKinsey", email = "[email protected]" }
]
requires-python = ">= 3.7"
dependencies = [
    "pydot",
    "PyYAML",
    "matplotlib",
    "numpy",
    "pandas",
    "textX < 3.0.0; python_version < '3.6'",
    "textX >= 3.0.0; python_version >= '3.6'",
    "multiprocess",
    "caliper-reader",
]

[project.urls]
source_code = "https://github.com/llnl/hatchet"
documentation = "https://llnl-hatchet.readthedocs.io/en/latest/"
Defining the build and binary distribution creation processes with meson.build
After creating and populating pyproject.toml, all that's left to integrate meson-python is setting up Meson like you would for any other project. For reference, Meson's documentation is extremely detailed and useful for this process.
Like CMake, Meson expects there to be a special file in each source directory. In Meson, those special files are called meson.build. Hatchet currently provides a meson.build file for all source directories containing the main Python or Cython source code. The only directories not containing meson.build are:
- hatchet/external/roundtrip
- Subdirectories of hatchet/vis
- hatchet/tests
For hatchet/external/roundtrip and subdirectories of hatchet/vis, I don't include meson.build because these directories will be installed in their entirety into Python sdists or wheels. As a result, I can just install the directories themselves instead of the individual files. For hatchet/tests, I don't include meson.build because we don't want the tests to be installed into Python sdists and wheels.
Each meson.build file contains the relevant Meson code to install and/or build the source files in its directory. Each meson.build file then invokes subdir as needed to navigate into subdirectories. These meson.build files can be grouped into 4 categories.
The first category consists of the top level meson.build file. This file (shown below) configures the Meson project, locates Python and sets up the relevant Meson objects, and confirms that the version in version.py matches the Meson project version.
# Set up the Meson project
project('llnl-hatchet', 'cython',
    version: '2024.1.1'
)

# Get the Meson Python object, a dependency object to Python for extension modules,
# and the path to the top-level install directory
py = import('python').find_installation(pure: false)
py_dep = py.dependency()
py_top_install_dir = py.get_install_dir()

# Verify that the version from version.py matches the Meson project version
version_run = run_command(py.path(), meson.current_source_dir() / 'hatchet' / 'util' / 'print_version.py')
if version_run.returncode() != 0
    error('The __version__ variable in Hatchet cannot be determined')
endif
version_from_py = version_run.stdout().strip()
if not version_from_py.version_compare(meson.project_version())
    error('The __version__ variable in Hatchet does not match the Meson project version')
endif

# Enter the 'hatchet' subdirectory
subdir('hatchet')
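The version check above runs hatchet/util/print_version.py, which is not shown in this comment. As a rough sketch of what such a helper could do, assuming version.py defines __version__ as a plain string literal (the regex, the function name, and the inlined file contents are illustrative, not the actual script):

```python
import re

# Hypothetical stand-in for the contents of version.py
VERSION_PY = '__version__ = "2024.1.1"\n'

# Matches a line such as: __version__ = "2024.1.1"
_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def extract_version(source: str) -> str:
    """Return the string assigned to __version__ in the given source text."""
    match = _VERSION_RE.search(source)
    if match is None:
        # A non-zero exit from the script is what triggers Meson's first error()
        raise ValueError("__version__ not found")
    return match.group(1)


# Meson reads this printed value through version_run.stdout()
print(extract_version(VERSION_PY))
```

Parsing the file as text rather than importing the package means the check does not depend on Hatchet's runtime dependencies being installed at configure time.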
The second category of meson.build files consists of files in directories containing only pure Python code (and subdirectories). These files simply install a list of .py files, as shown below. Note that, in Meson, variables are not scoped, so each variable, such as the list of .py files, needs to have a different name.
# Specify the pure Python files for this directory
hatchet_query_python_sources = [
    '__init__.py',
    'compat.py',
    'compound.py',
    'engine.py',
    'errors.py',
    'object_dialect.py',
    'query.py',
    'string_dialect.py'
]

# Install the specified pure Python files into
# <INSTALL_PREFIX>/hatchet/query
py.install_sources(
    hatchet_query_python_sources,
    pure: false,
    subdir: 'hatchet' / 'query'
)
The third category of meson.build files consists of files in directories containing both pure Python code and other code that needs to be installed alongside the Python code without being built. An example of this is the hatchet/external directory. In this case, the pure Python files are installed as normal (see the second category example), and the non-Python files are installed by calling Meson's install functions with the install_dir argument set. An example is shown below:
# Specify the pure Python files for this directory
hatchet_external_python_sources = [
    '__init__.py',
    'console.py'
]

# Install the specified pure Python files into
# <INSTALL_PREFIX>/hatchet/external
py.install_sources(
    hatchet_external_python_sources,
    pure: false,
    subdir: 'hatchet' / 'external'
)

# Install roundtrip as-is into
# <INSTALL_PREFIX>/hatchet/external
install_subdir(
    'roundtrip',
    install_dir: py_top_install_dir / 'hatchet' / 'external'
)
The fourth and final category of meson.build files consists of files in directories containing Python extensions that need to be compiled. An example of this is the hatchet/cython_modules directory. Thankfully, Meson makes building and installing these extensions trivial with the py.extension_module function. An example of this category is shown below:
# Specify the names of the Cython extension modules, excluding file extensions
cython_module_names = [
    'graphframe_modules',
    'reader_modules'
]

# Loop over the Cython modules and build/install them to
# <INSTALL_PREFIX>/hatchet/cython_modules/libs
foreach mod_name : cython_module_names
    py.extension_module(
        mod_name,
        mod_name + '.pyx',
        dependencies: py_dep,
        subdir: 'hatchet' / 'cython_modules' / 'libs',
        install: true
    )
endforeach
Defining rules for source distribution creation with .gitattributes
The only other aspect of this integration of meson-python to keep in mind is the .gitattributes file. To create sdists, meson-python invokes the meson dist command on the repo, which itself uses git archive. Because git archive is used, meson-python will, by default, collect all files committed to version control. Files and directories can be excluded from the sdist by adding them to .gitattributes with the export-ignore attribute.
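For instance, a .gitattributes line such as `hatchet/tests export-ignore` is the usual way to keep a directory out of a git archive. One way to confirm that such rules took effect is to list the contents of the built sdist. The sketch below uses only the standard library; the helper name find_leaked_paths and the tiny in-memory archive are illustrative stand-ins, not part of meson-python.

```python
import io
import tarfile


def find_leaked_paths(sdist: tarfile.TarFile, excluded_dirs: list[str]) -> list[str]:
    """Return archive members that sit under a directory meant to be excluded.

    sdists place everything under a single top-level '<name>-<version>/'
    directory, so that first path component is stripped before comparing.
    """
    leaked = []
    for name in sdist.getnames():
        _, _, relative = name.partition("/")
        if any(relative == d or relative.startswith(d + "/") for d in excluded_dirs):
            leaked.append(name)
    return leaked


# Build a tiny in-memory archive standing in for a real sdist
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    for path in ("pkg-1.0/pkg/__init__.py", "pkg-1.0/pkg/tests/test_a.py"):
        info = tarfile.TarInfo(path)
        info.size = 0
        tar.addfile(info, io.BytesIO(b""))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    leaked = find_leaked_paths(tar, ["pkg/tests"])

print(leaked)  # a non-empty list means an export-ignore entry is missing
```

With a real build, the same function can be pointed at the .tar.gz file produced for the sdist by opening it with tarfile.open.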
And that's it! In summary, the integration of meson-python consists of:
- pyproject.toml for specifying package metadata, dependencies, and configuration
- Various meson.build files for building software and specifying the rules for making binary distributions (i.e., wheels)
- .gitattributes to prevent files from being added to source distributions
How does support for meson-python impact developers
Using meson-python does require developers to think a bit more about what is getting distributed to users, but it's not very hard. Essentially, developers just need to ask themselves the following questions:
1. Should my code be distributed to users at all?
  - If no (e.g., for anything under the tests directory), check .gitattributes, and make sure your code falls under one of the entries in that file. If it doesn't, add an entry
  - If yes, move on to question 2
2. Does my code need to be compiled?
  - If yes, add a call to py.extension_module (or any other relevant Meson code to compile your code) in the meson.build for your code's directory (see category 4 above)
  - If no, move on to question 3
3. Is my code Python or some other type of non-compiled source (e.g., JavaScript)?
  - If Python code, add files to the list in meson.build that gets passed to py.install_sources
  - If non-compiled source, add a call to one of Meson's install functions (e.g., install_subdir for directories, install_data for single files) and set the install_dir argument appropriately
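The questions above amount to a short decision procedure. As a purely illustrative sketch (the Action enum and packaging_action function are hypothetical and do not exist in Hatchet), it can be written as:

```python
from enum import Enum


class Action(Enum):
    EXPORT_IGNORE = "add an export-ignore entry to .gitattributes"
    EXTENSION_MODULE = "add py.extension_module(...) to the directory's meson.build"
    INSTALL_SOURCES = "add the file to the list passed to py.install_sources"
    INSTALL_DATA = "call install_subdir or install_data with install_dir set"


def packaging_action(distributed: bool, compiled: bool, is_python: bool) -> Action:
    """Map the three questions onto the step a developer should take."""
    if not distributed:      # question 1
        return Action.EXPORT_IGNORE
    if compiled:             # question 2
        return Action.EXTENSION_MODULE
    if is_python:            # question 3
        return Action.INSTALL_SOURCES
    return Action.INSTALL_DATA


# Example: a new JavaScript file that should ship with the package
print(packaging_action(distributed=True, compiled=False, is_python=False).value)
```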
Downsides to using meson-python
No tool provides only benefits, so it's useful to understand what the downsides of using meson-python are. As I see it, there are 3 downsides to using meson-python:
- The use of meson-python introduces more build dependencies to Hatchet. As a result, we will be more dependent on package managers doing the right thing in terms of installing build dependencies. In 90+% of cases, it's fair to assume that package managers will properly install build dependencies. However, there are some cases (e.g., pip when provided the --no-build-isolation flag) where that doesn't happen. These corner cases will be more problematic with meson-python than with setuptools (albeit not by much).
- The use of meson-python adds a little bit of complexity to the development process. As described above, this complexity is minimal, but it is still there.
- The use of meson-python makes editable installs a little weird. As explained in the meson-python docs, editable installs with meson-python will actually recompile extensions (e.g., Cython code) on the fly when changed. To do this, meson-python requires that all build dependencies exist at both build time and run time. As a result, editable installs must be built without build isolation, and the build dependencies must be installed by hand. To mitigate this issue, there is a script in Hatchet called install_editable.sh that will do this for you.
Personally, I don't think these issues are that major, and I believe the benefits of using meson-python (i.e., more control over what gets installed and flexibility to easily build and install any extension module we want) outweigh these minor downsides.
Should install.sh be deleted as well with setup.py?
Good catch. Fixed