packaging.python.org

Clarifications needed re packaging flow from the perspective of a build backend

Open zahlman opened this issue 11 months ago • 2 comments

Issue Description

I'm writing a build backend which aims to deliver these key features relevant to the discussion (among others):

  1. Sdists will only contain static metadata;
  2. The code for sdist-building and wheel-building is separable, such that, when an sdist is downloaded and installed automatically (e.g. by Pip), only the wheel-building code is needed as a build dependency;
  3. There is no legacy support - sdists contain a PKG-INFO specifying a core metadata version of 2.2 or higher (most likely 2.4) and a pyproject.toml.

There are several confusing points I've encountered in the description of pyproject.toml and of the core metadata format, and how they are used in source trees (pyproject.toml only), sdists (both) and wheels (core metadata only). My goal here is to verify that I can accomplish my goals while remaining standards compliant.

The main conceptual problem I'm having is that pyproject.toml and core metadata are described as canonical metadata formats, yet a non-legacy sdist is expected to contain both. I have many questions as a result.

First, regarding sdist creation: my understanding is that in this process, the build backend:

  • MUST faithfully represent static metadata (if any) from the source tree's pyproject.toml in the PKG-INFO;
  • MAY compute values for dynamic metadata and include these in the PKG-INFO as well.
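As a rough illustration of the two bullets above, here is a minimal sketch of how a backend might render PKG-INFO from an already-parsed [project] table. It is not taken from any real backend: `render_pkg_info`, `FIELD_NAMES` and the `computed` argument are hypothetical names, the key-to-field mapping is deliberately partial, and the sketch assumes that any dynamic value the backend did not compute is recorded in the core metadata Dynamic field (metadata version 2.2 or later).

```python
from email.message import Message

# Partial, illustrative mapping from [project] keys to core metadata fields.
FIELD_NAMES = {
    "description": "Summary",
    "requires-python": "Requires-Python",
    "dependencies": "Requires-Dist",
}


def render_pkg_info(project, computed):
    """Render a minimal PKG-INFO string.

    project:  the [project] table as a dict (e.g. parsed with tomllib).
    computed: values the backend resolved for keys listed in project.dynamic.
    """
    declared_dynamic = set(project.get("dynamic", []))
    msg = Message()
    msg["Metadata-Version"] = "2.2"
    msg["Name"] = project["name"]  # name can never be dynamic
    # Version may come from elsewhere, but it must be known by now.
    version = project.get("version", computed.get("version"))
    if version is None:
        raise ValueError("version must be known when the sdist is built")
    msg["Version"] = str(version)
    for key, field in FIELD_NAMES.items():
        if key in project:
            value = project[key]              # static: copy faithfully
        elif key in declared_dynamic and key in computed:
            value = computed[key]             # dynamic, resolved at sdist time
        elif key in declared_dynamic:
            msg["Dynamic"] = field            # dynamic, deferred to the wheel
            continue
        else:
            continue
        for item in value if isinstance(value, list) else [value]:
            msg[field] = item                 # repeated headers are appended
    return msg.as_string()


example_project = {
    "name": "example",
    "description": "Demo",
    "dynamic": ["version", "dependencies"],
}
pkg_info = render_pkg_info(example_project, {"version": "1.0"})
print(pkg_info)
```

In this example the version is computed and written statically, while the dependencies are left unresolved and therefore listed under Dynamic.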

The question is, what happens to the version of pyproject.toml that ends up in the sdist?

It seems to me that it cannot in general be an exact copy of the source tree's pyproject.toml, because if I compute dynamic metadata then there is a conflict - the field is marked dynamic in pyproject.toml but provided statically in PKG-INFO.

Am I at liberty to create an entirely new pyproject.toml, as long as it follows the spec? For example, can I remove the [project] table (since in general this table isn't required to be present, and I've already fully "compiled" its information into PKG-INFO)? Can I change the [build-system] table, such that a different build backend will be used to create the wheel? (One implementation idea I had was to incorporate an in-tree, wheel-specific backend into the sdist.) Should [project] at least be edited to reflect the dynamic metadata values that were calculated (e.g. add the computed values as static keys, and remove the corresponding names from project.dynamic)?


Then, regarding wheel building: the core metadata specification says that "Fields defined in the following specification should be considered valid, complete and not subject to change."

Does that imply that the wheel's METADATA MUST be a copy of the sdist's PKG-INFO?

Doesn't that prevent computing metadata values at wheel creation time? (Not applicable to me, but still worth raising the question.)

Doesn't that in turn imply that non-legacy sdists need to have all the dynamic metadata values computed, and they can't be deferred to wheel-building? (I think this is intentional, so that e.g. installers can figure out basic information about the package without building it. But as of 24.3.1, Pip still does the build first anyway, even when PKG-INFO declares the latest metadata version.)

Doesn't that cause a problem for PEP 725 – Specifying external dependencies in pyproject.toml, since they propose to give semantics to Requires-External metadata whereby the wheel's version could differ? (In particular: the wheel-building process could use a tool like cibuildwheel to vendor a compiled shared C library whose source is not included in the sdist; by my reading of the PEP, the intent is that PKG-INFO would describe the library as an external requirement, but METADATA would not.)

Also: when building the wheel, is it required to look at pyproject.toml at all, or to validate it? My understanding is that the only mandatory purpose pyproject.toml actually serves at this point in the process is to tell an installer what build backend to use (and what its statically-known dependencies are); the backend itself is free to use other files for configuration (i.e. the config isn't required to be in [tool], and other tools simply won't be invoked at this point), and the [project] metadata is either redundant with PKG-INFO or erroneous.
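To make the "only mandatory purpose" reading concrete, here is a small sketch of the part of pyproject.toml that a frontend needs before it can call the backend. It assumes the file has already been parsed into a dict; `backend_bootstrap_info` and the `example-backend` package are hypothetical, and the fallback behaviour when the [build-system] table is absent is deliberately left out.

```python
def backend_bootstrap_info(pyproject):
    """Return (requires, build_backend, backend_path) from a parsed pyproject.toml.

    Only the [build-system] table is consulted; [project] and [tool] are ignored.
    """
    build_system = pyproject.get("build-system", {})
    return (
        list(build_system.get("requires", [])),
        build_system.get("build-backend"),
        list(build_system.get("backend-path", [])),
    )


parsed = {
    "build-system": {
        "requires": ["example-backend>=1.0"],       # hypothetical package
        "build-backend": "example_backend.wheel",   # hypothetical module
    },
    "project": {"name": "example"},
}
print(backend_bootstrap_info(parsed))
```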


Bonus round:

Given that PEP 725 isn't accepted yet, is there any circumstance in which it would make sense for a modern build backend to output Requires-External or Supported-Platform values in core metadata? I can't think of any.

Code of Conduct

  • [X] I am aware that participants in this repository must follow the PSF Code of Conduct.

zahlman, Jan 07 '25 23:01

FYI, discussions are typically more lively @ https://discuss.python.org/c/packaging.

webknjaz, Jan 07 '25 23:01

> It seems to me that it cannot in general be an exact copy of the source tree's pyproject.toml, because if I compute dynamic metadata then there is a conflict - the field is marked dynamic in pyproject.toml but provided statically in PKG-INFO.

I'm pretty sure that pyproject.toml is to be kept “as is”. The build backend for building the wheels will rely on it. This usually means including in the sdist all the files necessary to build wheels (I usually prefer to include the entire Git working directory). sdists are often regarded as a close enough source of truth, almost equivalent to Git checkouts, by various parties (downstream redistributors, for example). Said parties would be using sdists not just for building wheels, but also for running the tests and building the docs.

I'd expect a build backend building from an sdist to behave as closely to building from Git as possible. setuptools-scm, for example, injects some metadata (I don't remember where) so that when it's executed from an sdist, it outputs the same version (since there's no Git to consult). It also has a mechanism for extracting that metadata from Git archives (requiring some additional configuration from the users).

My understanding is that pyproject.toml is human-writable, while PKG-INFO is machine-writable. With the points above, as a user, I would expect that it remains unchanged.

As for mixing up static+dynamic metadata, I recall @henryiii presenting something during PyPA Packaging Summit about two years ago.

Also, cc @pradyunsg for PEP 725 opinions.

webknjaz, Jan 07 '25 23:01

Metadata version 2.2 added the Dynamic metadata field. This is not the same as the dynamic field in pyproject.toml; the two are confusingly similar but importantly different (pyproject-metadata got this wrong for a while, by the way).

Listing a field under dynamic in pyproject.toml simply states that a) the field is missing from the [project] section[^1], and b) something else is supposed to supply this metadata. It says nothing about the SDist or the wheel.

The Dynamic metadata field (in PKG-INFO) says that the metadata is not known at SDist creation time. An example of a field that is not allowed in Dynamic, but is allowed in pyproject.toml's dynamic, is version. The version number can't change between the SDist and the wheel, so it can't be in the Dynamic list, but it's still useful to have it come from some other place (like git tags, or a file, or even just a separate spot in the pyproject.toml). Any other field listed in dynamic could be Dynamic (they are not 1:1, but close); that's up to the backend. If a backend wants to write license info based on what it links in when building the wheel, then the user has to list it in dynamic AND the backend has to write it into Dynamic.
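From the consumer's side, the distinction might be checked roughly as follows. This is a sketch under the assumption that, for metadata version 2.2 or later, any field not named in Dynamic has the same value in the SDist and in every wheel built from it; `reliable_from_sdist` is a hypothetical helper, not part of any installer.

```python
from email.parser import HeaderParser


def reliable_from_sdist(pkg_info_text, field):
    """Return True if `field` can be trusted from PKG-INFO without building."""
    msg = HeaderParser().parsestr(pkg_info_text)
    raw_version = msg.get("Metadata-Version", "1.0")
    version = tuple(int(part) for part in raw_version.split("."))
    if version < (2, 2):
        return False  # legacy sdist: nothing is guaranteed
    # Field names are compared case-insensitively.
    dynamic = {name.lower() for name in msg.get_all("Dynamic", [])}
    return field.lower() not in dynamic


sample = (
    "Metadata-Version: 2.2\n"
    "Name: example\n"
    "Version: 1.0\n"
    "Dynamic: License\n"
)
print(reliable_from_sdist(sample, "Version"))  # not listed in Dynamic
print(reliable_from_sdist(sample, "License"))  # deferred to wheel build
```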

[^1]: I'd like to allow this to also mean the metadata can be extended, but that's a future PEP.

The pyproject.toml is not changed when it is placed in the SDist. It is the human-readable source, and since SDist -> SDist is a valid build as well, it needs to remain static. You'd also make an invalid pyproject.toml if you wrote a dynamic field in but didn't remove the item from the dynamic list. And there's no reason to fill it in if it's in the PKG-INFO; just pull it from there if you want it statically.
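The "invalid pyproject.toml" case mentioned above can be sketched as a simple check on the parsed [project] table. `check_project_table` is a hypothetical helper; it covers only the static-versus-dynamic conflict and the rule that name may not be dynamic, not full validation.

```python
def check_project_table(project):
    """Return error messages for conflicts between static keys and project.dynamic."""
    dynamic = project.get("dynamic", [])
    errors = []
    if "name" in dynamic:
        errors.append("'name' must not be listed in dynamic")
    for key in dynamic:
        if key in project:
            # A value was written in, but the key was left in the dynamic list.
            errors.append(f"{key!r} is listed in dynamic but also set statically")
    return errors


# Rewriting the table with a computed version but forgetting to update dynamic:
bad = {"name": "example", "version": "1.0", "dynamic": ["version"]}
good = {"name": "example", "dynamic": ["version"]}
print(check_project_table(bad))
print(check_project_table(good))
```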

henryiii, Apr 04 '25 20:04