packaging icon indicating copy to clipboard operation
packaging copied to clipboard

Add PEP 621 project table parsing and validation module

Open chrisjsewell opened this issue 2 years ago • 19 comments

As prototyped in #646, I believe this repo is a nice place to house a general parsing and validation function for the PEP 621 [project] table in the pyproject.toml.

This returns a list of (build tool independent) validation errors and a TypedDict of the project data.

chrisjsewell avatar Dec 22 '22 23:12 chrisjsewell

I guess the first question is do we think this is a good fit for this project? For that I'm not sure as we have not really done front-end stuff before. We already have enough work just to keep up with the back-end-related, so I'm not sure if we want to take this extra work on if it isn't going to be used by e.g. pip (which I don't think pip does since it will read PKG-INFO for sdists and the [build-system] table is the back-end-specific part of pyproject.toml). Or put another way, does an installer need access to this table?

My next point is I don't think a custom ParseResult object to collect all validation errors is the right thing long-term in the face of https://peps.python.org/pep-0654/ ; I think writing a shim for ExceptionGroup for Python < 3.11 is a better fit. I also don't think a named tuple should be used for VError (named tuples were created for old APIs to transition to a more object-oriented format, like stuff wrapping C code, not new APIs where you have to support indexing and names forever).

brettcannon avatar Dec 23 '22 00:12 brettcannon

To be clear, I still appreciate the work and I see value of this making it up to PyPI somehow, I just don't know if @pradyunsg and I think this project is the best fit.

brettcannon avatar Dec 23 '22 00:12 brettcannon

For that I'm not sure as we have not really done front-end stuff before.

The point of this is for builders to use (flit, etc). This is what I'm using it for. I understand them to be backend?

chrisjsewell avatar Dec 23 '22 00:12 chrisjsewell

I think writing a shim for ExceptionGroup

I'm not sure if this is achieving quite the same purpose. The key thinking was that I want to collect all the issues with validation, rather than fail on the first one, in order to report to the users all the problems at once, rather than them have to rerun many times. I.e. this is the same as how a linter works.

chrisjsewell avatar Dec 23 '22 01:12 chrisjsewell

I also don't think a named tuple should be used

Yeh can change names tuples for dataclasses, I'm not fussed either way

chrisjsewell avatar Dec 23 '22 01:12 chrisjsewell

I just don't know if @pradyunsg and I think this project is the best fit.

It does basically use every aspect of this module, which is why I felt it was a good fit. Otherwise you are just gonna have to have packaging as a dependency, if it goes somewhere else.

But yeh up to you guys, I'll be using this code anyway 😄

chrisjsewell avatar Dec 23 '22 01:12 chrisjsewell

Outside of the specifics of the implementation, I do think it's a good idea to have some sort of validation for the standards-backed pyproject.toml->[project] metadata stuff in packaging; especially since we're established that we wanna go down some of that route through packaging.metadata already. It might make sense as a part of that; when we come around to handling that aspect of things?

pradyunsg avatar Dec 23 '22 02:12 pradyunsg

FWIW, https://validate-pyproject.readthedocs.io/en/latest/index.html exists already.

pradyunsg avatar Dec 23 '22 02:12 pradyunsg

The key thinking was that I want to collect all the issues with validation, rather than fail on the first one, in order to report to the users all the problems at once, rather than them have to rerun many times.

That's exactly what exception groups were designed for.

especially since we're established that we wanna go down some of that route through packaging.metadata already.

Sort of. I mean packaging.metadata is at a different layer than pyproject.toml's [project] table.

I think I'm +0 on the idea, but I personally don't have the capacity to do the review on the PR to get it merged (I'm already having to consider picking up Donald's draft PR for metadata to see it through). If I were to do this myself, I would write a module that did the following:

  • Defined a convert()/parse() function that took a dict and returned a TypedDict that covers everything that's in a spec converted to objects provided by this project (I think the PR already does this, although I don't know if it covers [build-system]); this aligns with the plans for packaging.metadata since TOML implicitly handles the "raw" parsing of the format. I'm not sure if it's best to parse everything or parse each potential table separately, but we should cover it all if we are taking on the responsibility of handling packaging specs around pyproject.toml.
  • Not do any I/O (which I think the PR already does).
  • Use exception groups for the collected failures.

brettcannon avatar Dec 23 '22 20:12 brettcannon

  • Defined a convert()/parse() function that took a dict and returned a TypedDict that covers everything that's in a spec converted to objects provided by this project (I think the PR already does this, although I don't know if it covers [build-system]); this aligns with the plans for packaging.metadata since TOML implicitly handles the "raw" parsing of the format. I'm not sure if it's best to parse everything or parse each potential table separately, but we should cover it all if we are taking on the responsibility of handling packaging specs around pyproject.toml.
  • Not do any I/O (which I think the PR already does).
  • Use exception groups for the collected failures.

The recently adopted pyproject-metadata covers the first two aspects, while the third has already been proposed (crosslink pypa/pyproject-metadata#45), although it doesn't seem likely to be included soon. Should the project be recommended instead of including the functionality in packaging itself?

chrysle avatar Apr 07 '24 14:04 chrysle

Should the project be recommended instead of including the functionality in packaging itself?

Because build-backends have a limitation in terms of dependencies and usually have to bundle them to prevent dependency cycles, I think it would be best if these things are concentrated in packaging (because it is already a dependency on the majority of backends anyway).

abravalheri avatar Apr 08 '24 08:04 abravalheri

Would merging that into packaging be an option, then (without having asked the maintainers)? Seemingly, development is stalled currently due to the lack of privileges. Merging could also simplify that.

chrysle avatar Apr 08 '24 13:04 chrysle

If there is no dependencies and the maintainers think that is a good idea, I think that would be fine.

abravalheri avatar Apr 12 '24 10:04 abravalheri

Sounds cool to me 😄

chrisjsewell avatar Apr 12 '24 11:04 chrisjsewell

Great!

cc @FFY00 @henryiii @frostming, which I think are the maintainers currently. What are your thoughts on that?

chrysle avatar Apr 13 '24 15:04 chrysle

Merging pyproject-metadata into packaging is a good idea for me.

frostming avatar Apr 14 '24 08:04 frostming

I don't think it's ready. There's still quite a bit of work to be done, like using the standard library email module, and better validation. In the near future, though, it might make a good candidate?

henryiii avatar Apr 16 '24 04:04 henryiii

I don't think it's ready. There's still quite a bit of work to be done, like using the standard library email module, and better validation. In the near future, though, it might make a good candidate?

Would a short term step of what @chrisjsewell originally proposed and doing something like packaging.metadata be useful?:

  • A raw TypedDict for what's expected when parsing the TOML
  • An enriched object representation that translates the TypedDict into packaging objects

brettcannon avatar Aug 07 '24 23:08 brettcannon

I think the main issue is with .as_rfc822(), which is extremely useful, but does some (occasionally incorrect) magic for two fields: Metadata and Dynamic:

  • Metadata: Updating a default is not backward compatible.
  • Dynamic: This one is really messy. pyproject-metadata attempts to guess which fields will by dynamic (meaning the wheel might have different values than the SDist), and it can be wrong.

It's also implemented on a custom class instead of using email.message.EMailMessage.

I think there are two options that would unblock this:

  1. Just remove this method for now and all the internals related to it.
  2. Make dynamic and metadata_version keyword arguments, the later being required. Ideally rework this to support any EMailMessage-like class, and have a user create and pass it.

Especially if the later was chosen, I'd probably want a few cleanups that are easier with a completely new interface, like removing reliance on custom features of the EMailMessage-like class.

henryiii avatar Aug 09 '24 20:08 henryiii