packaging icon indicating copy to clipboard operation
packaging copied to clipboard

PEP 639: Implement License-Expression and License-File

Open ewdurbin opened this issue 1 year ago • 16 comments

~Work in progress, current open questions:~

  • [x] ~Is the dependency on license-expression OK? Should vendoring or a new parser be considered?~
    • ~went the vendoring route per feedback from @abravalheri~
    • migrated to a pure python parser based on pypa/hatchling's implementation
  • [x] ~How much validation to apply to License-File, as-is as far as metadata goes the only clear spec is "Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used.". It is unclear how to check for other delimiters, as paths may have no delimiters whatsoever~
    • validations with path lib seem to fit the bill!
  • [x] ~How to handle deprecations/conflicts. License and License-Expression are mutually exclusive, but it doesn't appear that such a conflict has existed before.~
    • Based on the current behavior of the library, this kind of concern appears to be left to the caller. The metadata module appears to be focused on correctly parsing individual fields.

ewdurbin avatar Sep 03 '24 16:09 ewdurbin

IIRC one of the concerns raised in the Discourse thread was regarding the total file size if license_expression would be added as a dependency or fully vendored. Just as example, the scancode-licensedb-index.json file is over 800kB.

I believe that's one of the reasons @ofek went a slightly different route with #799.

cdce8p avatar Sep 04 '24 11:09 cdce8p

See a828bc7, which is a variant of #799 to build/use a minimal compatible json blob from the upstream for use with the vendored license-expression. Also excluding the data files from the vendored package from the resulting sdist/wheel to save space.

Overall resulting size difference for sdist and wheel

file 24.1 before after
sdist 148 kB 276 kB 212 kB
whl 54 kB 180 kB 112 kB

uncompressed:

file 24.1 before after
sdist 2539 kB* 1690 kB 829 kB
whl 176 kB 1281 kB 426 kB

* this appears to be due to the inclusion of tests/.pytest_cache/v/cache/lastfailed and tests/.pytest_cache/v/cache/nodeids files in the sdist, without it it is 608 kB.

ewdurbin avatar Sep 04 '24 13:09 ewdurbin

  1. Can you please save the data as Python so deserialization is not required?

Looked closer and saw that hatchling's implementation is more complete than I initially understood, so I've migrated this PR to use an adapted version of that, and moved storage of the license data back to using the python based representation

  1. I think we shouldn't merge this until we satisfy the parsing needs of backends (i.e. user defined metadata) rather than just consumers of packages like PyPI.

Which needs are those? Can you delineate them? I'm interested in seeing this merged so PyPI's implementation can progress.

  1. Although I'm not a maintainer, I would be quite weary of packaging beginning to vendor dependencies.

See response to 1. :)

ewdurbin avatar Sep 04 '24 16:09 ewdurbin

  1. I think we shouldn't merge this until we satisfy the parsing needs of backends (i.e. user defined metadata) rather than just consumers of packages like PyPI.

Which needs are those? Can you delineate them? I'm interested in seeing this merged so PyPI's implementation can progress.

Now that you have normalize_license_expression which also raises an exception for invalid stuff, every backend would be satisfied. Thanks!

ofek avatar Sep 04 '24 16:09 ofek

Just a tiny nitpick: The commit message contains a typo: "PEP 693" instead of 639 - this will make digging in the git history harder in the future.

befeleme avatar Sep 05 '24 09:09 befeleme

Just a tiny nitpick: The commit message contains a typo: "PEP 693" instead of 639 - this will make digging in the git history harder in the future.

🤦🏼 resolved with a rebase/force-push

ewdurbin avatar Sep 05 '24 10:09 ewdurbin

I think the spirit of the rule is that there are no backlash-delimeters in the path. However, I am not sure if this is enforceable if a filename can contain a backlash.

Yeah, not sure quite how actionable additional validation on delimiters is.

Also, as the path MUST be relative, we could assert it does not start with /.

Good idea, added in d6b47d5.

I also noticed a license file is never checked for existence. Is that intentional? The PEP states it MUST be present (and the file content MUST be UTF-8 encoded text).

I don't think that is the responsibility of this module/package. Responsibility for that will fall on build tools and indexes to validate.

ewdurbin avatar Sep 05 '24 10:09 ewdurbin

I closed https://github.com/pypa/packaging/pull/799.

ofek avatar Sep 05 '24 17:09 ofek

@ewdurbin Is there a particular reason this PR is marked as a draft?

pradyunsg avatar Sep 08 '24 13:09 pradyunsg

@ewdurbin Is there a particular reason this PR is marked as a draft?

Still unclear on "How much validation to apply to License-File, as-is as far as metadata goes the only clear spec is "Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used.". It is unclear how to check for other delimiters, as paths may have no delimiters whatsoever"

ewdurbin avatar Sep 08 '24 13:09 ewdurbin

Opened for review, happy to hear any and all feedback :)

ewdurbin avatar Sep 08 '24 13:09 ewdurbin

Quick FYI I'm working through a review of the code.

brettcannon avatar Sep 11 '24 00:09 brettcannon

@ewdurbin I don't know why GitHub is showing my review as pending. Just let me know when you're ready for me to look at this PR again.

brettcannon avatar Sep 13 '24 17:09 brettcannon

Docs added in 701217b, depends on https://github.com/pypa/packaging.python.org/pull/1595

ewdurbin avatar Sep 15 '24 12:09 ewdurbin

re-requesting review from @pradyunsg and @brettcannon.

ewdurbin avatar Sep 15 '24 12:09 ewdurbin

Outstanding un-resolved comments I am not opinionated on:

  • https://github.com/pypa/packaging/pull/828#discussion_r1745269219
  • https://github.com/pypa/packaging/pull/828#discussion_r1752961594
  • ~https://github.com/pypa/packaging/pull/828#discussion_r1766177348~

If some consensus around these could be found, are we good to proceed with a merge?

ewdurbin avatar Sep 19 '24 13:09 ewdurbin

@brettcannon @pradyunsg just checking on what we need to see this across the line.

I'd like to get to work implementing 639 for PyPI and this is currently holding that up.

ewdurbin avatar Oct 02 '24 15:10 ewdurbin

just checking on what we need to see this across the line.

Time 😅 I'll prioritize finishing my in-progress review.

brettcannon avatar Oct 03 '24 21:10 brettcannon

Thanks for your patience with this, @ewdurbin !

@pradyunsg did you want to review this or are you okay to merge? Once we do I will add the new label for issues.

brettcannon avatar Oct 18 '24 19:10 brettcannon

Given that this has approval from at least one maintainer... I put the effort into a draft PR for implementation in warehouse is now up at https://github.com/pypi/warehouse/pull/16949

ewdurbin avatar Oct 22 '24 21:10 ewdurbin

@pradyunsg did you want to review this or are you okay to merge?

Okay to merge.

pradyunsg avatar Oct 23 '24 23:10 pradyunsg

Merged! Thanks, everyone!

Off to create a new label for our latest module!

brettcannon avatar Oct 24 '24 17:10 brettcannon

Of to go rebasing my PRs on this now! ("Now" meaning after teaching today, probably)

henryiii avatar Oct 24 '24 17:10 henryiii

Thanks everyone! Eagerly awaiting a release so that PyPI's implementation can be deployed :)

ewdurbin avatar Oct 24 '24 17:10 ewdurbin

I'd like to get a PR for extra validation in before a release, as it's easier to add validation before a release than afterwords. I should know if it's needed and have it added by tomorrow.

henryiii avatar Oct 24 '24 17:10 henryiii

The validation errors I was looking for are included, so good to go from me!

Question, though: would adding a warning if a License :: classifier is present and metadata 2.4+ is encountered be a good idea? Likewise, if 2.4 is set and a classic license is used. I don't see another similar warning in packaging.metadata, though.

Basically, this: https://github.com/pypa/pyproject-metadata/blob/cb7450073acecefc714cc5de82816799841777b9/pyproject_metadata/init.py#L505-L517

Warning can always be added after a release, unlike errors, so not a blocker at all.

henryiii avatar Oct 24 '24 18:10 henryiii

From the PEP:

If the License-Expression field is present, build tools MAY raise an error if one or more license classifiers is included in a Classifier field, and MUST NOT add such classifiers themselves.

and

If only the License field is present, tools MAY issue a warning informing users it is deprecated and recommending License-Expression instead.

So both of those warnings seem to be inline with the PEP.

ewdurbin avatar Oct 24 '24 18:10 ewdurbin

Yes, that’s where I got the warnings, but they are MAY, I assume we get to pick.

henryiii avatar Oct 24 '24 18:10 henryiii

would adding a warning if a License :: classifier is present and metadata 2.4+ is encountered be a good idea?

It's a hard call for in 'packaging' itself. If we put it in it will definitely come up more, but for a "MAY" it feels like we might be going beyond just being a library.

brettcannon avatar Oct 25 '24 21:10 brettcannon

Please do not emit a warning in this foundational library.

ofek avatar Oct 25 '24 21:10 ofek