validate-pyproject icon indicating copy to clipboard operation
validate-pyproject copied to clipboard

Allow package-data for stubs packages

Open cdce8p opened this issue 11 months ago • 7 comments

I'm currently working on moving the stubs_uploader from setup.py to pyproject.toml. It's used to build and upload the typeshed stubs packages automatically. While doing so I noticed that setuptools rejects stub package names like requests-stubs for package-data. I know that .pyi and py.typed files are auto-included. Each stubs package also contains a METADATA.toml file which should be included in the distribution. Do to the nature of the project, it would be desirable to use include-package-data = false and instead specify the wanted files directly.

This PR slightly modifies the schema to allow for that. If accepted, it would be great if a new version could be released so I include these changes in setuptools.

https://github.com/typeshed-internal/stub_uploader/blob/cbdc0b85972e476b4dedbd626cd12dadf7dd60a9/stub_uploader/build_wheel.py#L51-L84

cdce8p avatar Mar 29 '25 21:03 cdce8p

/CC @abravalheri

cdce8p avatar Mar 31 '25 15:03 cdce8p

Sorry for the delay.

I think that this is very reasonable. Can we somehow re-use the existing definition?

https://github.com/abravalheri/validate-pyproject/blob/v0.24.1/src/validate_pyproject/plugins/setuptools.schema.json#L251-L261?

If accepted, it would be great if a new version could be released so I include these changes in setuptools.

Sure, it may take a little bit however, I want to wait for the situation to stabilise.

abravalheri avatar Mar 31 '25 15:03 abravalheri

Sorry for the late reply. I somehow lost track of it.

I think that this is very reasonable. Can we somehow re-use the existing definition?

https://github.com/abravalheri/validate-pyproject/blob/v0.24.1/src/validate_pyproject/plugins/setuptools.schema.json#L251-L261?

Unless I'm missing something, it isn't really possible unfortunately. While the package name itself is normalized by setuptools, I don't think the name in the package-data dict is. Thus the best we can probably do is add a separate validation function for it to reduce duplications. See https://github.com/abravalheri/validate-pyproject/pull/248/commits/80a6d7dac474908dd43f2fd418e666cd8c3663de

cdce8p avatar Apr 23 '25 14:04 cdce8p

@abravalheri Did you had a chance to look at this again?

cdce8p avatar Aug 05 '25 18:08 cdce8p

Hi @cdce8p, sorry for the delay in getting back to this.

Unless I'm missing something, it isn't really possible unfortunately. While the package name itself is normalized by setuptools, I don't think the name in the package-data dict is. Thus the best we can probably do is add a separate validation function for it to reduce duplications.

I am not sure I follow. I tested the following snippet for package-data and exclude-package-data on top of https://github.com/abravalheri/validate-pyproject/pull/248/commits/4b7244937eea1a5bcd1c64377920009a19e14dc0 and the added tests seem to pass as expected[^1]:

        "anyOf": [
          { "$ref": "#/definitions/package-name" },
          { "const": "*" }
        ]

Note that #/definitions/package-name already includes the *-stubs case, so that part is covered.

As for the python-module-name-relaxed format used in the shared definition, I believe it’s consistent with how we validate package and package-dir, so it should be fine. We could technically abstract this further into a new definition in the JSON Schema to reduce repetition even more, but I haven’t found a good name to abstract such a concept, so I think the snippet above is already a reasonable compromise[^2].

Personally, I’d prefer keeping this logic in the JSON Schema itself rather than introducing another format function. If you’re okay with that, I can push my local copy with these changes.

[^1]: The only error I got testing locally on 3.12 was FAILED tests/test_pre_compile.py::test_invalid_examples_api[store/ruff-unknown.toml-cli_pre_compile] - urllib.error.URLError: <urlopen error [Errno 11] Resource temporarily unavailable> which is unrelated to the schema changes. [^2]: Maybe python-module-name-relaxed-or-wildcard (based on your suggested function)... It is a big name, but it does a good job.

abravalheri avatar Aug 06 '25 16:08 abravalheri

I am not sure I follow. I tested the following snippet for package-data and exclude-package-data on top of 4b72449 and the added tests seem to pass as expected:

        "anyOf": [
          { "$ref": "#/definitions/package-name" },
          { "const": "*" }
        ]

Note that #/definitions/package-name already includes the *-stubs case, so that part is covered.

As for the python-module-name-relaxed format used in the shared definition, I believe it’s consistent with how we validate package and package-dir, so it should be fine.

Not quite, the point I'm worried about is that the package name itself is normalized by setuptools, however the package-data module name isn't. Even with python_module_name_relaxed aborting early for stubs packages, this would still allow invalid names for modules inside the package-data dict. E.g.

[project]
name = "some-project"

[tool.setuptools.package-data]
"some-project" = ["*.yaml"]

Files stored in some_project/.... Validation would succeed while the data files are not included.

cdce8p avatar Aug 06 '25 21:08 cdce8p

Any update?

cdce8p avatar Oct 23 '25 10:10 cdce8p

Sorry for the delay,

```toml
[project]
name = "some-project"

[tool.setuptools.package-data]
"some-project" = ["*.yaml"]

Files stored in some_project/.... Validation would succeed while the data files are not included.

So if I understand correctly, the error you are concerned about is when the person writes some-project but the actual directory uses underscore instead of dashes (or vice versa)...

This is a tricky one... but since both variations are allowed to coexist and they are both individually considered valid packages, I don't think we can enforce that distinction. This was a feature request from the setuptools issue tracker (if I am not wrong), previously only _ were allowed.

Now both some-project or some_project are allowed to exist, so they both can have data files inside of them...

abravalheri avatar Nov 27 '25 11:11 abravalheri

So if I understand correctly, the error you are concerned about is when the person writes some-project but the actual directory uses underscore instead of dashes (or vice versa)...

Correct.

This is a tricky one... but since both variations are allowed to coexist and they are both individually considered valid packages, I don't think we can enforce that distinction. This was a feature request from the setuptools issue tracker (if I am not wrong), previously only _ were allowed.

Now both some-project or some_project are allowed to exist, so they both can have data files inside of them...

I think of it differently. On is the project name itself whereas the other references the directory. So I think the distinction does make sense. Unless I'm missing something, there are basically two options now

  1. Leave the PR as is. This enforces that the name in package-data matches the directory exactly (without normalization). This is how setuptools handles it currently. The PR would only allow the added -stubs suffix.
  2. Or we could relax the validation for package-data names. In that case though I believe we should definitely add the normalization to setuptools. Otherwise users might encounter situations where package files aren't included when they should have been, without any errors from setuptools.

Personally, I'd recommend option 1. If there is a desire for package-data normalization, this could be added separately later.

cdce8p avatar Nov 27 '25 13:11 cdce8p

Thanks for the clarification. I think we have some partial alignment, but I would like to refine one point for accuracy (and comment on its consequences):

This enforces that the name in package-data matches the directory exactly (without normalization). This is how setuptools handles it currently.

This statement seem to imply that "setuptools enforces the name to match the directory", which would not be entirely precise. What setuptools actually does is: if the directory name matches the package name in the configuration (exactly, without normalisation), then the files are included. If they don't match, setuptools simply doesn't include them. It doesn't actively enforce the match beyond that behaviour.

It's also worth noting that setuptools allows two distinct package names that differ only by _ vs -. For example, pip-run and pip_run are both valid and treated as separate packages, each capable of having its own package data.

This flexibility was introduced because of a feature request from setuptools: previously, validate-pyproject enforced normalisation to _ (since identifiers in import statements require underscores). But the FR highlighted real-world use cases where dashes are useful/desired: pip-run being the canonical example. Both importlib.import_module and runpy.run_module handle names with dashes correctly, so the request was accepted.

Here's a quick demo showing Python treating them as distinct:

# docker run --rm -it python:3.12-bookworm /bin/bash

mkdir -p /tmp/pkg/pip_run
echo "print('1: this is pip_run')" > /tmp/pkg/pip_run/__main__.py

mkdir -p /tmp/pkg/pip-run
echo "print('2: this is pip-run')" > /tmp/pkg/pip-run/__main__.py

cd /tmp/pkg
python -m pip_run

# 1: this is pip_run
python -c 'print(__import__("importlib").import_module("pip_run"))'
# <module 'pip_run' (namespace) from ['/tmp/pkg/pip_run']>

python -m pip-run
# 2: this is pip-run
python -c 'print(__import__("importlib").import_module("pip-run"))'
# <module 'pip-run' (namespace) from ['/tmp/pkg/pip-run']>

(not saying that this is a good or bad thing, just illustrating existing behaviour).

(Sorry for leaning on pip-run. It's the example from the original FR.)

So, let's assume a hypothetical package wants to offer a "normal" API access. It would likely have pkg_name/py.typed or pkg_name/__init__.pyi.

If the same package also want to have a convenient CLI shortcut via -m, it might want to include pkg-name/py.typed or pkg-name/__init__.pyi. In practice, projects may choose to have all four files coexisting inside a single wheel.


At the end of the day allowing - or applying normalisation would be mutually exclusive options. Since setuptools and validate-pyproject do allow - (as initially requested in the pip-run-related FR), I think it make sense to keep following that line, and use consistent validation for package names in packages = [...] and package-data = { ... } in pyproject.toml.

abravalheri avatar Nov 28 '25 12:11 abravalheri

Thanks for the detailed explanation! It seems I've got some things messed up along the way. If I understand you correctly, your suggestion is to reuse the package-name definition, which is already used for packages and package-dir, here. Correct?

Just pushed a new commit for it.

cdce8p avatar Nov 29 '25 01:11 cdce8p