Allow package-data for stubs packages
I'm currently working on moving the stubs_uploader from setup.py to pyproject.toml. It's used to build and upload the typeshed stubs packages automatically. While doing so I noticed that setuptools rejects stub package names like requests-stubs for package-data. I know that .pyi and py.typed files are auto-included. Each stubs package also contains a METADATA.toml file which should be included in the distribution. Do to the nature of the project, it would be desirable to use include-package-data = false and instead specify the wanted files directly.
This PR slightly modifies the schema to allow for that. If accepted, it would be great if a new version could be released so I include these changes in setuptools.
https://github.com/typeshed-internal/stub_uploader/blob/cbdc0b85972e476b4dedbd626cd12dadf7dd60a9/stub_uploader/build_wheel.py#L51-L84
/CC @abravalheri
Sorry for the delay.
I think that this is very reasonable. Can we somehow re-use the existing definition?
https://github.com/abravalheri/validate-pyproject/blob/v0.24.1/src/validate_pyproject/plugins/setuptools.schema.json#L251-L261?
If accepted, it would be great if a new version could be released so I include these changes in setuptools.
Sure, it may take a little bit however, I want to wait for the situation to stabilise.
Sorry for the late reply. I somehow lost track of it.
I think that this is very reasonable. Can we somehow re-use the existing definition?
https://github.com/abravalheri/validate-pyproject/blob/v0.24.1/src/validate_pyproject/plugins/setuptools.schema.json#L251-L261?
Unless I'm missing something, it isn't really possible unfortunately. While the package name itself is normalized by setuptools, I don't think the name in the package-data dict is. Thus the best we can probably do is add a separate validation function for it to reduce duplications. See https://github.com/abravalheri/validate-pyproject/pull/248/commits/80a6d7dac474908dd43f2fd418e666cd8c3663de
@abravalheri Did you had a chance to look at this again?
Hi @cdce8p, sorry for the delay in getting back to this.
Unless I'm missing something, it isn't really possible unfortunately. While the package name itself is normalized by setuptools, I don't think the name in the
package-datadict is. Thus the best we can probably do is add a separate validation function for it to reduce duplications.
I am not sure I follow. I tested the following snippet for package-data and exclude-package-data on top of https://github.com/abravalheri/validate-pyproject/pull/248/commits/4b7244937eea1a5bcd1c64377920009a19e14dc0 and the added tests seem to pass as expected[^1]:
"anyOf": [
{ "$ref": "#/definitions/package-name" },
{ "const": "*" }
]
Note that #/definitions/package-name already includes the *-stubs case, so that part is covered.
As for the python-module-name-relaxed format used in the shared definition, I believe it’s consistent with how we validate package and package-dir, so it should be fine. We could technically abstract this further into a new definition in the JSON Schema to reduce repetition even more, but I haven’t found a good name to abstract such a concept, so I think the snippet above is already a reasonable compromise[^2].
Personally, I’d prefer keeping this logic in the JSON Schema itself rather than introducing another format function. If you’re okay with that, I can push my local copy with these changes.
[^1]: The only error I got testing locally on 3.12 was FAILED tests/test_pre_compile.py::test_invalid_examples_api[store/ruff-unknown.toml-cli_pre_compile] - urllib.error.URLError: <urlopen error [Errno 11] Resource temporarily unavailable> which is unrelated to the schema changes.
[^2]: Maybe python-module-name-relaxed-or-wildcard (based on your suggested function)... It is a big name, but it does a good job.
I am not sure I follow. I tested the following snippet for
package-dataandexclude-package-dataon top of 4b72449 and the added tests seem to pass as expected:"anyOf": [ { "$ref": "#/definitions/package-name" }, { "const": "*" } ]Note that
#/definitions/package-namealready includes the*-stubscase, so that part is covered.As for the
python-module-name-relaxedformat used in the shared definition, I believe it’s consistent with how we validatepackageandpackage-dir, so it should be fine.
Not quite, the point I'm worried about is that the package name itself is normalized by setuptools, however the package-data module name isn't. Even with python_module_name_relaxed aborting early for stubs packages, this would still allow invalid names for modules inside the package-data dict. E.g.
[project]
name = "some-project"
[tool.setuptools.package-data]
"some-project" = ["*.yaml"]
Files stored in some_project/.... Validation would succeed while the data files are not included.
Any update?
Sorry for the delay,
```toml [project] name = "some-project" [tool.setuptools.package-data] "some-project" = ["*.yaml"]Files stored in some_project/.... Validation would succeed while the data files are not included.
So if I understand correctly, the error you are concerned about is when the person writes some-project but the actual directory uses underscore instead of dashes (or vice versa)...
This is a tricky one... but since both variations are allowed to coexist and they are both individually considered valid packages, I don't think we can enforce that distinction. This was a feature request from the setuptools issue tracker (if I am not wrong), previously only _ were allowed.
Now both some-project or some_project are allowed to exist, so they both can have data files inside of them...
So if I understand correctly, the error you are concerned about is when the person writes
some-projectbut the actual directory uses underscore instead of dashes (or vice versa)...
Correct.
This is a tricky one... but since both variations are allowed to coexist and they are both individually considered valid packages, I don't think we can enforce that distinction. This was a feature request from the setuptools issue tracker (if I am not wrong), previously only
_were allowed.Now both
some-projectorsome_projectare allowed to exist, so they both can have data files inside of them...
I think of it differently. On is the project name itself whereas the other references the directory. So I think the distinction does make sense. Unless I'm missing something, there are basically two options now
- Leave the PR as is. This enforces that the name in
package-datamatches the directory exactly (without normalization). This is how setuptools handles it currently. The PR would only allow the added-stubssuffix. - Or we could relax the validation for package-data names. In that case though I believe we should definitely add the normalization to setuptools. Otherwise users might encounter situations where package files aren't included when they should have been, without any errors from setuptools.
Personally, I'd recommend option 1. If there is a desire for package-data normalization, this could be added separately later.
Thanks for the clarification. I think we have some partial alignment, but I would like to refine one point for accuracy (and comment on its consequences):
This enforces that the name in package-data matches the directory exactly (without normalization). This is how setuptools handles it currently.
This statement seem to imply that "setuptools enforces the name to match the directory", which would not be entirely precise. What setuptools actually does is: if the directory name matches the package name in the configuration (exactly, without normalisation), then the files are included. If they don't match, setuptools simply doesn't include them. It doesn't actively enforce the match beyond that behaviour.
It's also worth noting that setuptools allows two distinct package names that differ only by _ vs -. For example, pip-run and pip_run are both valid and treated as separate packages, each capable of having its own package data.
This flexibility was introduced because of a feature request from setuptools: previously, validate-pyproject enforced normalisation to _ (since identifiers in import statements require underscores). But the FR highlighted real-world use cases where dashes are useful/desired: pip-run being the canonical example. Both importlib.import_module and runpy.run_module handle names with dashes correctly, so the request was accepted.
Here's a quick demo showing Python treating them as distinct:
# docker run --rm -it python:3.12-bookworm /bin/bash
mkdir -p /tmp/pkg/pip_run
echo "print('1: this is pip_run')" > /tmp/pkg/pip_run/__main__.py
mkdir -p /tmp/pkg/pip-run
echo "print('2: this is pip-run')" > /tmp/pkg/pip-run/__main__.py
cd /tmp/pkg
python -m pip_run
# 1: this is pip_run
python -c 'print(__import__("importlib").import_module("pip_run"))'
# <module 'pip_run' (namespace) from ['/tmp/pkg/pip_run']>
python -m pip-run
# 2: this is pip-run
python -c 'print(__import__("importlib").import_module("pip-run"))'
# <module 'pip-run' (namespace) from ['/tmp/pkg/pip-run']>
(not saying that this is a good or bad thing, just illustrating existing behaviour).
(Sorry for leaning on pip-run. It's the example from the original FR.)
So, let's assume a hypothetical package wants to offer a "normal" API access. It would likely have pkg_name/py.typed or pkg_name/__init__.pyi.
If the same package also want to have a convenient CLI shortcut via -m, it might want to include pkg-name/py.typed or pkg-name/__init__.pyi. In practice, projects may choose to have all four files coexisting inside a single wheel.
At the end of the day allowing - or applying normalisation would be mutually exclusive options. Since setuptools and validate-pyproject do allow - (as initially requested in the pip-run-related FR), I think it make sense to keep following that line, and use consistent validation for package names in packages = [...] and package-data = { ... } in pyproject.toml.
Thanks for the detailed explanation! It seems I've got some things messed up along the way. If I understand you correctly, your suggestion is to reuse the package-name definition, which is already used for packages and package-dir, here. Correct?
Just pushed a new commit for it.