_validate_pyproject has inconsistent provenance

Open jaraco opened this issue 1 year ago • 3 comments

The code in _validate_pyproject is generated, so includes vendored copies of the validate-pyproject project and its dependencies, but it doesn't follow the pattern for vendored dependencies, leading to confusion when users suggest changes to that code. It's unclear to me, even after reviewing the technique for generating the package, which code is generated and which code is copied (I haven't gone as far as to review the pre_compile logic).

I'd like to move this code out of setuptools or treat it like any other vendored dependency, for clarity and consistency.

In https://github.com/pypa/setuptools/issues/2825#issuecomment-1079633713, @mgorny proposed to create a new, distinct package that contains the generated code. That approach seems suitable to me. Something like setuptools-validate-pyproject. That dependency could then be vendored just like any other.

An alternative to consider would be to move the generated code into a separate git repo and use git submodules to link it in. That would create the separation, encapsulate the generated code, and provide a clear custodial trail (users couldn't link the code in this repo and wouldn't be inclined to provide PRs to it here), but it would still be integrated into the release (as it is today). I'm not confident about this approach or what pitfalls it might entail, so I'm inclined to focus on vendoring instead.

Another option could be to avoid the static generation and instead have validate-pyproject generate the code on demand. Then setuptools could simply depend on validate-pyproject or maybe validate-pyproject[setuptools,distutils] (again, vendored per Setuptools' vendoring scheme).

@abravalheri How do you feel about having a new, separate package for the validator?

jaraco, Jul 17 '24 14:07

Thanks, Jason. I can have a look at this when I am back in another week. If I understood correctly, the only difference here would be to distribute the precompiled code in a separate package.

> which code is generated and which code is copied (I haven't gone as far as to review the pre_compile logic).

Most code is generated. The only files that are copied are the formats.py file and the exception-related modules: https://github.com/abravalheri/validate-pyproject/blob/c150b1583781c735d9ed1d3878f79c97fee61d71/src/validate_pyproject/pre_compile/init.py#L50-53 (in my mind, treating it as "mostly" vendorised is imprecise; a closer analogy is code generators like protobuf).

abravalheri, Jul 18 '24 12:07

> (in my mind, treating it as "mostly" vendorised is imprecise; a closer analogy is code generators like protobuf)

Thanks for the clarification. In that case, I feel less strongly about it, though I still think it would be nice to move the generated code into its own library or repo for better clarity. I'd like to explore the prospect of using a git submodule. Thanks for the insight; it's fine to consider this issue low priority.

jaraco, Jul 20 '24 13:07

Hi @jaraco, sorry for the delayed replies these days (I am currently away from my computer).

My preferred approach is something like https://github.com/pypa/setuptools/pull/4364.

The reason is the following:

By collocating the JSON schemas inside the setuptools project, we make the development process more agile and more accessible for contributors.

Currently, if we want to add a new configuration or fix a bug, we need to do it in the validate-pyproject repository, which is a bit contrived.

I would still keep the compilation step and keep the generated artifacts in the source tree, because they simplify dependencies and allow us to bypass the complication of vendoring (which would introduce transitive dependencies).

abravalheri, Jul 20 '24 15:07

Does #4364 make the provenance of the _validate_pyproject folder clearer?

I hope it separates the concerns that come from setuptools directly (e.g. the tool.setuptools folder) from the infrastructure for compiling such validations, which comes from _validate_pyproject.

Since the setuptools project is free to evolve the structure of tool.setuptools and any other TOML table it supports, it makes sense to me to keep the generated code and the original schemas collocated in the setuptools repository. It also makes development more agile, because any contributor can now modify the JSON schemas and run tox -e generate-validation-code to see instant results without having to wait for changes in validate-pyproject.

abravalheri, Aug 12 '24 16:08
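
For readers unfamiliar with the "schema in, generated code out" workflow discussed above, here is a toy sketch of the idea. It is not validate-pyproject's generator and does not reflect its API. The schema, the `generate_validator_source` helper, and the tiny subset of JSON Schema it understands are all hypothetical and exist only for illustration.

```python
import json

# Hypothetical, heavily simplified schema (real JSON Schema is far richer).
SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "zip-safe": {"type": "boolean"}
  },
  "required": ["name"]
}
""")

# Mapping from the schema type names used above to Python type names.
_PY_TYPES = {"string": "str", "boolean": "bool"}


def generate_validator_source(schema, func_name="validate"):
    """Emit Python source for a validator specialised to `schema`."""
    lines = [
        f"def {func_name}(data):",
        "    if not isinstance(data, dict):",
        "        raise ValueError('data must be a table')",
    ]
    for key in schema.get("required", []):
        lines.append(f"    if {key!r} not in data:")
        lines.append(f"        raise ValueError('missing required key: ' + {key!r})")
    for key, spec in schema.get("properties", {}).items():
        type_name = spec["type"]
        py_type = _PY_TYPES[type_name]
        lines.append(
            f"    if {key!r} in data and not isinstance(data[{key!r}], {py_type}):"
        )
        lines.append(
            f"        raise ValueError({key!r} + ' must be of type {type_name}')"
        )
    lines.append("    return data")
    return "\n".join(lines) + "\n"


# The generated text is ordinary Python source. A static workflow would write
# it to a file kept in the source tree; here it is executed directly instead.
source = generate_validator_source(SCHEMA)
namespace = {}
exec(source, namespace)
validate = namespace["validate"]
```

In this sketch, editing `SCHEMA` and re-running the generator is the analogue of editing a JSON schema and regenerating the validation code: the emitted function changes, but nobody edits it by hand.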

Yes. That helps a lot. Let's see how that goes.

jaraco, Aug 12 '24 17:08