Formalize copier.yml schema (or use JSON schema)
Is your feature request related to a problem? Please describe.
I'd like to be able to validate a copier.yml without loading it in copier.
Describe the solution you'd like
Create a normalized, versioned JSON schema, checked into the repo, and available for tools. It could be authored in yaml, but having it checked in as actual JSON, somewhere, is important for downstreams.
Use this as an up-front validator when parsing a file.
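The up-front validation could be as thin as the sketch below, using the third-party `jsonschema` and `PyYAML` libraries; the inline schema is a toy stand-in for the real, shipped one:

```python
# Minimal sketch of validating a copier.yml before loading it.
# COPIER_YML_SCHEMA here is a toy subset, not the real schema.
import yaml
from jsonschema import Draft7Validator

COPIER_YML_SCHEMA = {
    "type": "object",
    "properties": {
        "_subdirectory": {"type": "string"},
        "_secret_questions": {"type": "array", "items": {"type": "string"}},
    },
}

def validate_copier_yml(text: str) -> list[str]:
    """Return human-readable validation errors (empty list if valid)."""
    data = yaml.safe_load(text)
    validator = Draft7Validator(COPIER_YML_SCHEMA)
    return [error.message for error in validator.iter_errors(data)]

print(validate_copier_yml("_subdirectory: template\n_secret_questions: [token]\n"))
```

A tool could run the same check standalone, without importing any of Copier's loading machinery.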
Describe alternatives you've considered
Optionally, allow opting a copier.yml into being a JSON Schema itself, as there is already robust tooling for this in many languages. A number of features (oneOf, default) are already handled by JSON Schema. Templating is harder, though conditional schemas (if) and other mechanisms cover some of it.
Additional context
I was actually thinking about something similar a while ago. I was looking for IDE autocompletion for copier.yml in VS Code which can be done via JSON schema.
Copier uses Pydantic in some places although (I think) a bit inconsistently. I've quietly been working on refactoring the user_data.Question class into several Pydantic models to get a proper/detailed domain model of the copier.yml content (which may still contain Jinja templates) as well as the rendered content. The Pydantic model of the copier.yml file could then be used to validate the file upon loading it into Copier, and it can also be used to generate a JSON schema, making it the single source of truth.
What I've been working on doesn't work yet although I've come quite far already. When I've made enough progress, I'll send a PR, but I can't provide a timeline.
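The "single source of truth" idea could look roughly like this, assuming pydantic is available; the model name and fields are illustrative, not Copier's actual internal classes:

```python
# Sketch: a pydantic model of a question that both validates input
# and generates a JSON schema. Fields here are illustrative only.
from pydantic import BaseModel

class Question(BaseModel):
    type: str = "yaml"
    help: str = ""
    secret: bool = False
    multiline: bool = False

def question_schema() -> dict:
    # Works on pydantic v1 (.schema()) and v2 (.model_json_schema()).
    if hasattr(Question, "model_json_schema"):
        return Question.model_json_schema()
    return Question.schema()

print(sorted(question_schema()["properties"]))
```

Validation on load and schema generation would then both flow from the same model definitions.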
Yeah, pydantic is... fine. But indeed, the language servers definitely like proper schema, so actually shipping and using a versioned schema at runtime is useful in its own right.
But having the schema exist externally to Pydantic has certain benefits, such as being able to use hypothesis-jsonschema to test the Pydantic implementation. And Pydantic has some... quirks in its schema generation, for sure.
Having been down the "iteratively apply schema to templated yaml" road (and/or the much worse YAML-tag/magic-comment route): there's not a lot you can do but intercept right after (ruamel_)yaml.(safe_)load.
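The intercept-after-load pattern amounts to the sketch below (using `PyYAML`): validate the raw document structurally first, while values that may contain Jinja templates stay opaque strings until render time:

```python
# Sketch: validate structure right after yaml.safe_load, before any
# rendering. Templated values like "{{ subdir }}" are plain strings
# at this point; full schema checks would slot in where noted.
import yaml

def load_copier_yml(text: str) -> dict:
    data = yaml.safe_load(text)  # 1. parse; nothing rendered yet
    if not isinstance(data, dict):
        raise TypeError("copier.yml must be a mapping")
    # 2. structural/schema validation of the un-rendered document
    #    would happen here, leaving template strings untouched.
    return data

doc = load_copier_yml("_subdirectory: '{{ subdir }}'\n")
print(doc["_subdirectory"])
```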
There's also some prior (but uncoordinated) art for heavy usage of the above patterns, such as:
But again: having an (alternate) input that actually is JSON Schema (or something that generates a JSON schema of a template) would allow for reusing many existing tools, both for template authors and template users. For example, I've wrapped the cookiecutter-almost-schema, and it works... pretty well. Looking to do so for copier as well!
Yes, Pydantic is certainly not perfect at the moment. The core team is working on v2 with many improvements though.
For REST/HTTP API development, I'm a big fan of writing the OpenAPI spec explicitly (i.e. the spec-first approach) rather than using, e.g., Pydantic to generate the schema/spec for me. But with the increasing adoption of type hints in Python, I think there's some merit in using Pydantic because Pydantic models are (typically) properly typed. If a JSON schema is written manually, then Python types either need to be written (and kept in sync) manually or generated (which still lacks good tooling, I think).
How about this for starters?
Definitely not perfect; I've only worked with OpenAPI 3 so far.
```yaml
$schema: "https://json-schema.org/draft/2020-12/schema"
$id: "https://example.com/product.schema.json"
title: Copier config
type: object
properties:
  _exclude:
    type: array
    items:
      type: string
  _answers_file:
    type: string
  _envops:
    type: object
  _jinja_extensions:
    type: array
    items:
      type: string
  _min_copier_version:
    type: string
  _secret_questions:
    type: array
    items:
      type: string
  _skip_if_exists:
    type: array
    items:
      type: string
  _subdirectory:
    type: string
  _migrations:
    type: array
    items:
      type: object
      properties:
        version:
          type: string
        before:
          type: array
          items:
            type: string
        after:
          type: array
          items:
            type: string
  _templates_suffix:
    type: string
    default: .jinja
additionalProperties:
  oneOf:
    - type: string
    - type: object
      properties:
        type:
          type: string
          enum:
            - bool
            - float
            - int
            - json
            - string
            - yaml
          default: yaml
        help:
          type: string
        choices:
          type: array
          items:
            type: object
          default: {}
        secret:
          type: boolean
        placeholder: {}
        multiline:
          type: boolean
          default: false
        validator:
          type: string
        when:
          type: string
      additionalProperties: false
```
I think it's a good starting point. With a little more effort, _envops could be specified in more detail. In addition, I think some properties should be declared as required.
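One way to tighten `_envops` would be to mirror Jinja's documented `Environment` options; the property names below are real Jinja arguments, but the exact subset Copier accepts is an assumption:

```yaml
_envops:
  type: object
  properties:
    block_start_string:
      type: string
    block_end_string:
      type: string
    variable_start_string:
      type: string
    variable_end_string:
      type: string
    keep_trailing_newline:
      type: boolean
    autoescape:
      type: boolean
  additionalProperties: false
```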
You've referenced JSON Schema Draft 2020-12, which has nice features but isn't fully supported by common libraries (e.g. python-jsonschema has partial support; ajv has full support).
Related to the JSON schema version, I believe JSON Schema Store is a widely adopted and quite comprehensive collection of JSON schemas of well known JSON (and compatible) files. For instance, VS Code uses it for autocompletion. After a quick scan of the repo, I see that many JSON schemas use Draft 7 or 4 which are well supported by common libraries. So I'd probably stick with one of those, probably Draft 7 to be able to use at least some more recent features.
By the way, what do you think about contributing a JSON schema for copier.yml to the JSON Schema Store project? If it gets accepted, VS Code should automatically offer autocompletion for copier.yml after a while.
> VS Code should
I recommend not conflating "tooling" with "VSCode plus third-party services": for example, allowing a template to include a $schema, pointing at an organically-hosted, versioned schema on e.g. the documentation website would remove much of the "magic," and many more tools would support it.
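For example, a template could point editors at a hosted schema via the yaml-language-server modeline convention; the URL below is a placeholder, not a real endpoint:

```yaml
# yaml-language-server: $schema=https://copier.example.org/schema/v1.json
_subdirectory: template
project_name: My Project
```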
> probably Draft 7
Yep, Draft 7 is a fine place to start, as it already includes if and other things that might be relevant.
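As a taste of what Draft 7's `if`/`then` enables, a per-question fragment could express a conditional constraint like the one below (the rule itself is made up for illustration):

```yaml
# Illustrative only: require a default whenever a question is secret.
if:
  properties:
    secret:
      const: true
then:
  required:
    - default
```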
> contributing a JSON schema for copier.yml to the JSON Schema Store
I have no experience contributing to that site... but have found places where version mismatches there create real problems when tools naively consume it.
At any rate: if firmly committed to doing it the pydantic way, it likely makes sense to encode it as "dumb" (i.e. no methods) pydantic classes, where the docstrings and field annotations make it all the way out to the schema, and on to generated documentation, as the human-readable text is just as important as the machine-readable structure.
I'd be glad to dump pydantic if JSON schema replaces it for all the needed use cases. Copier was already using it when I started maintaining it, so I just kept on using it. That said, I have no plans or will to dump it myself because, basically, it's working. But a refactor that removes pydantic, reduces code and still leaves the test suite fully ✅ would be nice.
If you, like me, have no plans on doing so, then in any case the JSON schema thing would be nice!
Just, as all in copier, please make sure it's properly tested. Thanks everyone!
Yeah, there's definitely nothing wrong with pydantic, and it's a great way to realize a lot of value. Some paths to leveraging it:
- have a `copier schema` command that prints out a top-level schema of all of the pydantic classes
  - since yaml is around, go ahead and maybe offer a `--yaml` output
- include the output of this in the built package-as-shipped, as well
- use it to drive docs with e.g. mkdocs-json-schema-plugin
- as well as have the JSON itself at a "canonical" URL
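The first two bullets above could be sketched like this; the `copier schema` command name comes from the comment, but everything else, including the stub schema, is hypothetical:

```python
# Sketch of a hypothetical `copier schema` subcommand with an
# optional --yaml output. build_schema() is a stand-in for schema
# generation from the pydantic models.
import argparse
import json

def build_schema() -> dict:
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "Copier config",
        "type": "object",
    }

def render_schema(as_yaml: bool = False) -> str:
    schema = build_schema()
    if as_yaml:
        import yaml  # optional: only needed for --yaml output
        return yaml.safe_dump(schema, sort_keys=False)
    return json.dumps(schema, indent=2)

def main(argv=None) -> None:
    parser = argparse.ArgumentParser(prog="copier")
    sub = parser.add_subparsers(dest="command", required=True)
    schema_cmd = sub.add_parser("schema", help="print the copier.yml JSON schema")
    schema_cmd.add_argument("--yaml", action="store_true")
    args = parser.parse_args(argv)
    print(render_schema(as_yaml=args.yaml))

main(["schema"])  # prints the JSON form of the stub schema
```

Shipping the generated file inside the package keeps the runtime validator and the published schema from drifting apart.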