Formalize copier.yml schema (or use JSON schema)

Open bollwyvl opened this issue 2 years ago • 8 comments

Is your feature request related to a problem? Please describe.

I'd like to be able to validate a copier.yml without loading it in copier.

Describe the solution you'd like

Create a normalized, versioned JSON schema, checked into the repo, and available for tools. It could be authored in yaml, but having it checked in as actual JSON, somewhere, is important for downstreams.

Use this as an up-front validator when parsing a file.

Describe alternatives you've considered

Optionally, allow opting a copier.yml into being a JSON Schema itself, as there is already robust tooling for this in many languages. A number of features (oneOf, default) are already handled by JSON Schema. Templating is not, though conditional schemas (if) and other things are possible.

Additional context

bollwyvl avatar Jan 14 '23 16:01 bollwyvl

I was actually thinking about something similar a while ago. I was looking for IDE autocompletion for copier.yml in VS Code which can be done via JSON schema.

Copier uses Pydantic in some places although (I think) a bit inconsistently. I've quietly been working on refactoring the user_data.Question class into several Pydantic models to get a proper/detailed domain model of the copier.yml content (which may still contain Jinja templates) as well as the rendered content. The Pydantic model of the copier.yml file could then be used to validate the file upon loading it into Copier, and it can also be used to generate a JSON schema, making it the single source of truth.
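
As a rough illustration of the single-source-of-truth idea, the sketch below defines a Pydantic model for one question entry and derives a JSON schema from it. `QuestionSpec` and its fields are hypothetical stand-ins, not Copier's actual classes, and `json_schema` is a small helper added here to cover both the Pydantic v1 (`schema()`) and v2 (`model_json_schema()`) method names.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class QuestionSpec(BaseModel):
    """Hypothetical model of one question entry in copier.yml."""

    type: Literal["bool", "float", "int", "json", "str", "yaml"] = "yaml"
    help: Optional[str] = None
    secret: bool = False
    multiline: bool = False


def json_schema(model) -> dict:
    """Return the JSON schema of a model class under Pydantic v1 or v2."""
    if hasattr(model, "model_json_schema"):  # Pydantic v2
        return model.model_json_schema()
    return model.schema()  # Pydantic v1


def is_valid_question(data: dict) -> bool:
    """Validate raw question data by trying to construct the model."""
    try:
        QuestionSpec(**data)
    except ValidationError:
        return False
    return True
```

Calling `json_schema(QuestionSpec)` returns a plain dict that could be dumped to JSON and checked into the repo, while `is_valid_question` reuses the same model for load-time validation.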

What I've been working on doesn't work yet although I've come quite far already. When I've made enough progress, I'll send a PR, but I can't provide a timeline.

sisp avatar Jan 23 '23 15:01 sisp

Yeah, pydantic is... fine. But indeed, the language servers definitely like proper schema, so actually shipping and using a versioned schema at runtime is useful in its own right.

But, having schema exist external to Pydantic has certain benefits, such as being able to use hypothesis-jsonschema to test the pydantic implementation. And pydantic has some... quirks in its generation of schema, for sure.

Having been down the "iteratively apply schema for templated yaml" road (and/or the much worse YAML tagged/magic comment one): there's not a lot you can do but intercept right after (ruamel_)yaml.(safe_)load.

There's also some prior (but uncoordinated) art for heavy usage of the above patterns, such as:

But again: having an (alternate) input that actually is JSON schema (or something that generates a JSON schema of a template) would allow for reusing many existing tools, for both template authors and template users. For example, I've wrapped the cookiecutter-almost-schema, and it works... pretty well. Looking to do so for copier as well!

bollwyvl avatar Jan 23 '23 15:01 bollwyvl

Yes, Pydantic is certainly not perfect at the moment. The core team is working on v2 with many improvements though.

For REST/HTTP API development, I'm a big fan of writing the OpenAPI spec explicitly (i.e. spec-first approach) rather than using, e.g., Pydantic, which can generate the schema/spec for me. But with the increasing adoption of type hints in Python, I think there's some merit in using Pydantic because Pydantic models are (typically) properly typed. If a JSON schema is written manually, then Python types either need to be written (and kept in sync) manually or generated (which still lacks good tools, I think).

sisp avatar Jan 23 '23 15:01 sisp

How about this for starters?

Definitely not perfect; I've only worked with OpenAPI 3 so far.

$schema: "https://json-schema.org/draft/2020-12/schema"
$id: "https://example.com/product.schema.json"
title: Copier config
type: object
properties:
  _exclude:
    type: array
    items:
      type: string
  _answers_file:
    type: string
  _envops:
    type: object
  _jinja_extensions:
    type: array
    items:
      type: string
  _min_copier_version:
    type: string
  _secret_questions:
    type: array
    items:
      type: string
  _skip_if_exists:
    type: array
    items:
      type: string
  _subdirectory:
    type: string
  _migrations:
    type: array
    items:
      type: object
      properties:
        version:
          type: string
        before:
          type: array
          items:
            type: string
        after:
          type: array
          items:
            type: string
  _templates_suffix:
    type: string
    default: .jinja
additionalProperties:
  oneOf:
  - type: string
  - type: object
    properties:
      type:
        type: string
        enum:
        - bool
        - float
        - int
        - json
        - str
        - yaml
        default: yaml
      help:
        type: string
      choices:
        type: array
        items:
          type: object
      default: {}
      secret:
        type: boolean
      placeholder: {}
      multiline:
        type: boolean
        default: false
      validator:
        type: string
      when:
        type: string
    additionalProperties: false

rafalkrupinski avatar Jan 24 '23 01:01 rafalkrupinski

I think it's a good starting point. With a little more effort, _envops could be specified in more detail. In addition, I think some properties should be declared as required.

You've referenced JSON Schema Draft 2020-12, which has nice features but isn't fully supported by common libraries (e.g. python-jsonschema has partial support, ajv has full support).

Related to the JSON schema version, I believe JSON Schema Store is a widely adopted and quite comprehensive collection of JSON schemas of well known JSON (and compatible) files. For instance, VS Code uses it for autocompletion. After a quick scan of the repo, I see that many JSON schemas use Draft 7 or 4 which are well supported by common libraries. So I'd probably stick with one of those, probably Draft 7 to be able to use at least some more recent features.
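
For what Draft 7 validation could look like in Python, here is a minimal sketch using python-jsonschema's `Draft7Validator`. `COPIER_SCHEMA` is a hypothetical, heavily trimmed stand-in for a real copier.yml schema, and the config is passed as a plain dict, as it would come back from a YAML loader.

```python
from jsonschema import Draft7Validator

# Hypothetical, trimmed-down Draft 7 schema covering two settings only.
COPIER_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "_subdirectory": {"type": "string"},
        "_exclude": {"type": "array", "items": {"type": "string"}},
    },
}

# Fail early if the schema itself is not valid Draft 7.
Draft7Validator.check_schema(COPIER_SCHEMA)


def schema_errors(config: dict) -> list:
    """Return one 'path: message' string per schema violation."""
    validator = Draft7Validator(COPIER_SCHEMA)
    errors = sorted(
        validator.iter_errors(config),
        key=lambda err: [str(part) for part in err.absolute_path],
    )
    return [
        "/".join(str(part) for part in err.absolute_path) + ": " + err.message
        for err in errors
    ]
```

An empty list means the data passed; collecting all errors rather than stopping at the first one tends to be friendlier for template authors.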

By the way, what do you think about contributing a JSON schema for copier.yml to the JSON Schema Store project? If it gets accepted, VS Code should automatically offer autocompletion for copier.yml after a while.

sisp avatar Jan 24 '23 09:01 sisp

VS Code should

I recommend not conflating "tooling" with "VSCode plus third-party services": for example, allowing a template to include a $schema, pointing at an organically-hosted, versioned schema on e.g. the documentation website would remove much of the "magic," and many more tools would support it.
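
One way a tool might honor such a `$schema` key is sketched below. The URLs, the `KNOWN_SCHEMAS` registry and `select_schema` are all hypothetical names for illustration; Copier does not define them.

```python
from typing import Any, Dict, Mapping

# Hypothetical registry: published schema URL -> bundled schema document.
KNOWN_SCHEMAS: Dict[str, Dict[str, Any]] = {
    "https://copier.example/schema/v1.json": {"title": "Copier config v1", "type": "object"},
    "https://copier.example/schema/v2.json": {"title": "Copier config v2", "type": "object"},
}
DEFAULT_SCHEMA_URL = "https://copier.example/schema/v2.json"


def select_schema(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Pick the schema a file declares via `$schema`, else the default."""
    url = config.get("$schema", DEFAULT_SCHEMA_URL)
    if url not in KNOWN_SCHEMAS:
        raise ValueError(f"unsupported $schema: {url!r}")
    return KNOWN_SCHEMAS[url]
```

Because the file names its own schema version, a tool does not need to guess which release of the schema applies.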

probably Draft 7

Yep, Draft 7 is a fine place to start, as it already includes if and other things that might be relevant.

contributing a JSON schema for copier.yml to the JSON Schema Store

I have no experience contributing to that site... but have found places where version mismatches there create real problems when tools naively consume it.

At any rate: if firmly committed to doing it the pydantic way, it likely makes sense to encode it as "dumb" (i.e. no methods) pydantic classes, where the docstrings and field annotations would make it all the way out, and out to generated documentation, as the human-readable text is just as important as the

bollwyvl avatar Jan 24 '23 15:01 bollwyvl

I'd be glad to dump pydantic if JSON schema replaces it for all the needed use cases. Copier was already using it when I started maintaining it, so I just kept on using it. That said, I have no plans or will to dump it myself because, basically, it's working. But a refactor that removes pydantic, reduces code and still leaves the test suite fully ✅ would be nice.

If you, like me, have no plans on doing so, then in any case the JSON schema thing would be nice!

Just, as with everything in Copier, please make sure it's properly tested. Thanks everyone!

yajo avatar Jan 26 '23 19:01 yajo

Yeah, there's definitely nothing wrong with pydantic, and it's a great way to realize a lot of value. Some paths to leveraging it:

  • have a copier schema command that prints out a top-level schema of all of the pydantic classes
    • since yaml is around, go ahead and maybe offer a --yaml output
  • include the output of this in the built package-as-shipped, as well
  • use it to drive docs with e.g. mkdocs-json-schema-plugin
    • as well as have the JSON itself at a "canonical" URL
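
A schema-printing command along these lines could be sketched as follows. This is a standalone `argparse` script, not Copier's actual CLI wiring; `SCHEMA` is a hypothetical placeholder for whatever the Pydantic models would generate, and the `--yaml` path assumes PyYAML is installed.

```python
import argparse
import json

# Hypothetical placeholder for the schema generated from the models.
SCHEMA = {
    "title": "Copier config",
    "type": "object",
    "properties": {"_subdirectory": {"type": "string"}},
}


def render_schema(as_yaml: bool = False) -> str:
    """Serialize the schema as JSON (default) or YAML."""
    if as_yaml:
        import yaml  # PyYAML, imported lazily so JSON output needs no extra dependency

        return yaml.safe_dump(SCHEMA, sort_keys=False)
    return json.dumps(SCHEMA, indent=2)


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        prog="copier-schema",
        description="Print the copier.yml schema.",
    )
    parser.add_argument("--yaml", action="store_true", help="emit YAML instead of JSON")
    args = parser.parse_args(argv)
    print(render_schema(as_yaml=args.yaml))
    return 0
```

Calling `main([])` prints the JSON form and `main(["--yaml"])` the YAML form; the same `render_schema` output could be written into the package at build time or published at a canonical URL.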

bollwyvl avatar Jan 31 '23 18:01 bollwyvl