[schema, data/values] support specifying schema for data structures

Open cppforlife opened this issue 4 years ago • 17 comments

motivating use cases:

  • allow to provide schema for data values
    • so that default values, types, descriptions, validations, examples could be provided
  • allow to provide schema for documents (eg k8s resources) matching particular pattern (eg apiVersion/kind keys [1])
    • so that ytt can be helpful when building data structures (eg checking key presence, validating values, etc.)

additional wants:

  • allow to attach single schema to multiple documents
  • allow to "package" schema into a ytt library (eg k8s-schema-lib) so that it could be imported

opens possibilities to:

  • have overlay module rely on schema for more seamless "overlaying" behavior (eg key structure, defaults?, merging type)
  • editor integration to support better error highlighting etc
  • generate UI for data value configuration

possible backing mechanisms:

  • json schema / openapi
    • widely used including in k8s
  • ytt annotation based
    • waay more concise

[1] https://github.com/garethr/kubernetes-json-schema/blob/8aa572595b98d73b2b9415ca576f78e163381b10/v1.9.9-standalone-strict/cronjob.json#L4-L10

cppforlife avatar Mar 04 '20 05:03 cppforlife

proposal work happening here: https://github.com/k14s/design-docs/tree/develop/ytt/001-schemas

cppforlife avatar Mar 05 '20 22:03 cppforlife

TBD:

  • [ ] how to add more things to schema? (via overlays similar to data values)
  • [ ] provide programmatic schema.apply() (similar to overlay.apply)
  • [ ] respect schema for data values set via cmd line flags/env vars
  • [ ] add --data-values-schema-inspect (similar to --data-values-inspect)
    • generate html view via builtin server

cppforlife avatar Apr 03 '20 16:04 cppforlife

This sounds similar to Validating Helm Chart Values with JSON Schemas

jessehu avatar Apr 05 '20 06:04 jessehu

Happy to see a conversation about validation!

In order to annotate a value with validation rules, the value needs to be present. This has the side effect of needing to specify a default for every value, even those that don't have a sensible default.

Let's assume there's a property called replicas that must be even but has no sensible default. The values file might look like:

#@schema/validate number_is_even
replicas: ~

It's not a huge deal, but I find this to be a bit awkward. I mention this because:

  • I find it surprising that I can't describe how to validate a value without specifying a default for it.
  • There are other areas of ytt that are awkward/surprising (comments for example). From an overall design perspective, is there a point where there are too many surprises?

What are the benefits of the design using annotations to define a schema?

shamus avatar Apr 06 '20 15:04 shamus

Hey @shamus, thanks for the feedback!

The intention is for schema annotations to be used in a separate file from the data/values docs (i.e. schema.yml defaults.yml) allowing the separation you asked about. That being said, the schema would still need to include a zero value for the key:

#@schema/validate number_is_even
replicas: 0

In this case, ytt will infer the type of replicas as an integer and default it to 0 (similar to how Go uses zero values for variables). This allows authors to reduce the verbosity of schemas and gives readers greater focus on structure. We would also like to keep typing and validations as separate levels in a schema:

  1. schema author specifies a value (or the 0 value) which sets the required type as well
  2. schema author specifies validations which may error given the default value, requiring user input.
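
To make the two levels concrete, here is a minimal sketch in Python. It is not ytt's implementation: `infer_type`, `check_value`, and the plain function `number_is_even` are hypothetical names used only for illustration. It assumes the value written in the schema fixes the type, and that validations run separately on whichever value wins (the default or the user's).

```python
def infer_type(schema_value):
    """Level 1: the value written in the schema fixes the required type."""
    return type(schema_value)


def check_value(key, schema_value, validations, user_value=None):
    """Return the final value for key, or raise if typing or validation fails."""
    expected = infer_type(schema_value)
    # Fall back to the schema's (zero) value when the user supplies nothing.
    value = schema_value if user_value is None else user_value
    # bool is a subclass of int in Python, so compare types exactly.
    if type(value) is not expected:
        raise TypeError(
            f"{key}: expected {expected.__name__}, got {type(value).__name__}"
        )
    # Level 2: validations may reject even the default, requiring user input.
    for validate in validations:
        if not validate(value):
            raise ValueError(f"{key}: {value!r} failed {validate.__name__}")
    return value


def number_is_even(n):
    """Hypothetical validation standing in for the annotation of the same name."""
    return n % 2 == 0


# The schema value 0 sets the type to int; the user overrides it with 4.
print(check_value("replicas", 0, [number_is_even], user_value=4))
```

In this sketch a string such as "3" fails at the typing level, while the integer 3 passes typing and fails at the validation level.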

We are considering allowing the nil value to be a stand-in for the 0 value if a type is specified to hopefully reduce confusion when reading a schema. For example,

#@schema/type "int"
replicas: 0

could become:

#@schema/type "int"
replicas:

Which would default replicas to 0 and require users to specify a value in a values file, unless the #@schema/allow-empty annotation is present.

Please let us know what you think!

ewrenn8 avatar Apr 06 '20 19:04 ewrenn8

Hi from the UAA team! I'll be looking at this and get back to you soon. I'm definitely open to a meeting or just leaving comments here - I'll leave the venue up to you.

joshuatcasey avatar Apr 09 '20 16:04 joshuatcasey

I'm curious about a few things.

How will the user specify the schema at template render time? ytt -f my-schema.yml -f my-values.yml -f my-template.yml? What happens if the user does not specify the schema file?

I can foresee that we might end up specifying additional pieces of functionality via new template files. Can different template files construct their own schemas such that the schemas don't have to overlay each other? I don't want to have to construct a schema as an overlay just in case there's another schema prior to it in the ytt file list.

Does the sequence of events described in proposal 001 indicate that we can assume that a value marked as required by the schema will always have a value? This would be useful to the UAA because we end up having to say things like #@ data.values.foobar or assert.fail("foobar is required") in multiple locations.

joshuatcasey avatar Apr 16 '20 04:04 joshuatcasey

How will the user specify the schema at template render time? ytt -f my-schema.yml -f my-values.yml -f my-template.yml? What happens if the user does not specify the schema file?

currently ytt will happily continue on without schema (and its defaults) with given data values.

Can different template files construct their own schemas such that the schemas don't have to overlay each other? I don't want to have to construct a schema as an overlay just in case there's another schema prior to it in the ytt file list.

i would still recommend to have "one" schema file similar to having one data values file that describes all data values allowed in the set of templates. but yeah, we were also discussing allowing to provide multiple non-overlapping schemas and get them merged. for overlapping items we will probably require overlay annotations.
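
a rough sketch of what merging non-overlapping schemas could mean, written in Python rather than ytt. `merge_schemas` is a hypothetical helper, and schemas are assumed to be plain nested dicts of defaults. maps are merged key by key, and any key declared twice is rejected, which is where overlay annotations would come in.

```python
import copy


def merge_schemas(*schemas):
    """Merge nested dict schemas; raise if two schemas declare the same leaf key."""
    merged = {}
    for schema in schemas:
        _merge_into(merged, schema, path="")
    return merged


def _merge_into(target, source, path):
    for key, value in source.items():
        key_path = f"{path}.{key}" if path else key
        if key not in target:
            # Copy so later merges never mutate the caller's schema.
            target[key] = copy.deepcopy(value)
        elif isinstance(target[key], dict) and isinstance(value, dict):
            # Both sides declare a map here; descend and merge the children.
            _merge_into(target[key], value, key_path)
        else:
            # Overlap: a plain merge cannot decide which declaration wins.
            raise ValueError(f"overlapping schema key: {key_path}")


uaa_schema = {"database": {"username": ""}}
smtp_schema = {"smtp": {"host": ""}, "database": {"port": 0}}
print(merge_schemas(uaa_schema, smtp_schema))
```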

Does the sequence of events described in proposal 001 indicate that we can assume that a value marked as required by the schema will always have a value? This would be useful to the UAA because we end up having to say things like #@ data.values.foobar or assert.fail("foobar is required") in multiple locations.

right, we are thinking so far that if something is declared it would always be there unless explicitly annotated with @key-may-be-present (or something like that). validations like the one you mention would be done by default unless @allow-empty annotation is added.

cppforlife avatar Apr 16 '20 23:04 cppforlife

I wonder if there's a way for a template to express its schema. If I'm building a template, I would know what values are required for my template to render appropriately, and I should probably express my schema in that template. It feels odd that an operator could bypass my schema simply by not including it via ytt -f.

Also, I'm curious if there's a way to indicate a disallowed or deprecated option. Let's say we decided to rename database.user to database.username. Since database.username isn't required, the operator might provide database.user and think they are doing the right thing. It would be nice to give feedback such as database.user is no longer allowed. please use database.username instead.

joshuatcasey avatar May 05 '20 21:05 joshuatcasey

I should probably express my schema in that template.

in a lot of cases multiple templates use same set of data values. im not sure how we would avoid duplication.

Also, I'm curious if there's a way to indicate a disallowed or deprecated option

cool suggestion. adding two annotations like @schema/removed and @schema/deprecated would definitely be a possibility.
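
a sketch of the feedback this could produce, in Python rather than ytt. the `REMOVED` table and `check_removed` are hypothetical; how a removed key would actually be recorded in a schema is still undecided.

```python
# Hypothetical rename table: removed key path -> replacement key path.
REMOVED = {"database.user": "database.username"}


def check_removed(values, removed=REMOVED, path=""):
    """Walk nested data values and report every key that was removed."""
    errors = []
    for key, value in values.items():
        key_path = f"{path}.{key}" if path else key
        if key_path in removed:
            errors.append(
                f"{key_path} is no longer allowed. "
                f"please use {removed[key_path]} instead."
            )
        elif isinstance(value, dict):
            # Keep walking into nested maps to find removed keys deeper down.
            errors.extend(check_removed(value, removed, key_path))
    return errors


for message in check_removed({"database": {"user": "admin"}}):
    print(message)
```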

cppforlife avatar May 06 '20 00:05 cppforlife

in a lot of cases multiple templates use same set of data values. im not sure how we would avoid duplication.

The way I was thinking of it was that a lot of the UAA templates that work only all together would probably each specify the same schema file so that we don't duplicate the schema.

joshuatcasey avatar May 06 '20 03:05 joshuatcasey

@cppforlife I like that you mentioned OpenAPI. I would like to express my support for an OpenAPI backend as it will enable us to use the huge ecosystem around this standard (instead of creating a custom schema format, regardless of the succinctness).

OpenAPI is used beyond Kubernetes for a reason. We can build upon it, which I believe is the spirit of the k14s suite of tools.

Use cases:

  • Schema diff
  • Editor support
  • Ready, hardened libraries
  • Documentation generators

I strongly prefer this to a concise language.

ciriarte avatar Aug 11 '20 21:08 ciriarte

+1 for @ciriarte comment. I would also like to see this backed by OpenAPI so that we can take advantage of the already existing tooling that is out there.

I was specifically thinking in terms of diffing versions of schemas in order to detect backward-incompatible changes and help with semantic versioning.
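
As a rough illustration of that idea, here is a Python sketch. It assumes each schema version has already been flattened to a map of key path to type name; `breaking_changes` and `suggested_bump` are hypothetical helpers, not part of any existing OpenAPI tool. Removed keys and changed types count as breaking, and new keys count as additive.

```python
def breaking_changes(old, new):
    """Compare two {key_path: type_name} maps; list changes that break users."""
    problems = []
    for key, old_type in sorted(old.items()):
        if key not in new:
            problems.append(f"removed: {key}")
        elif new[key] != old_type:
            problems.append(f"type changed: {key} ({old_type} -> {new[key]})")
    return problems


def suggested_bump(old, new):
    """Map the diff onto a semantic-versioning bump."""
    if breaking_changes(old, new):
        return "major"
    if set(new) - set(old):
        return "minor"  # only additions
    return "patch"


v1 = {"database.user": "string", "replicas": "integer"}
v2 = {"database.username": "string", "replicas": "integer"}
print(breaking_changes(v1, v2), suggested_bump(v1, v2))
```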

paulcwarren avatar Aug 12 '20 00:08 paulcwarren

one thing we were considering is having a way to get openapi/jsonschema out of ytt schema. this is similar to how folks build openapi k8s schema for crds based on golang struct types.
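
a minimal sketch of that direction, in Python rather than ytt. `to_json_schema` is a hypothetical function, and the schema is assumed to be a plain nested structure of default values, so each JSON Schema type is inferred from the default (the same way go struct types drive crd schemas).

```python
import json

# Python type of a default value -> JSON Schema type name.
_JSON_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string"}


def to_json_schema(value):
    """Derive a JSON Schema fragment from a default value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: to_json_schema(v) for k, v in value.items()},
        }
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Assume all items share the shape of the first one.
            schema["items"] = to_json_schema(value[0])
        return schema
    if value is None:
        # No type information can be inferred from a nil default.
        return {}
    return {"type": _JSON_TYPES[type(value)], "default": value}


defaults = {"replicas": 0, "database": {"username": ""}}
print(json.dumps(to_json_schema(defaults), indent=2))
```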

cppforlife avatar Aug 14 '20 17:08 cppforlife

That would work in one direction, what about the inverse? From OpenAPI to ytt schema? Do you have some ideas on how we could leverage the existing tooling?

ciriarte avatar Aug 14 '20 18:08 ciriarte

From OpenAPI to ytt schema?

doable as well.

Do you have some ideas on how we could leverage the existing tooling?

most of the tooling in openapi ecosystem is centered around REST APIs since that's what it is designed to describe. ytt data values isnt a REST API so it does not really translate.

cppforlife avatar Aug 14 '20 18:08 cppforlife

ytt data values isnt a REST API so it does not really translate.

I agree that it may look so at first glance. My argument is that ytt data values are data, and data should be independent of its representation/location (yaml docs on disk vs yaml/json over http), and OpenAPI describes the schema of those documents.

ciriarte avatar Aug 15 '20 14:08 ciriarte