
Support for Declaring Schema Reference in YAML for Validation Workflows

Open damienjburks opened this issue 7 months ago • 3 comments

Hello CUE team. As a maintainer of the FINOS Common Cloud Controls (CCC) project, I’d like to propose a feature that would let YAML files optionally declare the schema they are intended to be validated against. This would greatly benefit projects like ours that use CUE to enforce compliance and quality controls in our CI/CD pipelines.

Problem Statement

In our workflows, we use CUE to validate YAML files against defined schemas as part of pre-merge and pre-release checks. However, because YAML lacks a native way to declare which schema a file adheres to, we are forced to hardcode schema mappings externally or apply the same schema across multiple files. This approach doesn’t scale well and is error-prone for our use case.

Benefits

  • Enables schema-aware tooling and automation (e.g., schema selection in CI)
  • Improves clarity and maintainability of YAML-based control files
  • Aligns with similar concepts like $schema in JSON Schema

damienjburks avatar May 01 '25 21:05 damienjburks

What would this self-contained reference to a schema look like? I assume it would be a link to a local file or an HTTPS link?

What would the syntax be like? I assume some sort of special comment at the top of the file?

Is there any standard for this kind of thing that we can reuse?

mvdan avatar May 02 '25 07:05 mvdan

Hey :wave: I'm a co-maintainer with Damien, hope I can add some color...

As a project maintainer creating CI tooling that ingests YAML data, I would like to see a field in the YAML that tells me what schema the file adheres to, so that I can automate pre-merge and release workflows.

I'd imagine this is already possible, similar to the JSON Schema $schema field, but I haven't found anything in the docs.

A bonus would be if we could upgrade our tooling to point to schemas stored in the CUE registry, so that we always reference the authoritative source.

eddie-knight avatar May 02 '25 22:05 eddie-knight

Note that validating a YAML file directly against a schema on the central registry is already possible. For example, from https://cue.dev/docs/spotting-errors-earlier-github-actions-files/:

$ cue vet -c -d '#Workflow' cue.dev/x/githubactions@latest workflow.yml

I assume you want to somehow encode this as a special field in the YAML. I'm again unsure what the format of that should be; as far as I know, YAML in general doesn't have a standard for referencing a schema file or URL, like a $schema field. We could make up our own as CUE, but ideally we reuse existing standards.

Also note that a schema string field would be part of the YAML data, which may also be a problem for some use cases, as they might not want the extra field present when consuming the data.
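
To illustrate the extra-field concern, here is a minimal sketch of how a consumer might separate the declaration from the payload after parsing. It assumes the YAML has already been loaded into a plain mapping by some YAML parser; the helper name `split_schema` and the default key `$schema` are hypothetical and not part of CUE.

```python
from typing import Any, Optional, Tuple


def split_schema(doc: dict, key: str = "$schema") -> Tuple[Optional[Any], dict]:
    """Return (schema reference, data without the reference).

    `doc` is an already-parsed YAML mapping. The input is not mutated,
    so callers that want the original document can still use it.
    """
    # Copy every entry except the schema declaration itself.
    data = {k: v for k, v in doc.items() if k != key}
    # The reference is None when the file does not declare a schema.
    return doc.get(key), data
```

Every consumer of the data would need a step like this, which is the cost the comment above is pointing at.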

mvdan avatar Jun 02 '25 09:06 mvdan

@eddie-knight just adding a brief reply here in the context of https://github.com/ossf/security-insights-spec/pull/139. If the spec is published to the Central Registry, we immediately gain a "well known location" for the schema that all CUE-aware tooling can resolve.

That would allow something like:

cue vet -d '#SecurityInsights' github.com/ossf/security-insights/spec@latest my-file.yml

So if we take that step, what we're considering here is whether we could do something like this at the top of a YAML file:

$schema: "cue:github.com/ossf/security-insights/spec@latest#SecurityInsights"
...
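
As a rough sketch of how CI tooling might consume such a field, the snippet below splits a reference of the form shown above into a package path and a definition, then builds the matching `cue vet` argument list. The `cue:` prefix and `#Definition` suffix are only the format floated in this thread, not an existing CUE feature, and the names `SchemaRef`, `parse_schema_ref`, and `vet_command` are hypothetical. Extracting the `$schema` value from the file is left to a YAML parser.

```python
from typing import List, NamedTuple


class SchemaRef(NamedTuple):
    package: str     # e.g. "github.com/ossf/security-insights/spec@latest"
    definition: str  # e.g. "#SecurityInsights"


def parse_schema_ref(value: str) -> SchemaRef:
    """Parse 'cue:<package>#<Definition>' (format assumed from this thread)."""
    prefix = "cue:"
    if not value.startswith(prefix):
        raise ValueError(f"not a cue: schema reference: {value!r}")
    # Split on the first '#': left is the package, right is the definition name.
    package, sep, name = value[len(prefix):].partition("#")
    if not package or not sep or not name:
        raise ValueError(f"expected cue:<package>#<Definition>, got {value!r}")
    return SchemaRef(package=package, definition="#" + name)


def vet_command(ref: SchemaRef, yaml_path: str) -> List[str]:
    """Build the argument list for the equivalent `cue vet` invocation."""
    return ["cue", "vet", "-d", ref.definition, ref.package, yaml_path]
```

The resulting list can be passed to `subprocess.run` in a pre-merge check, which would remove the need for an external file-to-schema mapping.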

myitcv avatar Jun 19 '25 09:06 myitcv