encoding/jsonschema: create definitions for all nested schemas

Status: Open • cueckoo opened this issue 4 years ago • 4 comments

Originally opened by @myitcv in https://github.com/cuelang/cue/issues/390

Is your feature request related to a problem? Please describe.

Per discussion with @mpvl. cc @proppy given our previous exchange on this topic.

As part of commit 65163a0036b835cd6aa8894f1e91732ba85e99cd, we switched this repository's GitHub Actions workflow specifications over to CUE. As that commit shows, this change relies on the GitHub-defined workflow JSON Schema.

Within this schema, the "jobs" field is an object (a struct in CUE) whose fields are defined using patternProperties:

...
    "jobs": {
      "$comment": "https://help.github.com/en/github/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#jobs",
      "description": "A workflow run is made up of one or more jobs. Jobs run in parallel by default. To run jobs sequentially, you can define dependencies on other jobs using the jobs.<job_id>.needs keyword.\nEach job runs in a fresh instance of the virtual environment specified by runs-on.\nYou can run an unlimited number of jobs as long as you are within the workflow usage limits. For more information, see https://help.github.com/en/github/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#usage-limits.",
      "type": "object",
      "patternProperties": {
        "^[_a-zA-Z][a-zA-Z0-9_-]*$": {
...

This gets translated to:

https://github.com/cuelang/cue/blob/8fcefc84bf1d6868608beb7680a916f57181d9ed/cue.mod/pkg/github.com/SchemaStore/schemastore/schemas/json/github-workflow.cue#L306-L307

The issue is that it is now very tricky to refer to the schema for the values that can appear as fields of this "jobs" struct. Today it takes something like this:

package x

import "github.com/SchemaStore/schemastore/schemas/json"

#job: (json.Workflow.jobs & {x: _}).x
#step: ((#job & {steps: _}).steps & [_])[0]

myJob: #job & {
	name: "myJob"
	"runs-on": "ubuntu-latest"
}

myStep: #step & {
	name: "myStep"
	if: "ok"
}

Full repro:

https://gist.github.com/myitcv/4ef3d99aea77882ac460e3aaf76622c3

Output is:

> exec cue eval -c .
[stdout]
myJob: {
    name:      "myJob"
    "runs-on": "ubuntu-latest"
}
myStep: {
    name: "myStep"
    if:   "ok"
}
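
The workaround works because unifying json.Workflow.jobs with a struct that has a dummy field x makes the pattern constraint apply to x; selecting .x then yields the job schema. The unnamed subschema can only be reached through a concrete key that matches the pattern. A rough Python analogue of that lookup on the raw JSON Schema is sketched below. JOBS_SCHEMA and its placeholder job schema are hypothetical stand-ins, since the real job schema is elided in the excerpt above.

```python
import re

# Hypothetical, heavily trimmed stand-in for the "jobs" schema; the real job
# schema is elided ("...") in the issue, so a placeholder is used here.
JOB_SCHEMA = {"type": "object", "title": "job (placeholder)"}
JOBS_SCHEMA = {
    "type": "object",
    "patternProperties": {"^[_a-zA-Z][a-zA-Z0-9_-]*$": JOB_SCHEMA},
}


def schemas_for_key(object_schema: dict, key: str) -> list:
    """Return the subschemas that patternProperties applies to `key`.

    As with the CUE workaround, the only handle on these unnamed subschemas
    is a concrete key that happens to match one of the patterns.
    """
    # JSON Schema patterns are unanchored ECMA-262 regexes; re.search is an
    # approximation that is good enough for simple patterns like this one.
    return [
        sub
        for pattern, sub in object_schema.get("patternProperties", {}).items()
        if re.search(pattern, key)
    ]


print(schemas_for_key(JOBS_SCHEMA, "x"))     # dummy key, like {x: _}: the job schema
print(schemas_for_key(JOBS_SCHEMA, "1job"))  # → []
```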

Describe the solution you'd like

Something that looks like this:

package x

import "github.com/SchemaStore/schemastore/schemas/json"

myJob: json.Workflow.jobs.#job & {
	name: "myJob"
	"runs-on": "ubuntu-latest"
}

myStep: json.Workflow.jobs.#job.steps.#step & {
	name: "myStep"
	if: "ok"
}

That is, the schemas used within jobs and steps would be defined inline as definitions, so that they can be referenced easily.
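
One way to get there is for the importer to give every nested schema a stable key, from which a definition could then be generated. The Python sketch below walks a JSON Schema and records each nested schema under its JSON Pointer. Both the choice of JSON Pointer as the key and the toy WORKFLOW schema are assumptions for illustration, and only a few keywords are walked.

```python
# Hypothetical, simplified stand-in for the workflow schema.
WORKFLOW = {
    "type": "object",
    "properties": {
        "jobs": {
            "type": "object",
            "patternProperties": {
                "^[_a-zA-Z][a-zA-Z0-9_-]*$": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "steps": {
                            "type": "array",
                            "items": {"type": "object", "properties": {"name": {"type": "string"}}},
                        },
                    },
                }
            },
        }
    },
}


def escape_token(token: str) -> str:
    """Escape one JSON Pointer reference token (RFC 6901): "~" first, then "/"."""
    return token.replace("~", "~0").replace("/", "~1")


def collect_subschemas(schema: dict, pointer: str = "") -> dict:
    """Map the JSON Pointer of every nested schema to that schema.

    Only properties, patternProperties, definitions and single-schema items
    are walked; a real importer would need to cover every keyword.
    """
    found = {pointer: schema}
    for keyword in ("properties", "patternProperties", "definitions"):
        for name, sub in schema.get(keyword, {}).items():
            child = f"{pointer}/{keyword}/{escape_token(name)}"
            found.update(collect_subschemas(sub, child))
    items = schema.get("items")
    if isinstance(items, dict):
        found.update(collect_subschemas(items, f"{pointer}/items"))
    return found


for pointer in collect_subschemas(WORKFLOW):
    print(pointer or "(root)")
```

Each printed pointer, including the one ending in .../properties/steps/items, names a schema that currently has no handle in the generated CUE.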

Describe alternatives you've considered

There is a workaround... but it's not pretty :)

Additional context

n/a

cueckoo, Jul 03 '21 10:07

Original reply by @proppy in https://github.com/cuelang/cue/issues/390#issuecomment-630139275

In:

json.Workflow.jobs.#job

Where would #job come from, since it wouldn't be defined in the underlying JSON Schema?

Would [], [...] or even [_] be acceptable syntax for querying any valid field of a struct or list, without clashing with other syntax? For example:

myStep: json.Workflow.jobs[].steps[] & { }
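
To make the proposed query concrete, here is a Python sketch of one possible interpretation against a raw JSON Schema: a hypothetical "[]" step means "the schema of any field" (via patternProperties) for objects, or "the schema of any element" (via items) for arrays. The query function and the toy WORKFLOW schema are illustrative assumptions, not CUE behavior.

```python
# Hypothetical, simplified stand-in for the workflow schema.
WORKFLOW = {
    "type": "object",
    "properties": {
        "jobs": {
            "type": "object",
            "patternProperties": {
                "^[_a-zA-Z][a-zA-Z0-9_-]*$": {
                    "type": "object",
                    "properties": {
                        "steps": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {"name": {"type": "string"}, "if": {"type": "string"}},
                            },
                        }
                    },
                }
            },
        }
    },
}


def query(schema: dict, path: list) -> dict:
    """Follow `path` through a schema; "[]" selects the any-field/any-element schema."""
    node = schema
    for step in path:
        if step != "[]":
            node = node["properties"][step]      # ordinary named field
        elif isinstance(node.get("items"), dict):
            node = node["items"]                 # array: schema of every element
        else:
            candidates = list(node.get("patternProperties", {}).values())
            if len(candidates) != 1:             # keep the sketch unambiguous
                raise LookupError("no unique wildcard schema at this step")
            node = candidates[0]                 # object: schema of any matching field
    return node


# Rough equivalent of json.Workflow.jobs[].steps[]
step = query(WORKFLOW, ["jobs", "[]", "steps", "[]"])
print(sorted(step["properties"]))
```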

cueckoo, Jul 03 '21 10:07

Original reply by @myitcv in https://github.com/cuelang/cue/issues/390#issuecomment-630571776

Where would #job come from, since it wouldn't be defined in the underlying JSON Schema?

It was just a very loose idea to illustrate that we need to be able to address these schemas/constraints in some way.

@rogpeppe also pointed out that we should be able to address the field name constraint too, i.e.:

=~"^[_a-zA-Z][a-zA-Z0-9_-]*$" & !~"^()$"

cueckoo, Jul 03 '21 10:07

I believe I'm struggling with a related issue when validating against github-workflow.cue. Like the field name constraint, github.#Workflow.#jobNeeds is imported from the JSON Schema as:

#name: =~"^[_a-zA-Z][a-zA-Z0-9_-]*$"
#jobNeeds: [...#name] & [_, ...] | #name
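
Because & binds more tightly than | in CUE, #jobNeeds reads as ([...#name] & [_, ...]) | #name: either a single name, or a non-empty list whose elements are all names. The Python sketch below mirrors that reading (is_valid_job_needs is a hypothetical name, and \Z stands in for Go's end-of-text $).

```python
import re

NAME = re.compile(r"^[_a-zA-Z][a-zA-Z0-9_-]*\Z")  # #name


def is_valid_job_needs(value) -> bool:
    """Mirror of ([...#name] & [_, ...]) | #name."""
    if isinstance(value, str):                    # the #name branch
        return bool(NAME.search(value))
    if isinstance(value, list):                   # the list branch
        return len(value) > 0 and all(            # [_, ...]: at least one element
            isinstance(v, str) and NAME.search(v) for v in value  # [...#name]
        )
    return False


print(is_valid_job_needs(["build", "test"]))  # True
print(is_valid_job_needs([]))                 # False: rejected by [_, ...]
```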

Using a concrete list of names works just fine, but if I try to generate a list of all existing job names:

let allJobIds = [ for k, _ in jobs if k != "merge_queue" {k}]

...and then use it in a job as needs: allJobIds, evaluation also fails with a mismatch error that I don't think is accurate:

_#useMergeQueue.jobs.merge_queue: 3 errors in empty disjunction:
_#useMergeQueue.jobs.merge_queue.needs: 2 errors in empty disjunction:
_#useMergeQueue.jobs.merge_queue.needs: conflicting values =~"^[_a-zA-Z][a-zA-Z0-9_-]*$" and [for k, _ in jobs if (k != "merge_queue") {k}] (mismatched types string and list):
    ./cue.mod/pkg/json.schemastore.org/github/github-workflow.cue:590:9
    ./workflows.cue:40:18
    ./workflows.cue:45:14
_#useMergeQueue.jobs.merge_queue.needs: incompatible list lengths (0 and 2)

Here's my repo for reference. It contains the working version, where I specify the job names manually; this is the line in particular that I'd like to replace with the comprehension above:

https://github.com/EarthmanMuons/rustops-blueprint/blob/6a4379a647358e90fd06d86d54e8479e8e799849/.github/cue/workflows.cue#L42

elasticdog, May 21 '23 22:05

Note: in JSON Schema itself, it's possible to address subschemas by their position in the JSON document, using a URL fragment containing a JSON Pointer, for example {"$ref": "#/definitions/normalJob/properties/steps/items"}. Solving this issue by creating definitions for all nested schemas, probably with keys that map easily from JSON Pointer references, would make it possible to support that kind of reference too.
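
For reference, resolving such a same-document fragment follows RFC 6901: percent-decode the fragment, split it on "/", and unescape each token ("~1" to "/" first, then "~0" to "~"). The Python sketch below does this against a hypothetical toy SCHEMA; a generated definition per nested schema would need keys that map one-to-one onto these pointers.

```python
from urllib.parse import unquote

# Hypothetical toy schema shaped like the reference in the comment above.
SCHEMA = {
    "definitions": {
        "normalJob": {
            "type": "object",
            "properties": {
                "steps": {"type": "array", "items": {"type": "object", "title": "step (placeholder)"}},
            },
        }
    }
}


def resolve_fragment(document, ref: str):
    """Resolve a same-document "$ref" such as "#/definitions/a/b" (RFC 6901)."""
    if not ref.startswith("#"):
        raise ValueError("only same-document fragment references are handled")
    pointer = unquote(ref[1:])              # URI fragments may be percent-encoded
    if pointer == "":
        return document                     # "#" refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # order matters
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


print(resolve_fragment(SCHEMA, "#/definitions/normalJob/properties/steps/items"))
```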

rogpeppe, Oct 15 '24 11:10