encoding/jsonschema: create definitions for all nested schemas
Originally opened by @myitcv in https://github.com/cuelang/cue/issues/390
Is your feature request related to a problem? Please describe.
Per discussion with @mpvl. cc @proppy given our previous exchange on this topic.
As part of 65163a0036b835cd6aa8894f1e91732ba85e99cd we flipped to use CUE for this repository's GitHub Actions Workflow specifications. As you can see from that commit, this change relies on the GitHub-defined workflow JSON schema.
Within this schema is a "jobs" field of type struct that is defined using patternProperties:
...
"jobs": {
"$comment": "https://help.github.com/en/github/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#jobs",
"description": "A workflow run is made up of one or more jobs. Jobs run in parallel by default. To run jobs sequentially, you can define dependencies on other jobs using the jobs.<job_id>.needs keyword.\nEach job runs in a fresh instance of the virtual environment specified by runs-on.\nYou can run an unlimited number of jobs as long as you are within the workflow usage limits. For more information, see https://help.github.com/en/github/automating-your-workflow-with-github-actions/workflow-syntax-for-github-actions#usage-limits.",
"type": "object",
"patternProperties": {
"^[_a-zA-Z][a-zA-Z0-9_-]*$": {
...
This gets translated to:
https://github.com/cuelang/cue/blob/8fcefc84bf1d6868608beb7680a916f57181d9ed/cue.mod/pkg/github.com/SchemaStore/schemastore/schemas/json/github-workflow.cue#L306-L307
The issue is that it is now very tricky to refer to the schema that defines the value that can appear in fields in this "jobs" struct:
package x

import "github.com/SchemaStore/schemastore/schemas/json"

#job: (json.Workflow.jobs & {x: _}).x
#step: ((#job & {steps: _}).steps & [_])[0]

myJob: #job & {
	name: "myJob"
	"runs-on": "ubuntu-latest"
}

myStep: #step & {
	name: "myStep"
	if: "ok"
}
Full repro:
https://gist.github.com/myitcv/4ef3d99aea77882ac460e3aaf76622c3
Output is:
> exec cue eval -c .
[stdout]
myJob: {
    name: "myJob"
    "runs-on": "ubuntu-latest"
}
myStep: {
    name: "myStep"
    if: "ok"
}
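The extraction trick used for #job and #step can be tried without the SchemaStore import. The following is a minimal sketch, assuming a much-reduced, hypothetical stand-in for the generated schema: the #Workflow definition and its fields are illustrative only and are not the real github-workflow.cue.

```cue
package x

// Hypothetical, heavily reduced stand-in for the generated schema.
#Workflow: {
	jobs: {
		// Pattern constraint, as produced from patternProperties.
		[=~"^[_a-zA-Z][a-zA-Z0-9_-]*$"]: {
			name?:      string
			"runs-on"?: string
			steps?: [...{
				name?: string
				if?:   string
			}]
		}
	}
}

// Unify with a struct holding an arbitrary field "x" whose name matches
// the pattern, then select x to obtain the constraint applied to it.
#job: (#Workflow.jobs & {x: _}).x

// Same idea for the list element: make steps present, force a list of
// length one, and select its only element.
#step: ((#job & {steps: _}).steps & [_])[0]

myJob: #job & {
	name:      "myJob"
	"runs-on": "ubuntu-latest"
}

myStep: #step & {
	name: "myStep"
	if:   "ok"
}
```

The sketch works only because "x" happens to satisfy the field name pattern; with a different pattern the placeholder name would have to change, which is part of why this approach is awkward.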
Describe the solution you'd like
Something that looks like this:
package x

import "github.com/SchemaStore/schemastore/schemas/json"

myJob: json.Workflow.jobs.#job & {
	name: "myJob"
	"runs-on": "ubuntu-latest"
}

myStep: json.Workflow.jobs.#job.steps.#step & {
	name: "myStep"
	if: "ok"
}
i.e. the schemas used within jobs and steps would be defined inline and be easily referenceable.
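Purely as an illustration, here is one shape the generated CUE could take. This is a sketch under assumptions, not what encoding/jsonschema produces: the names #Workflow, #job and #step are hypothetical, and the schema is heavily reduced. It relies on a definition being able to sit next to the pattern constraint that uses it (the label #job does not match the field name regexp, and as far as I understand pattern constraints only apply to regular fields). It also hangs #step off #job rather than off steps, because steps is a list and a list cannot carry a definition, so the reference differs slightly from the one above.

```cue
package x

#Workflow: {
	jobs: {
		// Hypothetical name for the schema of each job.
		#job: {
			// Hypothetical name for the schema of each step.
			#step: {
				name?: string
				if?:   string
			}

			name?:      string
			"runs-on"?: string
			steps?: [...#step]
		}

		// The pattern constraint refers to the named definition
		// instead of repeating the schema inline.
		[=~"^[_a-zA-Z][a-zA-Z0-9_-]*$"]: #job
	}
}

myJob: #Workflow.jobs.#job & {
	name:      "myJob"
	"runs-on": "ubuntu-latest"
}

myStep: #Workflow.jobs.#job.#step & {
	name: "myStep"
	if:   "ok"
}
```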
Describe alternatives you've considered
There is a workaround... but it's not pretty :)
Additional context
n/a
Original reply by @proppy in https://github.com/cuelang/cue/issues/390#issuecomment-630139275
In:
json.Workflow.jobs.#job
Where would the #job come from? (since it wouldn't be defined in the underlying jsonschema).
I wonder if [], [...] or even [_] would be acceptable as a way to express querying any valid field of a struct/list without clashing with other syntax?
myStep: json.Workflow.jobs[].steps[] & { }
Original reply by @myitcv in https://github.com/cuelang/cue/issues/390#issuecomment-630571776
> Where would the #job come from? (since it wouldn't be defined in the underlying jsonschema).
It was just a very loose idea to illustrate that we need to be able to address these schemas/constraints in some way.
@rogpeppe also pointed out that we should be able to address the field name constraint too, i.e.:
=~"^[_a-zA-Z][a-zA-Z0-9_-]*$" & !~"^()$"
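To make the point concrete, here is a minimal sketch of what an addressable field name constraint could allow. The name #jobID is hypothetical and the jobs struct is a reduced stand-in, not the generated schema.

```cue
package x

// Hypothetical name for the field name constraint quoted above.
#jobID: =~"^[_a-zA-Z][a-zA-Z0-9_-]*$" & !~"^()$"

// The same definition drives the pattern constraint...
jobs: [#jobID]: {...}

// ...and can be used on its own to check a candidate name.
goodID: #jobID & "build_and-test"
```

A name such as "1bad" would fail the same unification, since it does not start with a letter or underscore.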
I believe I'm struggling with a related issue when validating against github-workflow.cue. Similar to the field name constraint, github.#Workflow.#jobNeeds is imported from the JSON Schema as:
#name: =~"^[_a-zA-Z][a-zA-Z0-9_-]*$"
#jobNeeds: [...#name] & [_, ...] | #name
Using a concrete list of names works just fine, but if I try to generate a list of all existing job names:
let allJobIds = [ for k, _ in jobs if k != "merge_queue" {k}]
...and then use it in a job's needs: allJobIds field, I hit a mismatch error that I don't think is accurate:
_#useMergeQueue.jobs.merge_queue: 3 errors in empty disjunction:
_#useMergeQueue.jobs.merge_queue.needs: 2 errors in empty disjunction:
_#useMergeQueue.jobs.merge_queue.needs: conflicting values =~"^[_a-zA-Z][a-zA-Z0-9_-]*$" and [for k, _ in jobs if (k != "merge_queue") {k}] (mismatched types string and list):
./cue.mod/pkg/json.schemastore.org/github/github-workflow.cue:590:9
./workflows.cue:40:18
./workflows.cue:45:14
_#useMergeQueue.jobs.merge_queue.needs: incompatible list lengths (0 and 2)
Here's my repo for reference. It contains the working version, where I specify the job names manually; this is the line I'd like to replace with the comprehension above:
https://github.com/EarthmanMuons/rustops-blueprint/blob/6a4379a647358e90fd06d86d54e8479e8e799849/.github/cue/workflows.cue#L42
Note: in JSON Schema itself, it's possible to address subschemas by their position in the JSON document using a URL fragment containing a JSON Pointer. For example, {"$ref": "#/definitions/normalJob/properties/steps/items"}. Solving this issue by creating definitions for all nested schemas, probably with keys that map easily from JSON Pointer references, would make it possible to support that kind of reference too.