Improve JSON schema
Add a link to the JSON schema from the main documentation. Pretty-print said schema and add `format` and `default`, with the latter also moving to the "machine readable" block in the primary documentation. Add a custom Pygments style to reflect the coloring used elsewhere in the documentation, and "fix" the language of the sample CPS file. Factor out some duplicated code. Fix some flake8 warnings.
FYI @autoantwort, this should fix a couple of the issues you raised. I'd appreciate if you can verify whether I did it right; thanks!
A suggestion: why not invert the generator to have the reST be generated from the JSON schema (with `sphinx-jsonschema`)? The reasoning is that the JSON schema can be much more powerful, e.g. checking sem-versions, conditional dependencies, etc., which are less intuitive to define in the reST format. The JSON format is of course rather cumbersome to navigate, so maybe having it in a YAML file would be better.
why not invert the generator to have reST be generated from the JSON schema?
@LecrisUT, #25 did something similar (albeit without reST generation). The major reason is that reST is significantly more friendly to edit than a JSON schema. It's also unclear how well generating reST from something else allows reST markup to be used in the "something else" (at best, editing is going to lose editor syntax highlighting), and generating reST also complicates building the documentation. The reality is that the reST is the source of truth; the JSON schema is supplemental and is intended primarily for use in machine validation.
The major reason is that reST is significantly more friendly to edit than a JSON schema.
A few pointers:
- JSON schema does not have to be written in JSON; you can do it in YAML
- JSON schema has a JSON schema that will help you write it
- you can split a schema into components and reference them either within the document or across documents (`$ref`, `$definitions`, stores)
- a simple `pytest` module can be implemented (or a `pre-commit` hook, but that's more fragile) to regression-test the schemas
- CMake's schema has a quite neat design to handle version-specific schemas
It's also unclear how well generating reST from something else allows reST markup to be used in the "something else" (at best, editing is going to lose editor syntax highlighting)
The schema objects that `sphinx-jsonschema` produces are referenceable, at least via `:ref:`. Syntax highlighting I haven't looked into; it might require integration with Sphinx to resolve dynamically generated `.rst` files. You do have tools like `sphinx-autobuild` that could help fill in the gaps.
and generating reST also complicates building the documentation
Initially, making the transition might take a lot of effort, but generating the reST and documentation is actually trivial. I did not dig into the current reST generation, though; a recommended pattern is to expose components of the JSON schema one at a time, e.g.
.. jsonschema:: schema.json#/definitions/package
.. jsonschema:: schema.json#/definitions/components
PS: this would also make it easier to publish in https://github.com/SchemaStore/schemastore
@LecrisUT, let's consider an example:
.. cps:attribute:: clr_vendor
   :type: string
   :context: platform

   Specifies that the package's CLR (.NET) components
   require the specified `Common Language Runtime`_ vendor.
   Typical (case-insensitive) values include
   :string:`"microsoft"` and :string:`"mono"`.
How would that be written, without loss of expressiveness, in YAML?
- It's trivial to migrate the attribute to a different object. It's also both trivial and very legible to write attributes that belong to more than one object. This isn't a hard requirement, but it's a strong nice-to-have. (Do attributes applying to more than one object get duplicated?)
- It uses reST markup. Getting an editor to highlight that in a YAML file is a pain. Some markup (external references and especially substitutions) also requires global definitions; how does `jsonschema` handle that?
- What does markup look like in the schema `.json`? Does that just get the raw markup? (The existing system turns the reST into plain text.)
- If some other documentation wants to reference this, what does that look like? What does it look like when objects have different attributes with the same name?
- This example is a trivial type; how is a complex type like `map(map(string))` specified? (The existing system generates compound type specifications as needed.)
- How are "tiers" of schema handled? In the existing system, each tier is a separate file; they get merged in the generated schema.
@LecrisUT, let's consider an example:
There are two options. Either use `$$description`, which is specific to `sphinx-jsonschema`:

# schema.json
definitions:
  clr_vendor:
    type: string
    description: ...fill short description for ide...
    $$description: |
      Specifies that the package's CLR (.NET) components
      require the specified `Common Language Runtime`_ vendor.
      Typical (case-insensitive) values include
      :string:`"microsoft"` and :string:`"mono"`.
or, my preferred approach, integrate the text in the reST document:

# schema.json
definitions:
  clr_vendor:
    type: string
    description: ...fill short description for ide...

Schemas.rst
===========

.. jsonschema:: schema.json#/definitions/clr_vendor

   Specifies that the package's CLR (.NET) components
   require the specified `Common Language Runtime`_ vendor.
   Typical (case-insensitive) values include
   :string:`"microsoft"` and :string:`"mono"`.

Note: I am not sure of the indentation necessary here; it definitely works without the indentation for the text. This one works by making each `..` directive a specific subsection, so the text block is the same as writing it in other sections/admonitions.
In this example I used `definitions.clr_vendor`, but one could just as well drop those two levels and make it a single JSON-schema object.
- It's trivial to migrate the attribute to a different object. It's also both trivial and very legible to write attributes that belong to more than one object. This isn't a hard requirement, but it's a strong nice-to-have. (Do attributes applying to more than one object get duplicated?)
With `$ref`, effectively the content is copied wherever the key `$ref` appears when the JSON schema is parsed. But the original object is defined in only one place, and `sphinx-jsonschema` uses the `$ref` to emit a link instead of copying the contents (there is an annoying limitation that `$$target` needs to be defined sometimes).
It uses reST markup. Getting an editor to highlight that in a YAML file is a pain. Some markup (external references and especially substitutions) also requires global definitions; how does `jsonschema` handle that?
In the first case, indeed it would not work nicely, but the second option should work smoothly. Only some `:ref:` targets would be harder to redirect.
What does markup look like in the schema `.json`? Does that just get the raw markup? (The existing system turns the reST into plain text.)
You could use plain reST markup, but YMMV when it comes to the IDE displaying it. My recommendation is for the YAML section to be as compact as possible, since people will have the online documentation when needed. CMake adopts a similar approach.
- If some other documentation wants to reference this, what does that look like? What does it look like when objects have different attributes with the same name?
Referencing an external JSON schema inside a JSON schema is technically possible, but I did not investigate it (it's the same `$ref` attribute). As for documentation, the targets generated by `sphinx-jsonschema` are available through cross-Sphinx linkage.
This example is a trivial type; how is a complex type like `map(map(string))` specified? (The existing system generates compound type specifications as needed.)
type: object
# If you want specific keys to be allowed only
properties:
  key:
    type: string
# Or a regex pattern
patternProperties:
  ".*":
    type: string
# Or take anything
additionalProperties:
  type: string
Reference: https://json-schema.org/understanding-json-schema/reference/object
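For the nested `map(map(string))` case specifically, a sketch using `patternProperties` (the `".*"` patterns are illustrative; they accept any key):

```yaml
# map(map(string)): an object whose values are objects whose values are strings
type: object
patternProperties:
  ".*":
    type: object
    patternProperties:
      ".*":
        type: string
```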
- How are "tiers" of schema handled? In the existing system, each tier is a separate file; they get merged in the generated schema.
Depends; you can either have one YAML file per object/rule, or combine them and use `definitions` if you want to share definitions inside/across a file. Probably this part is the most relevant reference for this.
My recommendation is the yaml section to be as compact as possible, since people will have the online documentation when needed.
...but that's not unlike what I'm already doing; `schema.rst` is the "online documentation". The schema is a simplified version of that (which, frankly, is intended for machine verification). The current system is easy to edit/maintain and is generally expressive enough to generate a schema that is intended to be used as a validation mechanism.
The current system is easy to edit/maintain and is generally expressive enough
I want to challenge this claim. The issue is that the generation goes from reST to JSON schema in a custom generator. This means that new contributors wanting to dip their toes in and start contributing will have to learn yet another markup; e.g. they would need to decipher what something like this means: https://github.com/cps-org/cps/blob/2bfdc4b4fdb826a352629fd2d35faae50aa19c86/schema.rst?plain=1#L297-L298
So the quality of contributions can only go as far as the documentation written for contributors and their willingness to overcome the learning curve.
On the other hand, there would always be people familiar with sphinx and json-schema who would be able to jump in and out to do random fixes as long as the structure is familiar to them.
...but that's not unlike what I'm already doing; `schema.rst` is the "online documentation"
There is a difference here. The proposal is to move all of the logic and metadata to a static JSON-schema document, and the reST document would only contain the detail text and markup; `sphinx-jsonschema` will do the heavy lifting of populating the metadata.
There are various benefits of such an approach:
- reduced maintenance for custom implementations
- familiarity for external contributors
- integration with schemastore.org and immediate support in IDEs and other tooling
- full support of JSON schema features:
  - breaking down the schema into components
  - conditionals, e.g. different schemas depending on `cps_version`
  - regex matching, e.g. semver validation
  - deprecation support
  - and much more
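To illustrate the conditional and regex points, a sketch of what those features might look like (the `cps_version` field name follows the discussion above; the pattern is a simplified semantic-version check, not the full official semver regex, and the specific version string and required keys are hypothetical):

```yaml
properties:
  cps_version:
    type: string
    # simplified major.minor[.patch] check
    pattern: '^(0|[1-9]\d*)\.(0|[1-9]\d*)(\.(0|[1-9]\d*))?$'
# different requirements depending on cps_version (if/then, draft-07 and later)
if:
  properties:
    cps_version:
      const: "0.12"
then:
  required: ["name", "cps_version"]
```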
This is now in the way of other changes that need to be made. Since it's been sitting a long while with only one relevant comment (which has been addressed weeks ago), I'm going to go ahead and merge it. Any further fixes needed will have to wait for their own PRs.
The proposal is to move all of the logic and metadata to a static json-schema document, and the rst document would only contain the detail text and markup.
What's "static"? "Unchanging"? Clearly that isn't the case, as some of your claimed advantages make clear. Nor am I convinced that splitting the schema into multiple formats is an improvement. I also am skeptical of your assertion that the current system is difficult to comprehend, especially if someone employs the simple step of comparing the source to the output.
That said, nothing is stopping someone from making a pull request. I definitely do not have time right now to make such sweeping changes.
Static as in static file. A file that is not generated.
Nor am I convinced that splitting the schema into multiple formats is an improvement.
I am not proposing that. There would only be 1 json schema in either a yaml or json format. Unfortunately that does mean the way to migrate is to fully migrate.
Regarding PR contribution, what I can write is the skeleton for this, but I would need to make a bunch more changes:
- adopting RTD in order to run Sphinx in PRs
- reorganizing the project to move the documentation into `docs` and the schemas into their own folder
- adding `pre-commit`
- migrating the build to a proper PEP 517 backend
- adding various builders like `linkcheck` and `html -W`, and eliminating `check-build-logs.py`
- discouraging the use of GitHub Pages
It is quite cumbersome to make each of these changes as individual PRs, as they can be strongly dependent on one another; at most I could make them as individual commits. And this all hinges on whether or not you are open to these changes and to reviewing them, so I should know beforehand whether I should go forward with prototyping them.