Improve JSON schema
Add a link to the JSON schema from the main documentation. Pretty-print said schema and add `format` and `default`, with the latter also moving to the "machine readable" block in the primary documentation. Add a custom Pygments style to reflect the coloring used elsewhere in the documentation, and "fix" the language of the sample CPS file. Factor out some duplicated code. Fix some flake8 warnings.
FYI @autoantwort, this should fix a couple of the issues you raised. I'd appreciate if you can verify whether I did it right; thanks!
A suggestion: why not invert the generator to have the reST be generated from the JSON schema (with `sphinx-jsonschema`)? The reasoning is that the JSON schema can be much more powerful, e.g. checking sem-versions, conditional dependencies, etc., which are less intuitive to define in the reST format. The JSON format is of course rather cumbersome to navigate, so maybe having it in a YAML file would be better.
why not invert the generator to have reST be generated from the JSON schema?
@LecrisUT, #25 did something similar (albeit without reST generation). The major reason is that reST is significantly more friendly to edit than a JSON schema. It's also unclear how well generating reST from something else allows reST markup to be used in the "something else" (at best, editing is going to lose editor syntax highlighting), and generating reST also complicates building the documentation. The reality is that the reST is the source of truth; the JSON schema is supplemental and is intended primarily for use in machine validation.
The major reason is that reST is significantly more friendly to edit than a JSON schema.
A few pointers:
- JSON schema does not have to be written in JSON; you can do it in YAML
- JSON schema has a JSON schema that will help you write it
- you can split a schema into components and reference them either within the document or across documents (`$ref`, `$definitions`, stores)
- a simple `pytest` module can be implemented (or a `pre-commit` hook, but that's more fragile) to regression-test the schemas
- CMake's schema has a quite neat design to handle version-specific schemas
It's also unclear how well generating reST from something else allows reST markup to be used in the "something else" (at best, editing is going to lose editor syntax highlighting)
The schema objects that `sphinx-jsonschema` produces are referenceable, at least via `:ref:`. Syntax highlighting I haven't looked into; it might require integration with Sphinx to resolve dynamically generated `.rst` files. You do have tools like `sphinx-autobuild` that could help fill in the gaps.
and generating reST also complicates building the documentation
Initially, making the transition might take a lot of effort, but generating the reST and documentation is actually trivial. I did not dig into the current reST generation, though; a recommended pattern is to expose components of the JSON schema one at a time, e.g.
.. jsonschema:: schema.json#/definitions/package
.. jsonschema:: schema.json#/definitions/components
PS: this would also make it easier to publish in https://github.com/SchemaStore/schemastore
@LecrisUT, let's consider an example:
.. cps:attribute:: clr_vendor
   :type: string
   :context: platform

   Specifies that the package's CLR (.NET) components
   require the specified `Common Language Runtime`_ vendor.
   Typical (case-insensitive) values include
   :string:`"microsoft"` and :string:`"mono"`.
How would that be written, without loss of expressiveness, in YAML?
- It's trivial to migrate the attribute to a different object. It's also both trivial and very legible to write attributes that belong to more than one object. This isn't a hard requirement, but it's a strong nice-to-have. (Do attributes applying to more than one object get duplicated?)
- It uses reST markup. Getting an editor to highlight that in a YAML file is a pain. Some markup (external references and especially substitutions) also requires global definitions; how does `jsonschema` handle that?
- What does markup look like in the schema `.json`? Does that just get the raw markup? (The existing system turns the reST into plain text.)
- If some other documentation wants to reference this, what does that look like? What does it look like when objects have different attributes with the same name?
- This example is a trivial type; how is a complex type like `map(map(string))` specified? (The existing system generates compound type specifications as needed.)
- How are "tiers" of schema handled? In the existing system, each tier is a separate file; they get merged in the generated schema.
@LecrisUT, let's consider an example:
There are two options. Either use `$$description`, which is specific to `sphinx-jsonschema`:

# schema.json
definitions:
  clr_vendor:
    type: string
    description: ...fill short description for ide...
    $$description: |
      Specifies that the package's CLR (.NET) components
      require the specified `Common Language Runtime`_ vendor.
      Typical (case-insensitive) values include
      :string:`"microsoft"` and :string:`"mono"`.
or, my preferred approach, integrate the text in the reST document:

# schema.json
definitions:
  clr_vendor:
    type: string
    description: ...fill short description for ide...

Schemas.rst
===========

.. jsonschema:: schema.json#/definitions/clr_vendor

   Specifies that the package's CLR (.NET) components
   require the specified `Common Language Runtime`_ vendor.
   Typical (case-insensitive) values include
   :string:`"microsoft"` and :string:`"mono"`.

Note: I am not sure of the indentation necessary here; it definitely works without the indentation for the text. This one works by making each `..` directive a specific subsection, so the text block is the same as writing it in other sections/admonitions.
In this example I used `definitions.clr_vendor`, but one could just as well drop those two levels and make it a single JSON-schema object.
- It's trivial to migrate the attribute to a different object. It's also both trivial and very legible to write attributes that belong to more than one object. This isn't a hard requirement, but it's a strong nice-to-have. (Do attributes applying to more than one object get duplicated?)
With `$ref`, effectively the content is copied wherever the key `$ref` appears when the JSON schema is parsed. But the original object is defined in only one place, and `sphinx-jsonschema` uses the `$ref` to emit a link instead of copying the contents (there is an annoying limitation that `$$target` needs to be defined sometimes).
It uses reST markup. Getting an editor to highlight that in a YAML file is a pain. Some markup (external references and especially substitutions) also requires global definitions; how does `jsonschema` handle that?
In the first case, indeed it would not work nicely, but the second option should work smoothly. Only some `:ref:` targets would be harder to redirect.
What does markup look like in the schema `.json`? Does that just get the raw markup? (The existing system turns the reST into plain text.)
You could use plain reST markup, but YMMV when it comes to the IDE displaying it. My recommendation is for the YAML section to be as compact as possible, since people will have the online documentation when needed. CMake adopts a similar approach.
- If some other documentation wants to reference this, what does that look like? What does it look like when objects have different attributes with the same name?
Referencing an external JSON schema inside a JSON schema is technically possible, but I did not investigate it (it's the same `$ref` attribute). As for documentation, the targets generated by `sphinx-jsonschema` are available through cross-Sphinx linkage.
This example is a trivial type; how is a complex type like `map(map(string))` specified? (The existing system generates compound type specifications as needed.)
type: object
# If you want specific keys to be allowed only
properties:
  key:
    type: string
# Or a regex pattern
patternProperties:
  ".*":
    type: string
# Or take anything
additionalProperties:
  type: string
Reference: https://json-schema.org/understanding-json-schema/reference/object
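For the nested `map(map(string))` case specifically, a sketch using `patternProperties` (the `".*"` patterns are illustrative; they accept any key):

```yaml
# map(map(string)): an object whose values are objects whose values are strings
type: object
patternProperties:
  ".*":
    type: object
    patternProperties:
      ".*":
        type: string
```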
- How are "tiers" of schema handled? In the existing system, each tier is a separate file; they get merged in the generated schema.
Depends; you can either have one YAML file per object/rule, or combine them and use `definitions` if you want to share definitions inside/across a file. Probably this part is the most relevant reference for this.
My recommendation is the yaml section to be as compact as possible, since people will have the online documentation when needed.
...but that's not unlike what I'm already doing; `schema.rst` is the "online documentation". The schema is a simplified version of that (which, frankly, is intended for machine verification). The current system is easy to edit/maintain and is generally expressive enough to generate a schema that is intended to be used as a validation mechanism.
The current system is easy to edit/maintain and is generally expressive enough
I want to challenge this claim. The issue is that the generation goes from reST to JSON schema in a custom generator. This means that new contributors wanting to dip their toes in and start contributing will have to learn yet another markup; e.g. they would need to decipher what something like this means: https://github.com/cps-org/cps/blob/2bfdc4b4fdb826a352629fd2d35faae50aa19c86/schema.rst?plain=1#L297-L298
So the quality of contributions can only go as far as the documentation written for contributors and their willingness to overcome the learning curve.
On the other hand, there would always be people familiar with sphinx and json-schema who would be able to jump in and out to do random fixes as long as the structure is familiar to them.
...but that's not unlike what I'm already doing; `schema.rst` is the "online documentation"
There is a difference here. The proposal is to move all of the logic and metadata to a static JSON-schema document, and the reST document would only contain the detail text and markup; `sphinx-jsonschema` will do the heavy lifting of populating the metadata.
There are various benefits of such an approach:
- reduced maintenance for custom implementations
- familiarity for external contributors
- integration with schemastore.org and immediate support in IDEs and other tooling
- full support of JSON schema features:
  - breaking down the schema into components
  - conditionals, e.g. different schemas depending on `cps_version`
  - regex matching, e.g. semver validation
  - deprecation support
  - and much more
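To illustrate the conditional and regex points, a sketch of what those features might look like (the `cps_version` field name follows the discussion above; the pattern is a simplified semantic-version check, not the full official semver regex, and the specific version string and required keys are hypothetical):

```yaml
properties:
  cps_version:
    type: string
    # simplified major.minor[.patch] check
    pattern: '^(0|[1-9]\d*)\.(0|[1-9]\d*)(\.(0|[1-9]\d*))?$'
# different requirements depending on cps_version (if/then, draft-07 and later)
if:
  properties:
    cps_version:
      const: "0.12"
then:
  required: ["name", "cps_version"]
```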
This is now in the way of other changes that need to be made. Since it's been sitting a long while with only one relevant comment (which has been addressed weeks ago), I'm going to go ahead and merge it. Any further fixes needed will have to wait for their own PRs.
The proposal is to move all of the logic and metadata to a static json-schema document, and the rst document would only contain the detail text and markup.
What's "static"? "Unchanging"? Clearly that isn't the case, as some of your claimed advantages make clear. Nor am I convinced that splitting the schema into multiple formats is an improvement. I also am skeptical of your assertion that the current system is difficult to comprehend, especially if someone employs the simple step of comparing the source to the output.
That said, nothing is stopping someone from making a pull request. I definitely do not have time right now to make such sweeping changes.
Static as in static file. A file that is not generated.
Nor am I convinced that splitting the schema into multiple formats is an improvement.
I am not proposing that. There would only be 1 json schema in either a yaml or json format. Unfortunately that does mean the way to migrate is to fully migrate.
Regarding PR contribution, what I can write is the skeleton for this, but I would need to make a bunch more changes:
- adopting RTD in order to run Sphinx in PRs
- reorganizing the project to move the documentation into `docs` and the schemas into their own folder
- adding `pre-commit`
- migrating the build to a proper PEP 517 backend
- adding various builders like `linkcheck` and `html -W`, and eliminating `check-build-logs.py`
- discouraging the use of GitHub Pages
It is quite cumbersome to make each of these changes as individual PRs, as they can be strongly dependent on one another; at most I could make them as individual commits. And this all hinges on whether or not you are open to these changes and to reviewing them, so I should know beforehand whether I should go forward with prototyping them.