fusesoc

create schema for `CAPI2` core format

Open proppy opened this issue 3 years ago • 12 comments

It would be nice to have a schema for defining constraint of the fields defining the CAPI2 core format.

Related: https://github.com/olofk/edalize/issues/288

proppy avatar Jan 19 '22 11:01 proppy

I got https://gist.github.com/proppy/b2edb8ce0f209609dc6d90a2da6b9b9f#file-fusesoc-cue by running https://github.com/wolverdude/GenSON on fusesoc_libraries/ plus the serv core files, and then massaging the output a little bit.
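
For readers unfamiliar with schema inference, the idea of deriving a schema from existing core files can be sketched in a few lines of Python. This is a simplified illustration and not GenSON's actual algorithm: `infer_schema` and the trimmed `sample_core` data are hypothetical, and a real generator would also merge the schemas inferred from many files.

```python
def infer_schema(value):
    """Infer a JSON-Schema-like description from one sample value."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Simplification: describe the items using the first element only.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    raise TypeError(f"unsupported value: {value!r}")


# Hypothetical, heavily trimmed core description used as the only sample.
sample_core = {
    "name": "::serv:1.1.0",
    "filesets": {
        "core": {"files": ["rtl/serv_top.v"], "file_type": "verilogSource"},
    },
}
schema = infer_schema(sample_core)
```

Because the result is inferred from a single sample, every key ends up in `required`, which is one reason the generated output needs some manual massaging afterwards.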

@olofk I'm curious if we should keep the tools section generic, or if you want it to specify all the options listed on https://fusesoc.readthedocs.io/en/latest/ref/capi2.html#tools ?

proppy avatar Jan 19 '22 13:01 proppy

Oh, cool! I think we need to keep the tools section generic actually because that information comes from Edalize and depends on which version of Edalize is being used. Of course, that makes it somewhat questionable to have in the FuseSoC documentation in the first place.

Another thing, are the cue files strictly a schema or can we put descriptions in them as well? I'm thinking about what to use as the source definition. Would that be the cue file, or do we generate the cue file from something else? Currently, the source definition is this inline yaml thing. It would be good to extract all the info from that and get it into a more standardized format that can be used for both validation and documentation.

olofk avatar Jan 19 '22 13:01 olofk

cue actually allows you to express:

  • constraints
  • data
  • transformations

within the same .cue file.

A funny way to think of it is that, in cue, 2 is just a very strong int constraint.

Which also means that a core file could itself be expressed as a cue file if one wanted to get typechecking and generative capability for free.

oh and I didn't know "that inline yaml thing" existed, we should be able to derive part of it using a cue expression.

Something like:

👺 cue eval -e "inline_yaml_thing" - << EOF
file: {
 file_type: string
 is_include_file: bool
 copyto: string
 include_path: string
}

inline_yaml_thing: File: {
  members: [for k,v in file {
    name: k
    type: v
  }]
}
EOF

output:

File: {
    members: [{
        name: "file_type"
        type: string
    }, {
        name: "is_include_file"
        type: bool
    }, {
        name: "copyto"
        type: string
    }, {
        name: "include_path"
        type: string
    }]
}
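
For comparison, the same derivation can be written in Python, which may help readers who have not used cue comprehensions. This is only an analogy under the assumption that the field-to-type mapping lives in a plain dict; `file_fields` is a hypothetical name, and the Python version prints type names as strings where cue keeps them as types.

```python
# Field name -> type, mirroring the `file` struct in the cue snippet.
file_fields = {
    "file_type": str,
    "is_include_file": bool,
    "copyto": str,
    "include_path": str,
}

# Same shape as the cue comprehension: one {name, type} entry per field.
inline_yaml_thing = {
    "File": {
        "members": [
            {"name": name, "type": type_.__name__}
            for name, type_ in file_fields.items()
        ]
    }
}
```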

If we have the description as cue values we could manipulate from cue expressions and use https://cuelang.org/docs/integrations/yaml/ to output some yaml out of it if needed.

or if we want to keep them as comments (way more readable), use the go api that gives access to the ast https://cuelang.org/docs/integrations/go/ (which would also work around not being able to iterate on optional fields https://github.com/cue-lang/cue/issues/94).

a less involved option could be to export the cue schema as a standard json schema using https://cuelang.org/docs/integrations/openapi/; code comments should get mapped to property descriptions, and then we can use something like https://sphinxcontrib-opendataservices-jsonschema.readthedocs.io/en/latest/ to render them in the documentation.

and if we find it readable enough we could also just embed the cuelang file as is :)

proppy avatar Jan 19 '22 14:01 proppy

Most of this is new to me so I'm still trying to wrap my head around the various options, but here are some assorted thoughts going through my head.

  1. We don't need to output yaml. That inline yaml thing is what I'm hoping to replace
  2. Those jsonschema files look pretty nice. I guess there's a bit of tooling around those that would allow better validation of CAPI2 core description files than we currently have
  3. If it's easier to describe the spec as cue files which then generate jsonschema files, then I'm fine with using that instead. I'm just a bit worried if that means we need to pull in a go ecosystem as well for just this task
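
As a rough illustration of the validation mentioned in point 2, the sketch below checks a file entry against a JSON-Schema-like dict. It is a deliberately tiny hand-rolled checker standing in for a real validator; `validate` and `file_schema` are hypothetical, the field names are taken from the cue snippet earlier in the thread, and marking `file_type` as required is an assumption made only for the example.

```python
def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance is valid."""
    type_map = {"object": dict, "array": list, "string": str, "boolean": bool}
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(instance, type_map[expected]):
        return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    if expected == "object":
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub, f"{path}.{key}"))
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Hypothetical schema fragment for one file entry.
file_schema = {
    "type": "object",
    "required": ["file_type"],  # assumption, for illustration only
    "properties": {
        "file_type": {"type": "string"},
        "is_include_file": {"type": "boolean"},
        "copyto": {"type": "string"},
        "include_path": {"type": "string"},
    },
}

ok = validate({"file_type": "verilogSource", "is_include_file": True}, file_schema)
bad = validate({"is_include_file": "yes"}, file_schema)
```

Here `ok` is an empty list, while `bad` reports both the missing key and the wrongly typed value, each with the path where the problem was found.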

Does that make sense?

olofk avatar Jan 20 '22 22:01 olofk

Yep all makes sense.

  1. note that cue can also be used for validation, so jsonschema is not strictly needed for that
  2. one "reason" to export to jsonschema would be to get access to the documentation tooling that the cue ecosystem currently lacks, but if we find the cue file readable enough maybe we don't need to generate an HTML table from it.

proppy avatar Jan 20 '22 22:01 proppy

I really want html generation like what we have here https://fusesoc.readthedocs.io/en/stable/ref/capi2.html, so that's definitely an argument for generating jsonschema files at least for now (or use those as the input format).
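
To show why a schema with descriptions is a convenient input for this, here is a minimal sketch that turns the `properties` of a schema into an HTML table. It is not how the Sphinx extensions work internally; `schema_to_html_table` is a hypothetical helper and the description strings are placeholders, not the real CAPI2 documentation.

```python
from html import escape


def schema_to_html_table(schema):
    """Render an object schema's properties as a simple HTML table."""
    rows = ["<tr><th>Field</th><th>Type</th><th>Description</th></tr>"]
    for name, prop in schema.get("properties", {}).items():
        cells = (name, prop.get("type", ""), prop.get("description", ""))
        cols = "".join(f"<td>{escape(str(c))}</td>" for c in cells)
        rows.append(f"<tr>{cols}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical schema fragment with placeholder descriptions.
file_schema = {
    "type": "object",
    "properties": {
        "file_type": {"type": "string", "description": "Placeholder text for file_type"},
        "is_include_file": {"type": "boolean", "description": "Placeholder text for is_include_file"},
    },
}
html_table = schema_to_html_table(file_schema)
```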

olofk avatar Jan 20 '22 23:01 olofk

@olofk that makes sense. Are you happy with what something like https://sphinx-jsonschema.readthedocs.io/en/latest/ or https://sphinxcontrib-opendataservices-jsonschema.readthedocs.io/en/latest/ renders? e.g. https://sphinx-jsonschema.readthedocs.io/en/latest/schemakeywords.html#caption

or are you looking at matching the current rendering?

proppy avatar Jan 21 '22 00:01 proppy

@olofk I also quickly imported the serv core conf into cue and put together https://gist.github.com/proppy/fd26d30d23b6a9b4f37096da90f3f09c#file-serv-cue-L95 to give an idea of what kind of "transformation" can be done w/ cue expressions.

If you call it with:

cue export --out yaml -e "(#Edalizer & {Core: Serv, Target: \"lint\"}).Edam" serv.cue

It gives you this:

name: serv
version: 1.1.0
top_level: serv_synth_wrapper
dependencies: ::serv:1.1.0
files:
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: data/verilator_waiver.vlt
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_bufreg.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_bufreg2.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_alu.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_csr.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_ctrl.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_decode.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_immdec.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_mem_if.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_rf_if.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_rf_ram_if.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_rf_ram.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_state.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_top.v
  - core: ::serv:1.1.0
    file_type: verilogSource
    name: rtl/serv_rf_top.v
tool_options:
  verilator:
    mode: lint-only
    verilator_options:
      - -Wall

Not that I think we'd need something like this (there are plenty of corner cases in https://github.com/olofk/fusesoc/blob/master/fusesoc/edalizer.py that might be difficult to capture in the current state of cue), but I think it could be useful to explore what the tooling is capable of, so that we can scope the right tool for the job.
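
For readers who do not read cue, the kind of transformation shown above can be approximated in plain Python. This is a simplified sketch, not the logic in edalizer.py nor the cue in the gist: `to_edam` is a hypothetical helper, the `serv` dict is a trimmed, hand-written stand-in for the core description, and only the fields visible in the output above are produced.

```python
def to_edam(core, target_name):
    """Flatten one target of a core description into an EDAM-like dict."""
    target = core["targets"][target_name]
    files = []
    for fileset_name in target["filesets"]:
        fileset = core["filesets"][fileset_name]
        for path in fileset["files"]:
            files.append(
                {"core": core["name"], "file_type": fileset["file_type"], "name": path}
            )
    tool = target["default_tool"]
    return {
        "name": core["name"].split(":")[2],  # "::serv:1.1.0" -> "serv"
        "top_level": target["toplevel"],
        "files": files,
        "tool_options": {tool: target["tools"][tool]},
    }


# Hypothetical, trimmed core description (two files instead of the full list).
serv = {
    "name": "::serv:1.1.0",
    "filesets": {
        "core": {
            "files": ["rtl/serv_alu.v", "rtl/serv_top.v"],
            "file_type": "verilogSource",
        },
    },
    "targets": {
        "lint": {
            "default_tool": "verilator",
            "filesets": ["core"],
            "toplevel": "serv_synth_wrapper",
            "tools": {"verilator": {"mode": "lint-only", "verilator_options": ["-Wall"]}},
        }
    },
}
edam = to_edam(serv, "lint")
```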

proppy avatar Jan 21 '22 00:01 proppy

I haven't followed this too closely, but have you guys looked into pydantic+json-schema-for-humans?

GlenNicholls avatar May 06 '22 00:05 GlenNicholls

Yep, this and dataclasses also got mentioned in https://github.com/olofk/edalize/issues/288.

That would be a good way to piggyback on the existing python tooling for doc generation.

proppy avatar May 11 '22 06:05 proppy

I don't have any strong opinions but jsonschema does seem to have the most mature ecosystem and I don't think we need anything more fancy than that. I have heard good things about dataclasses, but IIUC that's a Python thing and I would like to have the core description file tooling language-agnostic whenever possible so that e.g. JavaScript users can reuse it

olofk avatar May 31 '22 20:05 olofk

I tried for a while to mess with dataclasses for config files and they're pretty annoying because you have to implement all the validation/transformations in a more verbose way than with pydantic. On top of that, loading a config file into a dataclass isn't as easy as what pydantic supports:

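import yaml  # PyYAML; YAML is assumed from the variable name below
from pydantic import BaseModel

class Something(BaseModel):
    ...  # declare the config fields here

with open("cfg.file") as f:
    yml = yaml.safe_load(f)  # parse the file into a plain dict
data = Something(**yml)  # pydantic validates the dict against the model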
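
For contrast, here is a minimal sketch of the standard-library dataclass route, where each check has to be spelled out by hand. `FileEntry` is a hypothetical class; its two field names are borrowed from the cue snippet earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    file_type: str
    is_include_file: bool = False

    def __post_init__(self):
        # dataclasses do not enforce annotations, so every check is manual.
        if not isinstance(self.file_type, str):
            raise TypeError("file_type must be a string")
        if not isinstance(self.is_include_file, bool):
            raise TypeError("is_include_file must be a bool")


entry = FileEntry(**{"file_type": "verilogSource"})

try:
    FileEntry(**{"file_type": 42})
except TypeError as exc:
    error = str(exc)
```

Every additional field needs its own check in `__post_init__`, which is the verbosity being described.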

It looks like datamodel-code-generator can generate pydantic models directly from a JSON schema, which could then validate core files at runtime. I'm not sure how good datamodel-code-generator is, but IMO, since your code is already Python, starting with a pydantic BaseModel makes sense because you can just export the model as a JSON schema for anyone who wants it. I believe pydantic also has a dataclass that mirrors the standard lib one and is capable of validation as well.

EDIT: just wanted to provide some ideas, as I spent more time than I'd like to admit testing different tools to figure out how to validate config files, document extensively how to write them for users, and also provide extensive documentation about the actual parsing code for devs.

GlenNicholls avatar Jun 22 '22 16:06 GlenNicholls

Actually, we have had a jsonschema for capi2 for a while now, so I'm closing this one. The next step would perhaps be to separate the schema from the fusesoc code to make it easier for other tools to use CAPI2.

olofk avatar Dec 15 '23 17:12 olofk