spdx-examples

Add a Dataset Profile example (CO2 dataset)

Open bact opened this issue 1 year ago • 5 comments

Inspired by https://github.com/owid/co2-data/

Structurally and semantically validated against all the tools listed here: https://github.com/spdx/spdx-3-model/blob/main/serialization/json_ld/validation.md

TODOs:

  • [x] Valid JSON-LD
  • [x] Parseable and makes sense on https://json-ld.org/playground/
  • [x] Pass validation/generation of spdx3ToGraph
  • [x] Pass pyshacl
  • [x] Pass check-jsonschema
  • [x] Pass ajv
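Before running the heavier tools in the list above, a cheap pre-check can catch plain JSON mistakes early. The sketch below is an illustrative assumption, not one of the tools in the validation document: the helper name preflight is hypothetical, and it only checks that the file parses as JSON, has an @context, and that every @graph entry carries a "type". It does not replace JSON-LD, SHACL, or JSON Schema validation.

```python
import json


def preflight(text: str) -> list:
    """Cheap structural checks on an SPDX 3 JSON-LD document (hypothetical helper).

    This is NOT schema or SHACL validation; it only catches basic mistakes
    before running ajv, check-jsonschema, or pyshacl.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        # Plain JSON syntax errors come with an exact line/column position.
        return [f"not valid JSON: line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    problems = []
    if "@context" not in doc:
        problems.append("missing @context")
    graph = doc.get("@graph")
    if not isinstance(graph, list):
        problems.append("@graph must be an array")
        return problems
    for i, element in enumerate(graph):
        # Each element states its class in "type", as in the LicenseExpression example.
        if not isinstance(element, dict) or "type" not in element:
            problems.append(f"@graph[{i}] has no 'type'")
    return problems
```

An empty list only means these basic checks passed; the document still has to go through the validators listed above.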

bact • Jun 04 '24

Experiment notes:

  • Successfully validated with all the tools (ajv, check-jsonschema, pyshacl) listed at https://github.com/spdx/spdx-3-model/blob/main/serialization/json_ld/validation.md
  • The warning messages from ajv and check-jsonschema are sometimes not very helpful
    • If there are warnings or errors, try removing some objects from your JSON. Once the smaller JSON validates, gradually add a few objects back.
    • spdx3ToGraph can be handier for detecting errors in the first runs, as it provides more useful error messages (use https://github.com/maxhbr/spdx3ToGraph/pull/2 to get a more exact location of the error)
    • However, spdx3ToGraph validation does not check cardinality; you still have to use ajv or check-jsonschema for that. (The tool is meant primarily for visualization, by the way.)
      • If maxCount is *, the value must be an array
  • Many of the errors found in this attempt (and in a few other examples) are about serialized names, so completing the TODO in https://github.com/spdx/spdx-spec/issues/975 will help a lot.
  • A PlantUML diagram generated by spdx3ToGraph can be useful for understanding the overall structure
    • However, due to a limitation of the PlantUML visualizer (I use the ones from PlantUML.com, online and offline), very long spdxIds (based on UUIDv4, for example) are very likely to make your diagram overflow or get cropped.
    • For this example, I edited the generated PlantUML file to use shorter spdxIds before resubmitting it to the visualizer, just to get a diagram that actually fits. (The IDs in the JSON-LD file are untouched.)
  • Real-time validation in an editor would help. VS Code supports JSON validation against a schema. If you are familiar with VS Code, please help review this PR, which adds VS Code validation to the validation document: https://github.com/spdx/spdx-3-model/pull/790
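The two debugging tips in these notes (shrink the document and then add objects back, and remember that maxCount * means an array) can be combined in a small sketch. Everything here is an assumption for illustration: ARRAY_PROPERTIES is a hand-picked subset of SPDX 3 properties without a maxCount limit, not the full list, and is_valid stands in for whichever validator you actually run (ajv, check-jsonschema, pyshacl). The helper names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

# Illustrative subset (assumption) of SPDX 3 properties whose maxCount is *.
# Their values must be serialized as JSON arrays, even with a single item.
ARRAY_PROPERTIES = {"to", "verifiedUsing", "externalIdentifier", "originatedBy"}


def cardinality_errors(element: dict) -> list[str]:
    """Flag properties from ARRAY_PROPERTIES whose value is not a JSON array."""
    errors = []
    for key, value in element.items():
        if key in ARRAY_PROPERTIES and not isinstance(value, list):
            errors.append(f"{element.get('spdxId', '?')}: '{key}' must be an array")
    return errors


def first_failing_element(
    graph: list[dict], is_valid: Callable[[list[dict]], bool]
) -> Optional[int]:
    """Grow the document one element at a time, as in the manual workflow.

    Returns the index of the first element whose addition makes is_valid
    fail, or None if the whole graph validates.
    """
    for n in range(1, len(graph) + 1):
        if not is_valid(graph[:n]):
            return n - 1
    return None


def cardinality_only(graph: list[dict]) -> bool:
    """Stand-in validator (hypothetical): runs only the cardinality check."""
    return all(not cardinality_errors(element) for element in graph)
```

Adding one element at a time mirrors the manual approach; for a large @graph a bisection would need fewer validator runs, assuming that once a prefix fails, every longer prefix fails too.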

bact • Jun 16 '24

This is how I put AnyLicenseInfo in this example to satisfy the SHACL validator, as a workaround for the current lack of ListedLicense. (This will be removed once https://github.com/spdx/LicenseListPublisher/issues/183 is implemented.)

LicenseExpression is a subclass of AnyLicenseInfo, so it is valid as the "to" of "has license" relationships. Its spdxId is set to be identical to the expected license IRI. This means that when the license (CC-BY-4.0) becomes available as a ListedLicense (using this IRI), this workaround LicenseExpression element can be removed without any change to the "has license" relationships.

    {
      "type": "simplelicensing_LicenseExpression",
      "spdxId": "https://spdx.org/licenses/CC-BY-4.0",
      "creationInfo": "_:creationinfo",
      "simplelicensing_licenseExpression": "CC-BY-4.0",
      "simplelicensing_licenseListVersion": "3.24.0"
    }
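The claim that the workaround element can later be dropped without touching the "has license" relationships can be checked with a small sketch. Assumptions: the dataset and relationship IRIs under example.org are hypothetical, hasDeclaredLicense is used as the example license relationship type, and external_iris stands for IRIs that resolve outside the document (such as a future ListedLicense). The helper names are also hypothetical.

```python
LICENSE_IRI = "https://spdx.org/licenses/CC-BY-4.0"

# Minimal hypothetical graph: the workaround element, a dataset, and one license relationship.
graph = [
    {
        "type": "simplelicensing_LicenseExpression",
        "spdxId": LICENSE_IRI,
        "creationInfo": "_:creationinfo",
        "simplelicensing_licenseExpression": "CC-BY-4.0",
        "simplelicensing_licenseListVersion": "3.24.0",
    },
    {"type": "dataset_DatasetPackage", "spdxId": "https://example.org/co2-dataset"},
    {
        "type": "Relationship",
        "spdxId": "https://example.org/rel-license",
        "from": "https://example.org/co2-dataset",
        "relationshipType": "hasDeclaredLicense",
        "to": [LICENSE_IRI],
    },
]


def relationships(g):
    """All Relationship elements in the graph."""
    return [e for e in g if e.get("type") == "Relationship"]


def unresolved_targets(g, external_iris=frozenset()):
    """Relationship targets that are neither defined in g nor known external IRIs."""
    defined = {e["spdxId"] for e in g if "spdxId" in e} | set(external_iris)
    return [t for r in relationships(g) for t in r.get("to", []) if t not in defined]


def drop_workaround(g, iri):
    """Remove only the workaround LicenseExpression whose spdxId equals iri."""
    return [
        e for e in g
        if not (e.get("type") == "simplelicensing_LicenseExpression" and e.get("spdxId") == iri)
    ]
```

After drop_workaround, the relationship objects are identical to the originals; the only difference is whether LICENSE_IRI is defined inside the document or comes from the published license list (passed in as external_iris).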

--

For this example, we can decide which version of the BOM we would like to have:

  1. a BOM that is valid against the actual ontology (the current ontology, without ListedLicense)
  2. a BOM that is valid against the ontology as designed (the future ontology, with ListedLicense)

There are three possible decisions:

a. If (1) is OK, we can merge it as is. Once https://github.com/spdx/LicenseListPublisher/issues/183 is implemented, we can revise the BOM again to remove the workaround LicenseExpression.

b. If (2) is preferred, I can remove the workaround LicenseExpression element now so that the PR can be merged (after other necessary revisions).

c. The last option is to do nothing until all the required ListedLicenses are available, and then go with (2).

bact • Jun 19 '24

@rgopikrishnan91 this PR please Gopi

bact • Jun 19 '24

@rgopikrishnan91 @bennetkl please kindly review. Thank you.

bact • Aug 01 '24

@kestewart I believe you will have to merge, I don't have permission for this one.

bennetkl • Aug 01 '24

Discussed in the AI call. Merging.

kestewart • Aug 14 '24