science-on-schema.org
Need to determine which data set fields are mandatory, recommended or optional
As a community we need to determine which fields are necessary to support the various services we'd like to provide. For example, to find something you probably need at least a name and an identifier, but to access it you need endpoints, landing page URLs and other such fields.
Those kinds of recommendations could live in a profile (science-on-schema might itself be a profile). There needs to be some way to indicate which profile a metadata document (a schema.org instance) conforms to, such as dct:conformsTo, and a profile specification needs to declare the URI that identifies documents conforming to it.
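To make the dct:conformsTo idea concrete, here is a minimal sketch of a schema.org Dataset instance that declares the profile it conforms to. The profile URI and dataset URL are hypothetical placeholders (no profile URI has been agreed on), and `make_dataset` / `declared_profiles` are illustrative helpers, not part of any existing tool. The lookup matches the compact key `dct:conformsTo` as written; a real consumer would expand the JSON-LD first.

```python
import json

# Hypothetical profile URI; the community has not settled on one.
PROFILE_URI = "https://example.org/profiles/science-on-schema/v1"


def make_dataset(name: str, identifier: str, profile_uri: str = PROFILE_URI) -> dict:
    """Build a minimal schema.org Dataset that declares which profile it conforms to."""
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "Dataset",
        "@id": identifier,
        "name": name,
        "identifier": identifier,
        # dct:conformsTo identifies the profile itself, not a validation file.
        "dct:conformsTo": {"@id": profile_uri},
    }


def declared_profiles(doc: dict) -> list:
    """Return the profile URIs a document claims to conform to (compact keys only)."""
    value = doc.get("dct:conformsTo", [])
    if isinstance(value, dict):
        value = [value]
    return [v["@id"] for v in value]


if __name__ == "__main__":
    doc = make_dataset("Example dataset", "https://example.org/dataset/1")
    print(json.dumps(doc, indent=2))
    print(declared_profiles(doc))
```

A validator could read `declared_profiles(doc)` to decide which shapes to apply, which is the same role the BagIt profile declaration plays for Bags.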
Yeah, the BagIt Profile extension mechanism is similar: metadata in the Bag declares which BagIt profile it conforms to, so validators know how to check conformance. In our case, it seems like we would be conforming to a SHACL shape that represents the science-on-schema profile, so maybe we need a way to declare those in the instance document, along with a well-known way to locate the shape definitions. Maybe @fils has already figured this out?
To avoid confusion, we need to be clear whether a URI identifies the profile itself or the location of a resource that can be used to validate metadata instances for conformance with the profile. There might be multiple validation resources available.
The W3C DXWG profiles vocabulary draft models a 'resourceDescriptor' class that links a profile to associated resources such as validation code (SWRL, SHACL, XSD, Schematron), text descriptions, etc.
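As a sketch of how that separates "the profile" from "a thing that validates against the profile", here is a profile description in JSON-LD using the term names from the published W3C Profiles Vocabulary (prof:Profile, prof:hasResource, prof:ResourceDescriptor, prof:hasRole, prof:hasArtifact, role:validation); the draft discussed here may spell them differently. All URLs are hypothetical, and `validation_artifacts` is an illustrative helper that compares compact IRIs as written rather than expanding the JSON-LD.

```python
# Hypothetical identifiers for illustration only.
PROFILE_URI = "https://example.org/profiles/science-on-schema/v1"
SHAPES_URL = "https://example.org/profiles/science-on-schema/v1/shapes.ttl"
GUIDE_URL = "https://example.org/profiles/science-on-schema/v1/guide.html"

profile = {
    "@context": {
        "prof": "http://www.w3.org/ns/dx/prof/",
        "role": "http://www.w3.org/ns/dx/prof/role/",
        "dct": "http://purl.org/dc/terms/",
    },
    # The profile URI is what instance documents cite via dct:conformsTo.
    "@id": PROFILE_URI,
    "@type": "prof:Profile",
    "prof:hasResource": [
        {   # A SHACL shapes file that validates instances against the profile.
            "@type": "prof:ResourceDescriptor",
            "prof:hasRole": {"@id": "role:validation"},
            "dct:conformsTo": {"@id": "https://www.w3.org/TR/shacl/"},
            "prof:hasArtifact": {"@id": SHAPES_URL},
        },
        {   # Human-readable guidance; a different role, so not used for validation.
            "@type": "prof:ResourceDescriptor",
            "prof:hasRole": {"@id": "role:guidance"},
            "prof:hasArtifact": {"@id": GUIDE_URL},
        },
    ],
}


def validation_artifacts(profile: dict) -> list:
    """Return artifact URLs of every resource descriptor whose role is validation."""
    resources = profile.get("prof:hasResource", [])
    if isinstance(resources, dict):
        resources = [resources]
    return [
        res["prof:hasArtifact"]["@id"]
        for res in resources
        if res.get("prof:hasRole", {}).get("@id") == "role:validation"
    ]
```

Because the shapes file is only one resource hanging off the profile, several validation artifacts (say SHACL and Schematron) can coexist without changing the profile URI.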
It's probably worth cross-checking against DCAT-2 (see its summary class diagram). There is also a crosswalk to schema.org.
Perhaps I am naive but couldn't this simply be a SHACL shape?
LDP links a resource to such constraints (SHACL, Web Annotations) via http://www.w3.org/ns/ldp#constrainedBy
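In LDP the constrainedBy relation travels in an HTTP Link header, so a client can discover the constraint document (for example a SHACL shapes file) without it being embedded in the data. A minimal stdlib sketch of pulling those targets out of a Link header value follows; the regex-based parsing is a simplification (it does not handle every RFC 8288 edge case, such as `<` inside quoted parameters), and the example URLs are hypothetical.

```python
import re

LDP_CONSTRAINED_BY = "http://www.w3.org/ns/ldp#constrainedBy"

# One link-value: "<target>" followed by its ";"-separated parameters.
_LINK = re.compile(r"<([^>]*)>([^<]*)")
# The rel parameter, quoted or unquoted; it may hold space-separated relation types.
_REL = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^;,\s]+))', re.IGNORECASE)


def constrained_by(link_header: str) -> list:
    """Return target URIs of Link header entries whose rel includes ldp:constrainedBy."""
    targets = []
    for m in _LINK.finditer(link_header):
        target, params = m.group(1), m.group(2)
        rel = _REL.search(params)
        if rel is None:
            continue
        rel_types = (rel.group(1) or rel.group(2)).split()
        if LDP_CONSTRAINED_BY in rel_types:
            targets.append(target)
    return targets


if __name__ == "__main__":
    header = (
        '<https://example.org/shapes/dataset.ttl>; '
        'rel="http://www.w3.org/ns/ldp#constrainedBy", '
        '<https://example.org/page/2>; rel="next"'
    )
    print(constrained_by(header))
```

A `constrainedBy` property on schema.org Thing would play the same role inside the JSON-LD body instead of the HTTP response.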
Currently Google defines its required and recommended properties in the developer guide: https://developers.google.com/search/docs/data-types/dataset
We've converted these into shape graphs already at https://github.com/geoschemas-org/geoshapes/tree/master/shapegraphs
Google does not consider @id to be required, which I disagree with.
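The required/recommended split maps naturally onto SHACL severities (sh:Violation vs sh:Warning). As a plain-Python stand-in (not SHACL, and not the geoshapes graphs), the sketch below reports missing properties at those two levels and treats @id as required, as argued above. The property tiers are hypothetical placeholders; the authoritative lists are Google's guide and the community's shape graphs.

```python
# Hypothetical tiers for illustration; "@id" is required per the view above.
REQUIRED = ("@id", "name", "description")
RECOMMENDED = ("identifier", "url", "license", "distribution")


def check_dataset(doc: dict) -> dict:
    """Report missing required (errors) and recommended (warnings) properties.

    Mirrors SHACL severities loosely: errors ~ sh:Violation, warnings ~ sh:Warning.
    Only top-level keys of a single node are checked; nothing is expanded.
    """
    types = doc.get("@type", [])
    if isinstance(types, str):
        types = [types]
    if "Dataset" not in types:
        return {"errors": ["@type does not include Dataset"], "warnings": []}
    errors = [f"missing required property {p}" for p in REQUIRED if not doc.get(p)]
    warnings = [f"missing recommended property {p}" for p in RECOMMENDED if not doc.get(p)]
    return {"errors": errors, "warnings": warnings}


if __name__ == "__main__":
    doc = {
        "@type": "Dataset",
        "name": "Example dataset",
        "description": "A placeholder description.",
        "url": "https://example.org/dataset/1",
    }
    print(check_dataset(doc))
```

Keeping the two levels separate lets a harvester reject documents without @id while still indexing ones that merely lack recommended fields.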
Personally I'd love to see a constrainedBy attached to Thing in schema.org :)
Yes, that's exactly what shapes are for. However, this does bring in the RDF lens. People who relate primarily through the JSON surface syntax might need some orientation.
+1 for SHACL. The European Legislation Identifier (ELI) system is a good example of a community using SHACL shapes to promote consistent content representation [1]. I expect a similar library of shapes could (and should) be provided for this community to promote consistent data. DataONE has started using SHACL for testing SO:Dataset structure. So far it has worked well, though developing the shapes can be cumbersome. It would be awesome if there were a common library we could draw from (and contribute to).
[1] https://webgate.ec.europa.eu/eli-validator/home
This issue has been automatically marked as stale because it has not had recent activity.