science-on-schema.org

Need to determine which data set fields are mandatory, recommended or optional

Open · rduerr opened this issue 5 years ago • 9 comments

As a community, we need to determine which fields are necessary to support the various services we'd like to provide. For example, to find a dataset you probably need at least a name and an identifier, but to access it you need endpoints, landing-page URLs, and other such fields.
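To make "necessary for which service" concrete, here is a minimal sketch that checks a schema.org JSON-LD document against per-service field lists. The service names and the fields assigned to each are hypothetical placeholders (the thread has not settled them); only the idea of tying requirements to services comes from the issue.

```python
from typing import Dict, List

# Hypothetical service -> required schema.org fields; placeholders only,
# since the community has not agreed on these lists yet.
SERVICE_REQUIREMENTS: Dict[str, List[str]] = {
    "discovery": ["name", "identifier"],
    "access": ["url", "distribution"],  # landing page and download endpoints
}

def missing_fields(doc: dict) -> Dict[str, List[str]]:
    """For each service, list the required fields absent from a JSON-LD doc."""
    return {
        service: [field for field in fields if field not in doc]
        for service, fields in SERVICE_REQUIREMENTS.items()
    }

def supported_services(doc: dict) -> List[str]:
    """Services whose required fields are all present in the doc."""
    return [service for service, gaps in missing_fields(doc).items() if not gaps]

if __name__ == "__main__":
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": "Example ocean temperatures",
        "identifier": "https://example.org/dataset/1",
    }
    print(supported_services(dataset))        # ['discovery']
    print(missing_fields(dataset)["access"])  # ['url', 'distribution']
```

Framing requirements per service rather than as one global "mandatory" list lets a record be useful for discovery even before it carries access details.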

rduerr, Nov 20 '19

Those kinds of recommendations could live in a profile (science-on-schema.org might itself be a profile). There needs to be some way to indicate which profile a metadata document (a schema.org instance) conforms to, such as dct:conformsTo, and a profile specification needs to declare the URI that conforming documents should use to identify it.
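As a rough illustration of the dct:conformsTo idea, the sketch below pulls declared profile URIs out of a JSON-LD document. The profile URI is a hypothetical placeholder, and the code assumes the document's `@context` binds the prefix `dct` to `http://purl.org/dc/terms/`. Note that in JSON-LD a plain string is a literal unless the context coerces the term to `@id`, so the `{"@id": ...}` form is the one that actually yields an IRI.

```python
from typing import List

# Hypothetical profile URI, for illustration only.
SOSO_PROFILE = "https://example.org/profiles/science-on-schema/dataset"

def declared_profiles(doc: dict) -> List[str]:
    """Return the profile URIs a JSON-LD doc declares via dct:conformsTo.

    Assumes the compact key "dct:conformsTo" with "dct" bound to
    http://purl.org/dc/terms/. Accepts a single value or a list, where each
    value is {"@id": ...} or (leniently) a plain string.
    """
    value = doc.get("dct:conformsTo", [])
    items = value if isinstance(value, list) else [value]
    uris: List[str] = []
    for item in items:
        if isinstance(item, dict):
            item = item.get("@id")
        if isinstance(item, str):
            uris.append(item)
    return uris

instance = {
    "@context": {"@vocab": "https://schema.org/", "dct": "http://purl.org/dc/terms/"},
    "@type": "Dataset",
    "name": "Example ocean temperatures",
    "dct:conformsTo": {"@id": SOSO_PROFILE},
}
print(declared_profiles(instance))
```

A validator could then use the returned URIs to decide which rule set to apply, rather than guessing from the document's shape.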

smrgeoinfo, Nov 25 '19

Yeah, the BagIt Profile extension mechanism is similar: metadata in the Bag declares which BagIt profile it conforms to, so validators know how to check conformance. In our case, it seems we would be conforming to a SHACL shape that represents the science-on-schema profile, so maybe we need a way to declare those shapes in the instance document, along with a well-known way to locate the shape definitions. Maybe @fils has already figured this out?
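For comparison, a sketch of the BagIt side: a bag declares its profile with the `BagIt-Profile-Identifier` tag in `bag-info.txt`, and a validator can look that identifier up. The parser follows the RFC 8493 tag-file layout (`Label: value`, with whitespace-indented continuation lines); the `SHAPE_REGISTRY` mapping and its URLs are hypothetical, standing in for the "well-known way to locate the shape definitions" that does not exist yet.

```python
from typing import Dict, List

def parse_tag_file(text: str) -> Dict[str, List[str]]:
    """Parse a BagIt tag file (e.g. bag-info.txt) into label -> list of values.

    'Label: value' lines; a line starting with whitespace continues the
    previous value. Labels may repeat, hence the list of values.
    """
    tags: Dict[str, List[str]] = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            tags[last][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed tag line: {line!r}")
        last = label.strip()
        tags.setdefault(last, []).append(value.strip())
    return tags

# Hypothetical registry: profile identifier -> shape graph to validate with.
SHAPE_REGISTRY = {
    "https://example.org/profiles/soso-dataset.json":
        "https://example.org/shapes/soso-dataset.ttl",
}

def shapes_for_bag(bag_info: str) -> List[str]:
    """Shape graphs for every registered profile the bag declares."""
    ids = parse_tag_file(bag_info).get("BagIt-Profile-Identifier", [])
    return [SHAPE_REGISTRY[i] for i in ids if i in SHAPE_REGISTRY]
```

The same two-step pattern (document declares a profile, a resolver maps it to validation resources) is what an instance-level declaration in schema.org would need.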

mbjones, Nov 25 '19

To avoid confusion, we need to be clear whether a URI identifies the profile itself or the location of a resource that can be used to validate metadata instances for conformance with the profile. There might be multiple validation resources available.

The W3C DXWG profiles vocabulary draft models a 'resourceDescriptor' class to link a profile to associated resources such as validation code (SWRL, SHACL, XSD, Schematron), text descriptions, etc.
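A sketch of that separation between the profile URI and its validation resources. It assumes the term names as I understand them from the published PROF vocabulary (`prof:hasResource`, `prof:ResourceDescriptor`, `prof:hasRole`, `prof:hasArtifact`, and the role namespace `http://www.w3.org/ns/dx/prof/role/`), which may differ from the draft cited above; all `example.org` URLs are hypothetical.

```python
from typing import List, Optional

ROLE = "http://www.w3.org/ns/dx/prof/role/"  # assumed PROF role namespace

# Hypothetical profile: its @id names the profile; the artifacts are
# separate, locatable resources, possibly several per role.
profile = {
    "@id": "https://example.org/profiles/soso-dataset",
    "@type": "prof:Profile",
    "prof:hasResource": [
        {"@type": "prof:ResourceDescriptor", "prof:hasRole": ROLE + "validation",
         "dct:format": "text/turtle",
         "prof:hasArtifact": "https://example.org/shapes/soso-dataset.ttl"},
        {"@type": "prof:ResourceDescriptor", "prof:hasRole": ROLE + "validation",
         "dct:format": "application/xml",
         "prof:hasArtifact": "https://example.org/schematron/soso-dataset.sch"},
        {"@type": "prof:ResourceDescriptor", "prof:hasRole": ROLE + "guidance",
         "dct:format": "text/html",
         "prof:hasArtifact": "https://example.org/guides/soso-dataset.html"},
    ],
}

def artifacts_with_role(profile: dict, role: str, fmt: Optional[str] = None) -> List[str]:
    """Artifact URLs of resource descriptors with the given role (and format)."""
    return [
        rd["prof:hasArtifact"]
        for rd in profile.get("prof:hasResource", [])
        if rd.get("prof:hasRole") == role and (fmt is None or rd.get("dct:format") == fmt)
    ]

print(artifacts_with_role(profile, ROLE + "validation", "text/turtle"))
```

A SHACL-based validator would ask only for the Turtle artifact, while a Schematron-based one could pick the other descriptor, and both would still report conformance to the same profile URI.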

smrgeoinfo, Nov 25 '19

It's probably worth cross-checking against DCAT-2 (see its summary class diagram). There is also a crosswalk to schema.org.
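A crosswalk like that can double as a gap check. The sketch below renames a few DCAT properties to schema.org and reports what it could not map. The pairs are ones I believe appear in the DCAT 2 schema.org alignment, but treat them as assumptions and verify against the specification; the table is deliberately partial.

```python
from typing import Dict, List, Tuple

# Partial DCAT 2 -> schema.org property pairs (assumed; verify before use).
DCAT_TO_SCHEMA: Dict[str, str] = {
    "dct:title": "schema:name",
    "dct:description": "schema:description",
    "dct:identifier": "schema:identifier",
    "dcat:keyword": "schema:keywords",
    "dct:license": "schema:license",
    "dct:issued": "schema:datePublished",
    "dct:modified": "schema:dateModified",
    "dcat:distribution": "schema:distribution",
}

def crosswalk(record: dict) -> Tuple[dict, List[str]]:
    """Rename DCAT keys to schema.org keys; return (converted, unmapped keys)."""
    converted, unmapped = {}, []
    for key, value in record.items():
        if key in DCAT_TO_SCHEMA:
            converted[DCAT_TO_SCHEMA[key]] = value
        else:
            unmapped.append(key)
    return converted, unmapped

print(crosswalk({"dct:title": "Sea surface temperature", "dcat:theme": "oceans"}))
```

The unmapped list shows where the crosswalk is silent, which is exactly where a field we call mandatory might have no counterpart.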

dr-shorthair, Nov 25 '19

Perhaps I am naive but couldn't this simply be a SHACL shape?

LDP defines such constraints (SHACL, Web Annotations) via http://www.w3.org/ns/ldp#constrainedBy
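In LDP the constraint document is advertised in an HTTP `Link` header whose relation type is `http://www.w3.org/ns/ldp#constrainedBy`. A client-side sketch of finding it is below; the header parsing is a simplified take on RFC 8288 (it assumes no commas or semicolons inside quoted parameter values), and the shape URLs are hypothetical.

```python
import re
from typing import List

CONSTRAINED_BY = "http://www.w3.org/ns/ldp#constrainedBy"

# One link-value: <target> followed by ;-separated parameters up to the next
# comma. Simplified: quoted values containing ',' or ';' are not handled.
_LINK = re.compile(r"<([^>]*)>\s*((?:;[^,]*)*)")

def constrained_by(link_header: str) -> List[str]:
    """Return targets of Link header entries whose rel includes ldp#constrainedBy."""
    targets = []
    for target, params in _LINK.findall(link_header):
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "rel":
                # rel may hold several space-separated relation types
                if CONSTRAINED_BY in value.strip().strip('"').split():
                    targets.append(target)
    return targets

header = ('<https://example.org/shapes/soso.ttl>; '
          'rel="http://www.w3.org/ns/ldp#constrainedBy", '
          '<https://example.org/about>; rel="describedby"')
print(constrained_by(header))
```

Advertising shapes out-of-band like this keeps the JSON-LD itself unchanged, in contrast to declaring the profile inside the document.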

Currently, Google defines its required and recommended properties in its developer guide: https://developers.google.com/search/docs/data-types/dataset

We've converted these into shape graphs already at https://github.com/geoschemas-org/geoshapes/tree/master/shapegraphs

They do not consider @id to be required, which I disagree with.
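To show the mechanics without an RDF toolchain, here is a toy stand-in for SHACL property shapes that only models `sh:path`, `sh:minCount` and `sh:severity` over plain JSON keys. The shape is an illustrative subset of Google's guidance (name and description required, a few recommended properties as warnings), and the stricter variant adding `@id` is hypothetical, reflecting the position above. One caveat: in real SHACL, `sh:conforms` is false whenever any result exists, whatever its severity; `has_violations` here is a looser, custom check.

```python
from dataclasses import dataclass
from typing import List, NamedTuple

VIOLATION = "sh:Violation"
WARNING = "sh:Warning"

@dataclass(frozen=True)
class PropertyShape:
    """Toy stand-in for a SHACL property shape: sh:path plus sh:minCount."""
    path: str
    min_count: int = 1
    severity: str = VIOLATION

class Result(NamedTuple):
    severity: str
    path: str
    message: str

# Illustrative subset: required -> Violation, recommended -> Warning.
GOOGLE_DATASET_SHAPE: List[PropertyShape] = [
    PropertyShape("name"),
    PropertyShape("description"),
    PropertyShape("identifier", severity=WARNING),
    PropertyShape("license", severity=WARNING),
    PropertyShape("url", severity=WARNING),
]

# Hypothetical stricter variant that also requires a node identifier.
STRICT_DATASET_SHAPE = GOOGLE_DATASET_SHAPE + [PropertyShape("@id")]

def _count(value) -> int:
    """Number of values a JSON key holds (absent/None -> 0, list -> length)."""
    if value is None:
        return 0
    return len(value) if isinstance(value, list) else 1

def validate(doc: dict, shape: List[PropertyShape]) -> List[Result]:
    """One result per property shape whose minCount is not met."""
    return [
        Result(p.severity, p.path,
               f"needs at least {p.min_count} value(s), found {_count(doc.get(p.path))}")
        for p in shape
        if _count(doc.get(p.path)) < p.min_count
    ]

def has_violations(results: List[Result]) -> bool:
    """True if any result is Violation-level (warnings are tolerated here)."""
    return any(r.severity == VIOLATION for r in results)
```

With a document carrying only `name` and `description`, the Google-style shape yields three warnings and no violations, while the strict shape adds a violation for the missing `@id`.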

Personally I'd love to see a constrainedBy attached to Thing in schema.org :)

fils, Nov 27 '19

Yes, that's exactly what shapes are for. However, this does bring in the RDF lens: people who primarily relate to the data through the JSON surface syntax might need some orientation.
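As a bit of that orientation, the toy function below shows what shapes actually see: the triples behind a compact JSON document. It is not a JSON-LD processor (it handles only an `@vocab`-style context, `@id`, `@type`, literals, nested objects and lists, and renders literals with `json.dumps`); it just illustrates that each JSON key becomes a predicate and each nested object without `@id` becomes a blank node.

```python
import itertools
import json
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def to_triples(node: dict, vocab: str,
               counter: Optional[Iterator[int]] = None) -> Tuple[str, List[Triple]]:
    """Toy flattening of compact JSON-LD into (subject, predicate, object) triples."""
    counter = counter if counter is not None else itertools.count()
    # A node without @id becomes a blank node such as _:b0
    subject = node.get("@id") or f"_:b{next(counter)}"
    triples: List[Triple] = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = RDF_TYPE if key == "@type" else vocab + key
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                obj, nested = to_triples(item, vocab, counter)
                triples.extend(nested)
            elif key == "@type":
                obj = vocab + item
            else:
                obj = json.dumps(item)  # literal, quoted if a string
            triples.append((subject, predicate, obj))
    return subject, triples

doc = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "https://example.org/ds/1",
    "@type": "Dataset",
    "name": "Example",
    "creator": {"@type": "Person", "name": "A. Researcher"},
}
subject, triples = to_triples(doc, doc["@context"]["@vocab"])
for t in triples:
    print(t)
```

Dropping `@id` from the top-level object turns the dataset itself into a blank node, which makes it harder to refer to from outside the document.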

dr-shorthair, Nov 27 '19

+1 for SHACL. The European Legislation Identifier (ELI) system is a good example of a community using SHACL shapes for promoting consistent content representation [1]. I expect a similar library of shapes could (should) be provided for this community to promote consistent data. DataONE has started using SHACL for testing SO:Dataset structure. So far it has worked well, though developing the shapes can be cumbersome. It would be awesome if there was a common library we could draw from (and contribute to).

[1] https://webgate.ec.europa.eu/eli-validator/home

datadavev, Dec 02 '19

This issue has been automatically marked as stale because it has not had recent activity.

stale[bot], Feb 01 '20