Achieve compatibility with DCAT 2

Open clange opened this issue 6 years ago • 16 comments

The W3C Data Exchange Working Group (https://www.w3.org/2017/dxwg/) is working on DCAT 2. DCAT 2 made it to the Candidate Recommendation stage on 3 Oct 2019 (https://www.w3.org/TR/vocab-dcat-2/). Considering the

we should make sure that our information model is compatible with DCAT 2.

Some relevant features of DCAT 2 include:

  • more expressive description of the temporal and spatial extent of a data resource (e.g., DCAT itself now implements what we modelled on our own using ids:temporalCoverage, ids:begin, etc.)
  • vocabulary to talk about relations of a resource to another resource (e.g. its original version)
  • a way to talk about the data quality of a resource, in terms of the W3C Data Quality Vocabulary
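
To make the last two features concrete, here is a minimal Turtle sketch of a DCAT 2 qualified relation pointing to an original version, plus a quality measurement expressed with the W3C Data Quality Vocabulary (DQV). The `ex:` namespace and every resource in it (including the metric and its value) are hypothetical and only serve as illustration.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:derived-dataset a dcat:Dataset ;
    # Relation to another resource, qualified by the role that resource plays
    dcat:qualifiedRelation [
        a dcat:Relationship ;
        dct:relation ex:original-dataset ;
        dcat:hadRole <http://www.iana.org/assignments/relation/original>
    ] ;
    # Data quality statement in terms of DQV (metric and value are made up)
    dqv:hasQualityMeasurement [
        a dqv:QualityMeasurement ;
        dqv:isMeasurementOf ex:completeness-metric ;
        dqv:value "0.98"^^xsd:decimal
    ] .
```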

clange avatar Oct 10 '19 12:10 clange

Additions: We need to verify that...

  • state that the IDS infomodel conforms to DCAT (via PROF?)
  • all usages are indeed compatible with DCAT
  • both domains and ranges of all our classes are consistent w/ DCAT
  • (more roughly, because of the current status quo) reasoning and axioms are correct
  • if we use the same semantics, then we should add "equivalent" statements for clarification
  • double-check correct usage of import and copy; be consistent with explicit and implicit cases
  • semantic meaning to the presence of these axioms - they identify a subset recommended for use. It is generally stronger to use formal constraints with something like SHACL to make statements about the mandatory presence of properties in instance data, without declaring axioms on the underlying class itself (a frame-based profile making a commitment about what data is accessible, as opposed to the open world assumption in OWL)
  • (possibly) study DCAT-AP and decide if we want to use it and how
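
As a rough illustration of two of these points, the sketch below declares a profile of DCAT 2 using the W3C Profiles Vocabulary (PROF) and states a "mandatory property" constraint as a SHACL shape instead of as an OWL axiom on the class. The `ex:` resources, the choice of `dcat:keyword` as the mandatory property, and the `ids:` namespace IRI are assumptions for the sake of the example, not decisions taken in this issue.

```turtle
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ids:  <https://w3id.org/idsa/core/> .   # assumed IDS namespace IRI
@prefix ex:   <http://example.org/> .           # hypothetical namespace

# Conformance statement: a (hypothetical) profile resource that profiles DCAT 2
ex:ids-dcat-profile a prof:Profile ;
    prof:isProfileOf <https://www.w3.org/TR/vocab-dcat-2/> .

# Closed-world constraint on instance data, kept separate from the ontology axioms
ex:DigitalContentKeywordShape a sh:NodeShape ;
    sh:targetClass ids:DigitalContent ;
    sh:property [
        sh:path dcat:keyword ;
        sh:minCount 1          # at least one keyword must be present
    ] .
```

Keeping such constraints in shapes means the class definitions stay compatible with OWL's open world assumption, while validators can still reject incomplete instance data.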

JohannesLipp avatar May 12 '20 12:05 JohannesLipp

My first investigation is on where we currently use the dcat prefix. This is in 19 files (open tasks marked in bold):

  • Host
    • Status: Declaring the prefix, but not actually using it
    • Action: Remove unused prefix.
  • Artifact
    • Status: Using dcat:byteSize, which is fully compatible with DCAT-2.
    • Action: None
  • DigitalContent
    • Status: Using dcat:Dataset, dcat:theme, and dcat:keyword
    • Action: None (cf. the first bullet point in the comment below)
  • Representation
    • Status: Using dcat:Distribution, dcat:mediaType
    • Action: None
  • Resource
    • Status: ids:Resource rdfs:seeAlso dcat:Dataset, and is logically based on it
    • Action: None
  • VocabularyData
    • Status: VocabularyData rdfs:seeAlso dcat:Dataset
    • Action: None
  • Standard
    • Status: Declaring the prefix, but not actually using it
    • Action: Remove unused prefix.
  • adms-20130801
  • csvw-20170523
  • dcat-20140116
  • Catalog
    • Status: Extending dcat:Catalog as subClass
    • Action: None (see comment below)
  • Connector
    • Status: Declaring the prefix, but not actually using it
    • Action: Remove unused prefix.
  • HostShape, ArtifactShape, DigitalContentShape, RepresentationShape, ResourceShape, CatalogShape, and ConnectorShape
    • Status: Declaring the prefix, but not actually using it
    • Action: Remove unused prefix

Please note that for comparison, I refer to the DCAT-1 and DCAT-2 Turtle files [1,2] as well as their documentation [3,4].

[1] https://www.w3.org/ns/dcat2014.ttl
[2] https://www.w3.org/ns/dcat2.ttl
[3] https://www.w3.org/TR/2014/REC-vocab-dcat-20140116/
[4] https://www.w3.org/TR/vocab-dcat-2/

JohannesLipp avatar Jun 05 '20 07:06 JohannesLipp

Detailed investigations and explanations:

  • dcat:theme: The range skos:Concept is the same for DCAT-2 and DCAT-1. DCAT-2 does not define the domain dcat:Dataset anymore, but still inherits the domain dct:subject from its super property. We define in ids:theme the domain ids:DigitalContent, which is a subclass of dcat:Dataset and a range ids:Concept, which is a subclass of skos:Concept. Decision via video conference: Because dcat:Dataset still matches our ids:DigitalContent best, we will not update it to extend dcat:Resource.
  • dcat:keyword: Same change as with dcat:theme - DCAT-2 refers to describing resources instead of datasets. No action needed, because dataset extends resource.
  • dcat:mediaType: Domain remains dcat:Distribution, but the range changed from dct:MediaTypeOrExtent to the more restrictive dct:MediaType. If no IANA media type can be referenced, DCAT-2 still suggests using dct:format instead. It seems we do not need to take any action here.
  • dcat:Dataset: DCAT-2 introduces the new class dcat:Resource, which is the "class of all cataloged resources" and a superclass of dcat:Dataset.
  • ids:Resource extends ids:DigitalContent, which again extends dcat:Dataset. Following the decision in the first bullet point, we will stick with this model.
  • dcat:Distribution and dcat:distribution: There is no major change in DCAT-2, and it still is dcat:Dataset-->dcat:Distribution. Due to materialization aspects, we do not properly extend this class in ids:representation and ids:Artifact. This is because we also have a more abstract resource (Representation) as well as an individual materialization (Artifact).
  • dcat:Catalog: Changed in DCAT-2 from "a collection of metadata about datasets" to "[...] about resources". Please note an issue in NOTES.md. ids:Catalog is a subclass of dcat:Catalog. No action required, because dataset is a subclass of resource and we follow our decision from the first bullet point.
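
The relationships described in these bullet points can be summarized in a short Turtle sketch. It only restates what is written above; the actual axioms in the model files may differ in detail, and the `ids:` namespace IRI is an assumption.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ids:  <https://w3id.org/idsa/core/> .   # assumed IDS namespace IRI

# Class hierarchy: IDS keeps extending dcat:Dataset, not the new dcat:Resource
ids:DigitalContent rdfs:subClassOf dcat:Dataset .
ids:Resource       rdfs:subClassOf ids:DigitalContent .
ids:Catalog        rdfs:subClassOf dcat:Catalog .
ids:Concept        rdfs:subClassOf skos:Concept .

# ids:theme narrows both ends of dcat:theme
ids:theme rdfs:subPropertyOf dcat:theme ;
    rdfs:domain ids:DigitalContent ;
    rdfs:range  ids:Concept .
```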

JohannesLipp avatar Jun 05 '20 09:06 JohannesLipp

Coming to the major changes from DCAT-1.1 to DCAT-2, which are of interest for us:

  • Space: Using dct:spatial to point to a dct:Location, which has three new properties (locn:geometry, dcat:bbox, dcat:centroid) that can also be used concurrently. Our ids:spatialCoverage already extends dct:spatial, which does not yet extend dct:Location. Suggested action: Make it extend that.
  • Time: A new class dct:PeriodOfTime supports the temporal coverage of a resource via dcat:startDate, dcat:endDate, time:hasBeginning, and time:hasEnd. Suggested action: Use it.
  • Time (in general): DCAT-2 suggests five temporal properties:
    • dct:issued. Status: Extended in Message/ids:issued and DigitalContent/ids:created. Action: Add xsd:date to the domain of ids:created.
    • dct:modified. Status: Extended in ids:created in DigitalContent. Action: Add xsd:date to the domain.
    • dct:accrualPeriodicity. Status: Currently not used, but there is ids:accrualPeriodicity in DigitalContent. Action: Make it extend dct:accrualPeriodicity and cascade the changes, which particularly include the usage of the dct:Frequency domain, in order to align it with DCAT-2.
    • dcat:temporalResolution. Status: Currently not used. DCAT-2 uses this to define a minimum time period resolvable in a dataset. Action: Add to either DigitalContent (cf. example 19 and usage notes)
    • dct:temporal. Status: ids:temporalCoverage in DigitalContent extends it. Action: Make ids:Interval extend dct:PeriodOfTime, cf. DCAT-2 examples.
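
For reference, this is roughly what the DCAT-2 spatial and temporal features listed above look like on a plain dcat:Dataset. It is a sketch using DCAT-2 terms only; the `ex:` dataset, the coordinates, and all dates are made up for illustration.

```turtle
@prefix dcat:      <http://www.w3.org/ns/dcat#> .
@prefix dct:       <http://purl.org/dc/terms/> .
@prefix xsd:       <http://www.w3.org/2001/XMLSchema#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:        <http://example.org/> .   # hypothetical namespace

ex:weather-observations a dcat:Dataset ;
    # Spatial coverage: bounding box and centroid can be used side by side
    dct:spatial [
        a dct:Location ;
        dcat:bbox "POLYGON((6.0 50.7, 6.2 50.7, 6.2 50.9, 6.0 50.9, 6.0 50.7))"^^geosparql:wktLiteral ;
        dcat:centroid "POINT(6.1 50.8)"^^geosparql:wktLiteral
    ] ;
    # Temporal coverage as a period with start and end date
    dct:temporal [
        a dct:PeriodOfTime ;
        dcat:startDate "2019-01-01"^^xsd:date ;
        dcat:endDate   "2019-12-31"^^xsd:date
    ] ;
    # Minimum time period resolvable in the dataset
    dcat:temporalResolution "PT1H"^^xsd:duration ;
    # Update frequency, release date, and last modification
    dct:accrualPeriodicity <http://purl.org/cld/freq/daily> ;
    dct:issued   "2020-01-15"^^xsd:date ;
    dct:modified "2020-06-09"^^xsd:date .
```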

JohannesLipp avatar Jun 09 '20 10:06 JohannesLipp

The work for DCAT-2 is done. Compatibility with DCAT-AP is handled in issue #277.

JohannesLipp avatar Jun 19 '20 09:06 JohannesLipp

@JohannesLipp currently reviewing your investigations. Re ids:VocabularyData I would suggest (could you please do it if it's not yet done?) that we open a separate issue for getting rid of that class? It was a workaround for adding some of the domain-specific structure/semantics features at a time when CodeGen was not yet able to handle terms from non-IDS namespaces.

clange avatar Jul 22 '20 11:07 clange

After reviewing, the following questions remain to be asked to DCAT experts.

  • Is it OK to have ids:theme rdfs:subPropertyOf dcat:theme; rdfs:domain [ rdfs:subClassOf dcat:Dataset ]? I think it is, because we are talking about a more specific property, and that property is not mandatory.

clange avatar Jul 22 '20 11:07 clange

Re. dcat:mediaType I think we shall take the right decision in the context of our ongoing discussion on how to replace our media type code lists by something more lightweight that can take any standard or non-standard string of the form "type/subtype". @JohannesLipp could you please link to that issue from here, or in any case make sure we have an issue for that discussion? (The discussion would be similar to #296.) My input to that discussion is that I think we should not represent media types simply as string literals but indeed continue to represent them as instances of ids:MediaType, but make sure that additional types can be used easily: the most lightweight representation would be ex:MyDataResource ids:mediaType [ rdfs:label "foo/bar" ], such that the blank node would implicitly be of type ids:MediaType and thus also of dct:MediaType. It does make sense to remain compatible with dct:MediaType, and the good thing is that its specification is so vague that it doesn't restrict us to anything other than modelling media types as resources.

clange avatar Jul 22 '20 11:07 clange

@JohannesLipp in https://github.com/International-Data-Spaces-Association/InformationModel/pull/270 (How do you easily/directly link to a pull request?) I did not see anything about the first bullet point in the comment about dct:spatial. Did you also cover that?

clange avatar Jul 22 '20 11:07 clange

This comment is a placeholder for some more DCAT2 features I'd like to request to be supported by the IDS infomodel. At the very least we should go through the full list of changes from DCAT 1 to 2 once more. I think at least dcat:DataService is related to ids:Endpoint in a way that we have not yet considered here (see https://www.w3.org/TR/vocab-dcat-2/#Class:Data_Service), and there may be further terms.
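
As a starting point for that comparison, a DCAT 2 data service description looks roughly like the sketch below. It uses DCAT 2 terms only and makes no claim about how ids:Endpoint should map to them; the `ex:` resources and URLs are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:sensor-api a dcat:DataService ;
    dct:title "Sensor data API"@en ;
    # Root location of the service
    dcat:endpointURL <http://example.org/api/sensors> ;
    # Machine-readable description of the operations (e.g. an OpenAPI document)
    dcat:endpointDescription <http://example.org/api/sensors/openapi.json> ;
    # Dataset(s) that the service provides access to
    dcat:servesDataset ex:sensor-readings .

ex:sensor-readings a dcat:Dataset .
```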

clange avatar Jul 22 '20 11:07 clange

After reviewing, the following questions remain to be asked to DCAT experts.

  • Is it OK to have ids:theme rdfs:subPropertyOf dcat:theme; rdfs:domain [ rdfs:subClassOf dcat:Dataset ]? I think it is, because we are talking about a more specific property, and that property is not mandatory.

I would say yes. Currently, the domain is ids:DigitalContent, which is a subclass of dcat:Dataset. Your suggestion using a blank node would therefore replace the domain ids:DigitalContent with the more general one, "anything extending dcat:Dataset".
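
For comparison, the two variants under discussion can be written as follows. This is a sketch, with the `ids:` namespace IRI assumed. Under RDFS entailment, both variants let a reasoner infer that any subject of ids:theme is a dcat:Dataset; the blank-node form simply leaves the intermediate class unnamed.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ids:  <https://w3id.org/idsa/core/> .   # assumed IDS namespace IRI

# Variant A (current model, per this discussion): a named domain class
ids:theme rdfs:subPropertyOf dcat:theme ;
    rdfs:domain ids:DigitalContent .
ids:DigitalContent rdfs:subClassOf dcat:Dataset .

# Variant B (the blank-node form asked about), commented out so that
# both domain statements do not apply at the same time:
# ids:theme rdfs:subPropertyOf dcat:theme ;
#     rdfs:domain [ rdfs:subClassOf dcat:Dataset ] .
```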

JohannesLipp avatar Jul 22 '20 14:07 JohannesLipp

Re. dcat:mediaType I think we shall take the right decision in the context of our ongoing discussion on how to replace our media type code lists by something more lightweight that can take any standard or non-standard string of the form "type/subtype". @JohannesLipp could you please link to that issue from here, or in any case make sure we have an issue for that discussion? (The discussion would be similar to #296.) My input to that discussion is that I think we should not represent media types simply as string literals but indeed continue to represent them as instances of ids:MediaType, but make sure that additional types can be used easily: the most lightweight representation would be ex:MyDataResource ids:mediaType [ rdfs:label "foo/bar" ], such that the blank node would implicitly be of type ids:MediaType and thus also of dct:MediaType. It does make sense to remain compatible with dct:MediaType, and the good thing is that its specification is so vague that it doesn't restrict us to anything other than modelling media types as resources.

IMHO there is no action needed from our side. dcat:mediaType has range dct:MediaType, and ids:mediaType and ids:MediaType extend these, respectively. We discussed this in #224, and the compact result (following DCAT2) is the following:

:Foo ids:mediaType <http://www.iana.org/assignments/media-types/text/csv> ;
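
Spelled out as complete Turtle, the two options from this exchange could look like the sketch below: an IANA media type IRI for registered types, and the lightweight blank-node form for anything non-standard. The `ex:` resources are hypothetical, the `ids:` namespace IRI is assumed, and the schema triples only restate what is described in this thread.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ids:  <https://w3id.org/idsa/core/> .   # assumed IDS namespace IRI
@prefix ex:   <http://example.org/> .           # hypothetical namespace

# Schema alignment as described above
ids:mediaType rdfs:subPropertyOf dcat:mediaType ;
    rdfs:range ids:MediaType .
ids:MediaType rdfs:subClassOf dct:MediaType .

# Registered media type: reference the IANA IRI directly
ex:csv-representation ids:mediaType
    <http://www.iana.org/assignments/media-types/text/csv> .

# Non-registered media type: a blank node carrying the "type/subtype" string;
# its type ids:MediaType follows from the range of ids:mediaType
ex:custom-representation ids:mediaType [ rdfs:label "foo/bar" ] .
```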

JohannesLipp avatar Jul 22 '20 14:07 JohannesLipp

@JohannesLipp in #270 (How do you easily/directly link to a pull request?) I did not see anything about the first bullet point in the comment about dct:spatial. Did you also cover that?

You just did that direct link to a pull request in that comment 😃 Thank you for the info, I have not covered that indeed. I solved it via the most recent commit, which we agreed on in today's Infomodel call.

JohannesLipp avatar Jul 23 '20 14:07 JohannesLipp

@JohannesLipp in investigating the reuse of the IDS infomodel for the Agricultural Information Model of https://h2020-demeter.eu/, where DCAT was given as the baseline, I identified the following missing points:

  • I did actually see DigitalContent temporalResolution Frequency in IDS. It would be good to align this with DCAT's temporalResolution.
  • Interestingly, DCAT's spatialResolutionInMeters is not reflected in IDS.
  • Also I think my earlier comment on thinking about the relation between dcat:DataService and ids:Endpoint got lost.

clange avatar Feb 04 '21 00:02 clange

  • I did actually see DigitalContent temporalResolution Frequency in IDS. It would be good to align this with DCAT's temporalResolution.

Yes, it is:
ids:temporalResolution
    rdfs:subPropertyOf dcat:temporalResolution ;

cf. https://github.com/International-Data-Spaces-Association/InformationModel/blob/develop/model/content/DigitalContent.ttl#L160

ids:spatialCoverage extends dct:spatial. We however do not use this particular resolution in meters yet.
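
For reference, DCAT 2 expresses this resolution as a plain decimal value on the dataset. A minimal sketch, in which the `ex:` dataset and the value are made up:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:satellite-imagery a dcat:Dataset ;
    # Minimum spatial separation resolvable in the dataset, in meters
    dcat:spatialResolutionInMeters "30.0"^^xsd:decimal .
```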

  • Also I think my earlier comment on thinking about the relation between dcat:DataService and ids:Endpoint got lost.

JohannesLipp avatar Feb 04 '21 14:02 JohannesLipp

Related to #593

lcomet avatar Dec 15 '23 16:12 lcomet