DCAT-AP

Provide more guidance on the relation between datasets and data services

Open matthiaspalmer opened this issue 11 months ago • 4 comments

We appreciate that chapter 14 provides guidance on when to use datasets and data services. For instance, the statement "Datasets are the conceptual entity denoting a collection of data." is clarifying.

In our understanding of DCAT and the statement above, all data services that are not purely "data processing services" should be explicitly related to one or more datasets. There are also practical reasons for expressing relations between datasets and data services, e.g. a dataset may be available both as a download (via a distribution) and as a service.

The specification states how to use the properties dcat:servesDataset and dcat:accessService. However, chapter 14 alone does not make it clear when to use them. Expressing these relations also affects how other properties should be used, e.g. should there be a dcat:downloadURL when there is a dcat:accessService?

Hence, we suggest adding the following guidance:

  1. Unless a data service is a pure "data processing service", at least one dataset in the same catalog should refer to it from a distribution via the property dcat:accessService.
  2. Pointing directly to a dataset from the data service via dcat:servesDataset is allowed but not necessary.
  3. If your data service provides data in multiple formats, you should express that by repeating dcterms:format on the data service. You may provide one distribution per format, but it is also acceptable to provide only one distribution, with the format that is most widely used. (This is useful when you want to provide rich metadata on the distribution, and maintaining many distributions that differ only in format, with all other metadata fields repeated, proves to be administratively challenging.)
  4. Distributions that refer to a data service via dcat:accessService should never provide a dcat:downloadURL.
  5. If your data service serves only one dataset, the dcat:accessURL on the distribution and the dcat:endpointURL on the data service should be the same.
  6. If the data service serves many datasets, the dcat:accessURL may be more specific than the dcat:endpointURL of the data service, but only if it corresponds to a way of filtering the data down to the dataset at hand.
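
To make suggestions 4 to 6 more concrete, here is a minimal Python sketch of how a catalog tool might check them. It is only an illustration under several assumptions: the metadata is represented as plain dictionaries keyed by property name, the function name `check_distribution` and the `serves_many_datasets` flag are hypothetical, the example.org URLs are made up, and "more specific" in suggestion 6 is approximated by a simple prefix test, which a real profile might define differently.

```python
def check_distribution(distribution, service, serves_many_datasets):
    """Return a list of violations of suggestions 4-6 for one distribution.

    `distribution` and `service` are plain dicts keyed by property name
    (an assumed representation, not something DCAT-AP prescribes).
    """
    problems = []

    # Suggestions 4-6 only concern distributions that point to a data service.
    if "dcat:accessService" not in distribution:
        return problems

    # Suggestion 4: no dcat:downloadURL alongside dcat:accessService.
    if "dcat:downloadURL" in distribution:
        problems.append("suggestion 4: dcat:downloadURL used with dcat:accessService")

    access_url = distribution.get("dcat:accessURL")
    endpoint_url = service.get("dcat:endpointURL")

    if not serves_many_datasets:
        # Suggestion 5: a single-dataset service uses the same URL in both places.
        if access_url != endpoint_url:
            problems.append("suggestion 5: dcat:accessURL differs from dcat:endpointURL")
    else:
        # Suggestion 6: the access URL may refine the endpoint URL.
        # Assumption: "more specific" is approximated by a prefix check.
        if not (access_url and endpoint_url and access_url.startswith(endpoint_url)):
            problems.append("suggestion 6: dcat:accessURL does not refine dcat:endpointURL")

    return problems


# Hypothetical example data.
service = {"dcat:endpointURL": "https://example.org/api"}

single = {
    "dcat:accessService": "https://example.org/service/1",
    "dcat:accessURL": "https://example.org/api",
}

filtered = {
    "dcat:accessService": "https://example.org/service/1",
    "dcat:accessURL": "https://example.org/api?dataset=42",
}

broken = {
    "dcat:accessService": "https://example.org/service/1",
    "dcat:accessURL": "https://example.org/other",
    "dcat:downloadURL": "https://example.org/data.csv",
}
```

With this data, `single` passes when the service serves one dataset, `filtered` passes when it serves many, and `broken` is reported for both suggestions 4 and 5. The `serves_many_datasets` flag is passed in explicitly because, per suggestion 2, dcat:servesDataset may be absent, so the number of datasets served cannot be read reliably from the service description itself.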

matthiaspalmer, Sep 05 '23 10:09