Improve recommendation for how to mix downloadable files and Data Services in a Dataset Series

Open matthiaspalmer opened this issue 11 months ago • 16 comments

Some datasets are best modelled as a dataset series of downloadable files but are also accessible via a data service. We see several alternatives for indicating the relation between the dataset series and the data service:

  1. Add an extra dataset in the series with a distribution that points to the data service.
  2. Add an extra distribution on every dataset in the series and point to the data service.
  3. Point from the data service to the dataset series via dcat:servesDataset.
  4. Add a distribution on the dataset series that points to the data service.
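
To make the four alternatives concrete, here is a minimal sketch in plain Python that models each one as a set of (subject, predicate, object) triples. All `ex:` identifiers, the three yearly datasets, and the helper names are made up for illustration. The DCAT terms are the ones from DCAT 3, and linking a distribution to the service via dcat:accessService is our assumption about how "points to the data service" would be expressed.

```python
# Hypothetical identifiers, for illustration only.
SERIES = "ex:series"
SERVICE = "ex:api"
YEARLY = ["ex:data-2021", "ex:data-2022", "ex:data-2023"]


def base_triples():
    """The series, its yearly member datasets, and a not yet linked data service."""
    triples = {
        (SERIES, "rdf:type", "dcat:DatasetSeries"),
        (SERVICE, "rdf:type", "dcat:DataService"),
    }
    for dataset in YEARLY:
        triples.add((dataset, "rdf:type", "dcat:Dataset"))
        triples.add((dataset, "dcat:inSeries", SERIES))
    return triples


def via_distribution(subject, dist):
    """Link `subject` to the data service through a distribution."""
    return {
        (subject, "dcat:distribution", dist),
        (dist, "rdf:type", "dcat:Distribution"),
        (dist, "dcat:accessService", SERVICE),
    }


def alternative_1():
    # An extra dataset in the series that only carries the service distribution.
    extra = "ex:data-api"
    return (
        base_triples()
        | {(extra, "rdf:type", "dcat:Dataset"), (extra, "dcat:inSeries", SERIES)}
        | via_distribution(extra, "ex:dist-api")
    )


def alternative_2():
    # One extra distribution per member dataset, all pointing to the same service.
    triples = base_triples()
    for i, dataset in enumerate(YEARLY):
        triples |= via_distribution(dataset, f"ex:dist-api-{i}")
    return triples


def alternative_3():
    # Only the service points to the series.
    return base_triples() | {(SERVICE, "dcat:servesDataset", SERIES)}


def alternative_4():
    # A single distribution directly on the dataset series.
    return base_triples() | via_distribution(SERIES, "ex:dist-api")


if __name__ == "__main__":
    base = base_triples()
    for name, build in [("1", alternative_1), ("2", alternative_2),
                        ("3", alternative_3), ("4", alternative_4)]:
        print(f"alternative {name}: {len(build() - base)} extra triples")
```

In this sketch the cost of alternative 2 grows with the number of datasets in the series, while alternatives 3 and 4 stay constant.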

Alternative 1 is suboptimal because it disturbs any natural ordering of the datasets in the series. For example, if the datasets correspond to yearly downloads, the extra dataset breaks the pattern.

Alternative 2 creates many relations where one would suffice. It is messy and error-prone for a portal to detect that they all point to the same data service and to provide a higher-level presentation indicating that the whole dataset series can be accessed from a single data service.
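
The detection work that alternative 2 pushes onto a portal might look roughly like the sketch below. It is only an illustration: the triples are plain Python tuples, the `ex:` identifiers and function names are hypothetical, and we assume a distribution links to its data service via dcat:accessService.

```python
def services_of(triples, dataset):
    """Data services reachable from `dataset` through any of its distributions."""
    dists = {o for s, p, o in triples if s == dataset and p == "dcat:distribution"}
    return {o for s, p, o in triples if s in dists and p == "dcat:accessService"}


def services_shared_by_series(triples, series):
    """Services that every member dataset of `series` links to."""
    members = {s for s, p, o in triples if p == "dcat:inSeries" and o == series}
    if not members:
        return set()
    return set.intersection(*(services_of(triples, m) for m in members))


# Hypothetical example data: two yearly datasets, each with a download
# distribution and a distribution pointing to the same service.
triples = {
    ("ex:data-2022", "dcat:inSeries", "ex:series"),
    ("ex:data-2022", "dcat:distribution", "ex:file-2022"),
    ("ex:data-2022", "dcat:distribution", "ex:dist-api-2022"),
    ("ex:dist-api-2022", "dcat:accessService", "ex:api"),
    ("ex:data-2023", "dcat:inSeries", "ex:series"),
    ("ex:data-2023", "dcat:distribution", "ex:file-2023"),
    ("ex:data-2023", "dcat:distribution", "ex:dist-api-2023"),
    ("ex:dist-api-2023", "dcat:accessService", "ex:api"),
}

shared = services_shared_by_series(triples, "ex:series")
```

If a single member dataset lacks the extra distribution, the intersection is empty and the portal can no longer tell that the service covers the whole series, which is the fragility we are worried about.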

Alternative 3 is doable, but it is not in line with how other datasets are expected to link to data services via a distribution (at least that is our reading, i.e. you are not expected to only provide a dcat:servesDataset without a distribution pointing in the other direction).

Alternative 4 seems most intuitive as it provides a data service for the whole dataset series.

We prefer alternative 4.

However, the following statement in section 14.1 argues against alternative 4: "But the presence of these Distributions raise semantical conflicts such as whether the property of the Dataset Series frequency refers to the update frequency of the associated Distributions or the update frequency of the collection. To avoid these semantical conflicts, it is recommended not to associate distributions with a Dataset Series."

We do not think this statement is valid. W3C clearly states that dcterms:accrualPeriodicity on a dataset series is to be interpreted as the frequency with which new datasets are added. Hence the semantics are clear: the property does not refer to how often the data is updated. Admittedly, this leaves no way to express the update frequency of the data provided via the distributions of the dataset series, but that does not matter if we only use the distribution for pointing to the data service.

matthiaspalmer, Sep 05 '23 13:09