Research how to integrate DCAT in Gobierto Data

Open amiedes opened this issue 5 years ago • 8 comments

https://www.w3.org/TR/vocab-dcat-2/

http://rml.io/

https://github.com/ruby-rdf/rdf-vocab

https://www.boe.es/diario_boe/txt.php?id=BOE-A-2013-2380

https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

https://github.com/ckan/ckanext-dcat

Look at DCAT in:

  • Zaragoza
  • Irekia

amiedes avatar Nov 13 '19 11:11 amiedes

@amiedes: @entantoencuanto will be looking at some of these things this week.

furilo avatar Nov 18 '19 16:11 furilo

Issue updated with link to DCAT-AP in EU site https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

furilo avatar Nov 21 '19 08:11 furilo

I've inspected the DCAT catalog of datos.madrid.es, and I think we can generate similar data by adding some extra attributes to both custom fields and vocabulary terms. For example, a dataset appears in the catalog like this:

<dct:identifier>···</dct:identifier>
<dct:title xml:lang="es">···</dct:title>
<dct:description xml:lang="es">···</dct:description>
<dcat:theme rdf:resource="http://···"/>
<dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">···</dct:issued>
<dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">···</dct:modified>
<dc:language>···</dc:language>
<dct:publisher rdf:resource="http://···"/>
<dct:license rdf:resource="https://···l"/>
<dcat:distribution>
  <dcat:Distribution>
    <dcat:accessURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI"></dcat:accessURL>
    <dcat:mediaType>···</dcat:mediaType>
    <dcat:byteSize>···</dcat:byteSize>
  </dcat:Distribution>
</dcat:distribution>

For the custom fields:

  • Each one can have an additional attribute, rdf_decorator, which knows how to represent the resource in DCAT format as a string. For example, if the title is stored internally as:
{
    "es": "Parques Nacionales",
    "en": "National Parks"
}

once decorated, this information can be included as:

<dct:title xml:lang="es">Parques Nacionales</dct:title>
<dct:title xml:lang="en">National Parks</dct:title>
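
A minimal sketch of what such a decorator could look like, assuming the localized value arrives as a plain hash keyed by locale. The class name LocalizedLiteralDecorator and its decorate method are hypothetical, not existing Gobierto code; escaping relies on Ruby's built-in String#encode with the xml: option.

```ruby
# Hypothetical decorator (not part of Gobierto): renders a localized hash
# as one RDF/XML literal per locale.
class LocalizedLiteralDecorator
  def initialize(property)
    @property = property # e.g. "dct:title"
  end

  # translations: { "es" => "...", "en" => "..." }
  # Returns an array with one XML element per locale.
  def decorate(translations)
    translations.map do |locale, text|
      lang = locale.to_s.encode(xml: :attr) # escaped and wrapped in double quotes
      body = text.to_s.encode(xml: :text)   # escapes &, < and >
      "<#{@property} xml:lang=#{lang}>#{body}</#{@property}>"
    end
  end
end

titles = { "es" => "Parques Nacionales", "en" => "National Parks" }
puts LocalizedLiteralDecorator.new("dct:title").decorate(titles)
```

Returning an array rather than a joined string leaves it to the caller to decide how the elements are indented inside the dataset node.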

For the vocabulary terms:

  • A vocabulary term should include an extra attribute with associated metadata. For example, if there is a custom field of type vocabulary named theme, it is stored internally as:
{
    "theme": [1]
}

The 1 is the id of a vocabulary term whose meta contains:

{
    "rdf:resource": "http://datos.gob.es/kos/sector-publico/sector/medio-ambiente"
}

With a vocabulary decorator on the custom field that uses this meta as its source, the result would be:

<dcat:theme rdf:resource="http://datos.gob.es/kos/sector-publico/sector/medio-ambiente"/>

Other types of vocabulary fields may use different decorators, with output like this (in this case a vocabulary field that allows multiple selection):

<dcat:keyword xml:lang="es">Medio Ambiente</dcat:keyword>
<dcat:keyword xml:lang="es">Impacto ambiental</dcat:keyword>
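
As a rough sketch of the two vocabulary decorators described above: the Term struct, the TERMS lookup table and both method names are hypothetical stand-ins for the real vocabulary models, and term 2 is invented only to show the multiple-selection case.

```ruby
# Hypothetical stand-in for a vocabulary term with localized names and meta.
Term = Struct.new(:id, :name_translations, :meta, keyword_init: true)

TERMS = {
  1 => Term.new(id: 1,
                name_translations: { "es" => "Medio Ambiente" },
                meta: { "rdf:resource" => "http://datos.gob.es/kos/sector-publico/sector/medio-ambiente" }),
  2 => Term.new(id: 2,
                name_translations: { "es" => "Impacto ambiental" },
                meta: {})
}.freeze

# Renders each term as a reference to the URI stored in its meta.
# Terms without an "rdf:resource" entry are skipped.
def decorate_as_resource(property, term_ids, terms: TERMS)
  term_ids.filter_map do |id|
    uri = terms.fetch(id).meta["rdf:resource"]
    "<#{property} rdf:resource=#{uri.encode(xml: :attr)}/>" if uri
  end
end

# Renders each term as one localized literal per available translation.
def decorate_as_keywords(property, term_ids, terms: TERMS)
  term_ids.flat_map do |id|
    terms.fetch(id).name_translations.map do |locale, name|
      "<#{property} xml:lang=#{locale.encode(xml: :attr)}>#{name.encode(xml: :text)}</#{property}>"
    end
  end
end

puts decorate_as_resource("dcat:theme", [1])
puts decorate_as_keywords("dcat:keyword", [1, 2])
```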

entantoencuanto avatar Feb 18 '20 18:02 entantoencuanto

== WIP ==

Before generating a complete RDF DCAT document, some values need to be mapped somewhere in the application.

I'd also like to confirm my assumption that a Catalog depends on a site (in some way) and that a site has only one catalog.

dcat:Catalog

Values possibly related to a site:

| attribute name | example of value | explanation |
| --- | --- | --- |
| dct:title | open dcat data catalog #{city} | |
| dct:description | open data catalog for #{city} with data from 2019 to 2021 in formats ... | |
| dct:identifier | #465234646344 | |
| dct:issued | site.created_at | |
| dct:modified | GobiertoData::Dataset.maximum(:updated_at) | |
| dct:license | link to license | |
| dcat:keyword | stats | create a new keywords attribute in the dataset model |
| dcat:keyword | contract | |
| dct:modified | site.datasets.max(:updated_at) | |
| dct:creator | site.organization.name | |
| dct:publisher | site.organization.name | |
| dct:contributor | empty | |
| dct:accrualPeriodicity | daily | what values fit here? https://www.w3.org/TR/vocab-dcat-3/#temporal-properties |
| foaf:homepage | some url | |
| dcat:themeTaxonomy | | |
| dct:hasPart | | unused by us |
| dcat:dataset | | contains the dcat:Dataset entries |
| dcat:service | | |
| dcat:catalog | ? | |
| dcat:record | ? | |
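
A possible sketch of how the site-related values could be collected before serializing the catalog. Site and Dataset here are plain Struct stand-ins, not the real models, and catalog_attributes, organization_name and url are hypothetical names used only for illustration.

```ruby
require "time" # provides Time#iso8601

# Stand-ins for the real models; only the fields used below are included.
Site    = Struct.new(:name, :organization_name, :url, :created_at, keyword_init: true)
Dataset = Struct.new(:slug, :updated_at, keyword_init: true)

# Collects catalog-level values as a property => value hash.
# dct:modified is the most recent dataset update, or nil with no datasets.
def catalog_attributes(site, datasets)
  last_update = datasets.map(&:updated_at).max
  {
    "dct:title"     => "Open data catalog #{site.name}",
    "dct:issued"    => site.created_at.getutc.iso8601,
    "dct:modified"  => last_update&.getutc&.iso8601,
    "dct:publisher" => site.organization_name,
    "foaf:homepage" => site.url
  }
end

site = Site.new(name: "Example City", organization_name: "Example Council",
                url: "https://example.org", created_at: Time.utc(2019, 11, 13, 11, 0, 0))
datasets = [
  Dataset.new(slug: "parks", updated_at: Time.utc(2021, 4, 26, 15, 0, 0)),
  Dataset.new(slug: "budgets", updated_at: Time.utc(2020, 2, 18, 18, 0, 0))
]
catalog_attributes(site, datasets).each { |property, value| puts "#{property}: #{value}" }
```

Keeping the mapping as a plain hash separates "which value goes in which property" from the XML serialization, so the list can be completed before the output format is settled.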

dcat:Dataset

Of course, there are other attributes associated with a dataset that should probably be added as custom fields.

| attribute name | example of value | comments |
| --- | --- | --- |
| dct:identifier | gobierto_data_datasets_url(id: slug) | |
| dct:title | | |
| dct:description | | |
| dcat:keyword | | can have multiple keywords |
| dct:issued | | |
| dct:modified | | |
| dct:language | | |
| dct:license | | |
| dct:publisher | site.organization.name | |
| dcat:distribution | | contains 0+ dcat:Distribution |

dcat:Distribution

A distribution belongs to a dataset and is a specific representation of it, such as CSV or XML.

| attribute name | example of value |
| --- | --- |
| dct:identifier | |
| dct:title | |
| dct:description | |
| dcat:accessURL | |
| dct:format | application/csv |
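
For illustration, a distribution could be rendered in the same shape as the datos.madrid.es example quoted earlier in this thread. The Distribution struct and the distribution_xml helper are hypothetical, and the URL is a placeholder.

```ruby
# Stand-in for a distribution record; not an existing Gobierto model.
Distribution = Struct.new(:access_url, :media_type, keyword_init: true)

XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"

# Renders one distribution nested inside a dcat:distribution property,
# escaping the interpolated values as XML text.
def distribution_xml(dist)
  <<~XML.chomp
    <dcat:distribution>
      <dcat:Distribution>
        <dcat:accessURL rdf:datatype="#{XSD_ANY_URI}">#{dist.access_url.encode(xml: :text)}</dcat:accessURL>
        <dcat:mediaType>#{dist.media_type.encode(xml: :text)}</dcat:mediaType>
      </dcat:Distribution>
    </dcat:distribution>
  XML
end

csv = Distribution.new(access_url: "https://example.org/datasets/parks.csv?a=1&b=2",
                       media_type: "text/csv")
puts distribution_xml(csv)
```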

dcat:DataService (UNUSED FOR NOW)

A data service is a collection of operations exposed through an interface (e.g. an API) that gives access to one or more datasets.

| attribute name | example of value |
| --- | --- |
| identifier | |

WIP

stbnrivas avatar Apr 26 '21 15:04 stbnrivas

For creator I'd just use site_name

furilo avatar Apr 26 '21 15:04 furilo

Looks good, let's complete this list today because most of the values are available from models and you can start implementing it.

ferblape avatar Apr 27 '21 04:04 ferblape

@stbnrivas please use https://www.itb.ec.europa.eu/shacl/dcat-ap/upload or other validator to validate the XML.

furilo avatar May 04 '21 07:05 furilo