gobierto
Research how to integrate DCAT in Gobierto Data
https://www.w3.org/TR/vocab-dcat-2/
http://rml.io/
https://github.com/ruby-rdf/rdf-vocab
https://www.boe.es/diario_boe/txt.php?id=BOE-A-2013-2380
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe
https://github.com/ckan/ckanext-dcat
Look at DCAT in:
- Zaragoza
- Irekia
@amiedes: @entantoencuanto will be looking at some of these things this week.
Issue updated with a link to DCAT-AP on the EU site: https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe
I've inspected the DCAT catalog of datos.madrid.es, and I think we can generate similar data by adding some extra attributes to both custom fields and vocabulary terms. For example, a dataset appears in the catalog like this:
```xml
<dct:identifier>···</dct:identifier>
<dct:title xml:lang="es">···</dct:title>
<dct:description xml:lang="es">···</dct:description>
<dcat:theme rdf:resource="http://···"/>
<dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">···</dct:issued>
<dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">···</dct:modified>
<dc:language>···</dc:language>
<dct:publisher rdf:resource="http://···"/>
<dct:license rdf:resource="https://···l"/>
<dcat:distribution>
  <dcat:Distribution>
    <dcat:accessURL rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI"></dcat:accessURL>
    <dcat:mediaType>···</dcat:mediaType>
    <dcat:byteSize>···</dcat:byteSize>
  </dcat:Distribution>
</dcat:distribution>
```
For the custom fields:
- Each one can have an additional attribute, `rdf_decorator`, which knows how to represent the resource in DCAT format as a string. For example, if the title is stored internally as:
```json
{
  "es": "Parques Nacionales",
  "en": "National Parks"
}
```
Once decorated, this information can be included as:
```xml
<dct:title xml:lang="es">Parques Nacionales</dct:title>
<dct:title xml:lang="en">National Parks</dct:title>
```
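A minimal sketch of what such a decorator could look like, in Ruby since Gobierto is a Rails app. `TranslatableRdfDecorator`, `to_rdf` and `xml_escape` are hypothetical names, not existing Gobierto code, and the translations are assumed to arrive as a plain locale => text hash:

```ruby
# Hypothetical sketch of the proposed `rdf_decorator` for translatable fields;
# not existing Gobierto code.

# Escapes the characters that are unsafe in XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

class TranslatableRdfDecorator
  def initialize(property)
    @property = property # e.g. "dct:title"
  end

  # Emits one element per locale, tagged with xml:lang.
  def to_rdf(translations)
    translations.map do |locale, text|
      %(<#{@property} xml:lang="#{locale}">#{xml_escape(text)}</#{@property}>)
    end.join("\n")
  end
end

decorator = TranslatableRdfDecorator.new("dct:title")
puts decorator.to_rdf({ "es" => "Parques Nacionales", "en" => "National Parks" })
```

Keeping the DCAT property name in the decorator means the same class could serve `dct:title` and `dct:description`.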
For the vocabulary terms:
- A vocabulary term should include an extra attribute with associated metadata. For example, if there is a vocabulary custom field named `theme`, its value is stored internally as:
```json
{
  "theme": [1]
}
```
`1` is the id of a vocabulary term whose metadata is:
```json
{
  "rdf:resource": "http://datos.gob.es/kos/sector-publico/sector/medio-ambiente"
}
```
With a vocabulary decorator for the custom field that takes the term metadata as its source, the result would be:
```xml
<dcat:theme rdf:resource="http://datos.gob.es/kos/sector-publico/sector/medio-ambiente"/>
```
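A sketch of that vocabulary decorator, assuming the terms can be looked up as an id => metadata hash (in Gobierto they would come from the vocabulary terms model). `VocabularyRdfDecorator` and `VOCABULARY_TERMS` are hypothetical:

```ruby
# Hypothetical stand-in for the vocabulary terms table: term id => meta.
VOCABULARY_TERMS = {
  1 => { "rdf:resource" => "http://datos.gob.es/kos/sector-publico/sector/medio-ambiente" }
}.freeze

class VocabularyRdfDecorator
  def initialize(property, terms)
    @property = property # e.g. "dcat:theme"
    @terms = terms
  end

  # Resolves each stored term id to the URI kept in its "rdf:resource" meta.
  def to_rdf(term_ids)
    term_ids.filter_map do |id|
      uri = @terms.dig(id, "rdf:resource")
      next unless uri # terms without RDF metadata are skipped
      escaped = uri.gsub("&", "&amp;").gsub('"', "&quot;")
      %(<#{@property} rdf:resource="#{escaped}"/>)
    end.join("\n")
  end
end

custom_field_value = { "theme" => [1] }
puts VocabularyRdfDecorator.new("dcat:theme", VOCABULARY_TERMS).to_rdf(custom_field_value["theme"])
```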
Other types of vocabulary fields may use different decorators, with output like this (in this case, a vocabulary field that allows multiple selection):
```xml
<dcat:keyword xml:lang="es">Medio Ambiente</dcat:keyword>
<dcat:keyword xml:lang="es">Impacto ambiental</dcat:keyword>
```
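For a multiple-selection field, a keyword decorator could emit one literal per selected term using the term's translated name. The term ids `7` and `8`, the `TERMS` hash and `keywords_rdf` are hypothetical:

```ruby
# Hypothetical terms: each stores its translated name.
TERMS = {
  7 => { "name" => { "es" => "Medio Ambiente" } },
  8 => { "name" => { "es" => "Impacto ambiental" } }
}.freeze

def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Emits one dcat:keyword literal per selected term in the given locale;
# terms without a name in that locale are skipped.
def keywords_rdf(term_ids, terms, locale)
  term_ids.filter_map do |id|
    name = terms.dig(id, "name", locale)
    %(<dcat:keyword xml:lang="#{locale}">#{xml_escape(name)}</dcat:keyword>) if name
  end.join("\n")
end

puts keywords_rdf([7, 8], TERMS, "es")
```

Unlike `dcat:theme`, which points to a URI, `dcat:keyword` is a plain literal, so the decorator reads the term name rather than RDF metadata.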
== WIP ==
Before generating a complete RDF DCAT document, we need to map some values from different parts of the application.
I'd also like to confirm my assumption that a Catalog depends on a site and that a site has only one catalog.
dcat:Catalog
Values possibly related to a site:
attribute name | example of value | explanation |
---|---|---|
dct:title | open dcat data catalog #{city} | |
dct:description | open data catalog for #{city} with data from 2019 to 2021 in formats ... | |
dct:identifier | #465234646344 | |
dct:issued | site.created_at | |
dct:modified | site.datasets.maximum(:updated_at) | scoped to the site; GobiertoData::Dataset.maximum(:updated_at) would cover all sites |
dct:license | link to license | |
dcat:keyword | stats | add a new keywords attribute to the dataset model |
dcat:keyword | contract | |
dct:creator | site.organization.name | |
dct:publisher | site.organization.name | |
dct:contributor | (empty) | |
dct:accrualPeriodicity | (daily? which values fit here?) | https://www.w3.org/TR/vocab-dcat-3/#temporal-properties |
foaf:homepage | some URL | |
dcat:themeTaxonomy | | |
dct:hasPart | unused by us | |
dcat:dataset | contains the dcat:Dataset entries | |
dcat:service | | |
dcat:catalog | ? | |
dcat:record | ? | |
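A sketch of gathering some of these catalog values from a site. The Structs are hypothetical stand-ins for the Gobierto models (with ActiveRecord the modified date would be `site.datasets.maximum(:updated_at)`), `catalog_attributes` is an illustrative name, and the site name is assumed to stand in for `#{city}`:

```ruby
require "time"

# Hypothetical stand-ins for Gobierto models, so the sketch runs without Rails.
Organization = Struct.new(:name)
Dataset = Struct.new(:slug, :updated_at)
Site = Struct.new(:name, :created_at, :organization, :datasets)

# Maps site data to dcat:Catalog properties, as listed in the table above.
def catalog_attributes(site)
  {
    "dct:title" => "open dcat data catalog #{site.name}",
    "dct:issued" => site.created_at.iso8601,
    # Latest dataset update; nil when the site has no datasets yet.
    "dct:modified" => site.datasets.map(&:updated_at).max&.iso8601,
    "dct:creator" => site.organization.name,
    "dct:publisher" => site.organization.name
  }
end

site = Site.new(
  "Gobierto Demo",
  Time.utc(2019, 1, 15),
  Organization.new("Ayuntamiento de Ejemplo"),
  [Dataset.new("presupuestos", Time.utc(2021, 4, 20)), Dataset.new("contratos", Time.utc(2021, 4, 26))]
)
catalog_attributes(site).each { |property, value| puts "#{property}: #{value}" }
```

Returning a plain hash keeps the mapping separate from serialization, so the same values could later be rendered as RDF/XML or Turtle.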
dcat:Dataset
Of course, there are other attributes associated with a dataset that should probably be added as custom fields.
attribute name | example of value | comments |
---|---|---|
dct:identifier | gobierto_data_datasets_url(id: slug) | |
dct:title | | |
dct:description | | |
dcat:keyword | can have multiple keywords | |
dct:issued | | |
dct:modified | | |
dct:language | | |
dct:license | | |
dct:publisher | site.organization.name | |
dcat:distribution | contains zero or more dcat:Distribution | |
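A sketch of rendering one dataset with its distributions as an RDF/XML fragment, following the property names above and the datos.madrid.es example. `dataset_rdf` and the Structs are hypothetical, only a subset of the properties is emitted, the example.org URLs are placeholders, and the `rdf`, `dct` and `dcat` prefixes are assumed to be declared on an enclosing `rdf:RDF` element:

```ruby
require "time"

# Hypothetical stand-ins: real values would come from GobiertoData::Dataset
# and its custom fields.
Distribution = Struct.new(:access_url, :media_type)
Dataset = Struct.new(:url, :title, :description, :modified, :distributions)

# Escapes the characters that are unsafe in XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# Renders a dcat:Dataset node with zero or more nested dcat:Distribution nodes.
def dataset_rdf(dataset, locale: "es")
  url = xml_escape(dataset.url)
  lines = [
    %(<dcat:Dataset rdf:about="#{url}">),
    %(  <dct:identifier>#{url}</dct:identifier>),
    %(  <dct:title xml:lang="#{locale}">#{xml_escape(dataset.title)}</dct:title>),
    %(  <dct:description xml:lang="#{locale}">#{xml_escape(dataset.description)}</dct:description>),
    %(  <dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">#{dataset.modified.iso8601}</dct:modified>)
  ]
  dataset.distributions.each do |distribution|
    lines.push(
      "  <dcat:distribution>",
      "    <dcat:Distribution>",
      # DCAT defines dcat:accessURL as a resource, hence rdf:resource.
      %(      <dcat:accessURL rdf:resource="#{xml_escape(distribution.access_url)}"/>),
      "      <dcat:mediaType>#{xml_escape(distribution.media_type)}</dcat:mediaType>",
      "    </dcat:Distribution>",
      "  </dcat:distribution>"
    )
  end
  lines << "</dcat:Dataset>"
  lines.join("\n")
end

dataset = Dataset.new(
  "https://example.org/datos/presupuestos", "Presupuestos", "Presupuestos municipales",
  Time.utc(2021, 4, 26), [Distribution.new("https://example.org/datos/presupuestos.csv", "text/csv")]
)
puts dataset_rdf(dataset)
```

Building the fragment by hand keeps the sketch dependency-free; an RDF library would handle namespaces and escaping more robustly.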
dcat:Distribution
A distribution belongs to a dataset and is a specific representation of it, such as CSV, XML, ...
attribute name | example of value | comments |
---|---|---|
dct:identifier | | |
dct:title | | |
dct:description | | |
dcat:accessURL | | |
dct:format | text/csv | registered IANA media type for CSV; DCAT also has dcat:mediaType for media types |
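Distributions could be derived from the formats a dataset is exported in. The format list, the `MEDIA_TYPES` mapping, `distributions_for` and the `#{dataset_url}.#{format}` download URL pattern are assumptions for illustration, not the actual Gobierto Data routes:

```ruby
# Hypothetical mapping from export formats to IANA media types.
MEDIA_TYPES = {
  "csv" => "text/csv",
  "json" => "application/json",
  "xml" => "application/xml"
}.freeze

Distribution = Struct.new(:title, :access_url, :media_type)

# Builds one distribution per export format. Unknown formats raise instead of
# being published with a guessed media type.
def distributions_for(dataset_url, formats)
  formats.map do |format|
    media_type = MEDIA_TYPES.fetch(format) { raise ArgumentError, "unknown format: #{format}" }
    # Assumed URL pattern: <dataset URL>.<format>
    Distribution.new(format.upcase, "#{dataset_url}.#{format}", media_type)
  end
end

distributions_for("https://example.org/datos/presupuestos", %w[csv json]).each do |d|
  puts "#{d.title}: #{d.access_url} (#{d.media_type})"
end
```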
dcat:DataService (UNUSED FOR NOW)
A data service is a collection of operations, accessible through an interface (e.g. an API), that provide access to one or more datasets.
attribute name | example of value | comments |
---|---|---|
dct:identifier | | |
WIP
For the creator I'd just use `site_name`.
Looks good, let's complete this list today because most of the values are available from models and you can start implementing it.
-- Fernando Blat, replying by email on Mon, 26 Apr 2021 to Álvaro Ortiz's comment (PopulateTools/gobierto#2671)
@stbnrivas please use https://www.itb.ec.europa.eu/shacl/dcat-ap/upload or another validator to validate the XML.