cubiql
Search datasets by dimension/attribute values
The goal is to find datasets that contain specific values for given dimensions or attributes.
Expected GraphQL queries:
{ datasets(data: {
    # "or" may be replaced by "and"
    or: [ { dimension: "http://statistics.gov.scot/def/dimension/populationGroup"
            value: "http://statistics.gov.scot/def/concept/population-group/breastfed" }
          { dimension: "http://statistics.gov.scot/def/dimension/populationGroup"
            value: "http://statistics.gov.scot/def/concept/population-group/children" } ] }) {
    title
} }
{ datasets(data: {
    # "greater" may be replaced by "smaller"
    greater: [ { dimension: "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod"
                 value: "http://reference.data.gov.uk/id/year/2015" } ] }) {
    title
} }
{ datasets(data: {
    # "and" may be replaced by "or"
    and: [ { attribute: "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"
             value: "http://statistics.gov.scot/def/concept/measure-units/pounds-gbp" }
           { attribute: "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"
             value: "http://statistics.gov.scot/def/concept/measure-units/million-pounds-gbp" } ] }) {
    title
} }
Required changes to the schema:
:queries
{:datasets
 {:type (list :dataset)
  :resolve :resolve-datasets
  :args
  {:dimensions {:type :filter}
   :data {:type :filter}   ; add
   :uri {:type :uri}}}}
{:filter
 {:fields
  {:or {:type (list :uri)
        :description "List of URIs for which at least one must be contained within matching datasets."}
   :and {:type (list :uri)
         :description "List of URIs which must all be contained within matching datasets."}
   :greater {:type (list :uri)   ; add
             :description "List of URIs which matching datasets must have greater values."}   ; add
   :smaller {:type (list :uri)   ; add
             :description "List of URIs which matching datasets must have smaller values."}   ; add
   }}
We may also need to change the and/or operators in the schema so that they take (dimension, value) pairs as input instead of plain URIs.
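To make the intended and/or semantics over (dimension, value) pairs concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the matching rule, not the cubiql implementation: the ComponentValue type and the matches function are hypothetical names, and the greater/smaller operators are left out because their ordering rule is not defined here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentValue:
    """One (dimension or attribute URI, value URI) pair from the filter input."""
    component: str
    value: str


def matches(dataset_pairs, and_pairs=(), or_pairs=()):
    """Return True if a dataset satisfies the filter.

    dataset_pairs: set of ComponentValue the dataset is known to contain.
    and_pairs: every pair must be present.
    or_pairs: at least one pair must be present (ignored when empty).
    """
    all_present = all(p in dataset_pairs for p in and_pairs)
    any_present = not or_pairs or any(p in dataset_pairs for p in or_pairs)
    return all_present and any_present


POPULATION_GROUP = "http://statistics.gov.scot/def/dimension/populationGroup"
BREASTFED = ComponentValue(
    POPULATION_GROUP,
    "http://statistics.gov.scot/def/concept/population-group/breastfed",
)
CHILDREN = ComponentValue(
    POPULATION_GROUP,
    "http://statistics.gov.scot/def/concept/population-group/children",
)

# Hypothetical dataset that only contains the "breastfed" population group.
dataset = {BREASTFED}

or_result = matches(dataset, or_pairs=[BREASTFED, CHILDREN])
and_result = matches(dataset, and_pairs=[BREASTFED, CHILDREN])
```

With this data the "or" filter matches the dataset and the "and" filter does not, because the dataset lacks the "children" value.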
At the implementation level, there are two options for searching the dimension/attribute values. I currently use option 2.
Option 1: Search the dataset observations, e.g.
prefix qb: <http://purl.org/linked-data/cube#>
select distinct ?ds where {
  ?obs qb:dataSet ?ds.
  ?ds a qb:DataSet.
  ?obs <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod>
       <http://reference.data.gov.uk/id/year/2012>.
}
Pros: it is generic and works with every dataset, even if no code lists are defined.
Cons: it is slow and may lead to timeouts.
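The observation-level search could be generated from the filter input roughly as follows. This is a sketch under assumptions, not the actual resolver: observation_query is a hypothetical helper, the pairs are assumed to be already validated URIs (no escaping is done), and "and" is read as "each pair occurs in some observation of the dataset".

```python
QB_PREFIX = "PREFIX qb: <http://purl.org/linked-data/cube#>"


def observation_query(pairs, operator="and"):
    """Build an Option 1 style query from (component URI, value URI) pairs.

    operator "and": every pair must occur in some observation of the dataset.
    operator "or": at least one pair must occur.
    """
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    if not pairs:
        raise ValueError("at least one pair is required")
    # One observation variable per pair: a single observation cannot hold
    # two different values for the same dimension.
    patterns = [
        f"?obs{i} qb:dataSet ?ds . ?obs{i} <{component}> <{value}> ."
        for i, (component, value) in enumerate(pairs)
    ]
    if operator == "and":
        body = "\n  ".join(patterns)
    else:
        body = "\n  UNION\n  ".join("{ " + p + " }" for p in patterns)
    return (
        f"{QB_PREFIX}\n"
        "SELECT DISTINCT ?ds WHERE {\n"
        "  ?ds a qb:DataSet .\n"
        f"  {body}\n"
        "}"
    )


REF_PERIOD = "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod"
pairs = [
    (REF_PERIOD, "http://reference.data.gov.uk/id/year/2012"),
    (REF_PERIOD, "http://reference.data.gov.uk/id/year/2013"),
]
and_query = observation_query(pairs, "and")
or_query = observation_query(pairs, "or")
```

Each extra pair in an "and" filter adds another join over the observations, which makes the timeout risk noted above worse as filters grow.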
Option 2: Search the dataset structure. This option can be used with PublishMyData, e.g.
prefix qb: <http://purl.org/linked-data/cube#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
select distinct ?ds where {
  ?ds a qb:DataSet.
  ?ds qb:structure ?dsd.
  ?dsd qb:component ?comp.
  ?comp qb:dimension <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod>.
  ?comp qb:codeList ?cl.
  ?cl skos:member <http://reference.data.gov.uk/id/year/2012>.
}
Pros: it is fast, since it searches only the structure and does not need to iterate over all observations.
Cons: it requires a code list that contains ONLY the values used in the dataset. Thus, separate code lists must be defined for different datasets.
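The code-list requirement can be illustrated with a small in-memory example in Python. All dataset names and contents below are made up for illustration: when two datasets share one code list, a structure-level search returns a dataset that never uses the requested value, whereas per-dataset code lists give the same answer as searching the observations.

```python
YEAR = "http://reference.data.gov.uk/id/year/"

# Hypothetical data: values that actually occur in each dataset's observations.
observed_values = {
    "dataset-a": {YEAR + "2012", YEAR + "2013"},
    "dataset-b": {YEAR + "2014"},
}

# Hypothetical code lists attached to each dataset's structure.
shared_code_list = {YEAR + "2012", YEAR + "2013", YEAR + "2014"}
code_lists_shared = {"dataset-a": shared_code_list, "dataset-b": shared_code_list}
code_lists_separate = dict(observed_values)  # one exact code list per dataset


def search(index, value):
    """Return the sorted names of datasets whose value set contains `value`."""
    return sorted(name for name, values in index.items() if value in values)


target = YEAR + "2012"
by_observations = search(observed_values, target)          # Option 1
by_shared_code_list = search(code_lists_shared, target)    # Option 2, shared list
by_separate_code_lists = search(code_lists_separate, target)  # Option 2, exact lists
```

Here by_shared_code_list also contains "dataset-b", a false positive, while the other two searches return only "dataset-a".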
The same options apply to "Search hierarchical data" (#31).