cubiql
Add queries for validating data against CubiQL's expectations.
Issue #127 - Add validation queries for use with rdf-validator which checks a data source against the requirements CubiQL makes on data cubes.
The queries are templates which expect the relevant configuration values to be provided when executed.
Add instructions to the README on running these validations.
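To illustrate what "template" means here, the following is a minimal sketch of filling query variables from a configuration map before a query is run. The `{{name}}` placeholder syntax, the `render_query` helper and the example query are assumptions made for illustration; they are not taken from rdf-validator or from this PR.

```python
import re


def render_query(template: str, config: dict) -> str:
    """Replace each {{name}} placeholder with the matching config value.

    The {{name}} syntax is an assumption for this sketch, not necessarily
    the syntax rdf-validator itself uses.
    """
    def substitute(match):
        name = match.group(1)
        # Refuse to run a query with a missing or nil configuration value.
        if config.get(name) is None:
            raise KeyError(f"no value configured for query variable {name!r}")
        return str(config[name])

    return re.sub(r"\{\{\s*([\w-]+)\s*\}\}", substitute, template)


# Hypothetical validation query: datasets that lack a label.
template = """SELECT ?ds WHERE {
  ?ds a <http://purl.org/linked-data/cube#DataSet> .
  FILTER NOT EXISTS { ?ds <{{dataset-label-uri}}> ?label }
}"""

config = {"dataset-label-uri": "http://www.w3.org/2000/01/rdf-schema#label"}
query = render_query(template, config)
print(query)
```

Failing on a missing value, rather than substituting an empty string, keeps a misconfigured run from silently passing a validation.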
@zeginis - Please could you try running these validation queries against your data and let me know if you have any problems?
@lkitching I checked the PR. The queries are ok. I have tested them on some data created by Table2qb and they pass the tests.
However I think we need some more tests to cover other CubiQL requirements:
- There is a code list that contains ONLY the concepts used in the cube
- There is a code list for each qb:DimensionProperty including qb:MeasureType
- Maybe we also need a test for the language tags (e.g. @en, nil)
What do you think?
@lkitching I added 2 new SPARQL queries to support the CubiQL requirements I mentioned.
When I run them independently against the SPARQL endpoint they return no results, so they succeed. But when I run them through the validator they fail. Any idea why this happens?
The config I use:
{:geo-dimension-uri nil
 :time-dimension-uri nil
 :codelist-source "http://purl.org/linked-data/cube#ComponentSpecification"
 :codelist-predicate "http://publishmydata.com/def/qb/codesUsed"
 :codelist-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :dataset-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :schema-label-language nil
 :max-observations-page-size 2000}
I run the validator at: http://195.251.218.39:8893/sparql
@zeginis - Dimensions no longer need to specify a codelist. Any dimensions which do not specify one and which are not ref area, ref period, string or decimal types are mapped to a String type in the schema and are submitted as typed literals within generated SPARQL queries.
@zeginis - I've pushed a fix to the new queries to allow them to run as expected in rdf-validator. Comments currently need to go after the query so Sesame infers the correct query type. This is effectively a bug in rdf-validator that we need to fix.
@lkitching what do you mean when you say they are mapped to a String?
Can we use such dimensions without codelist to lock dimensions?
e.g.
{cubiql {
  dataset_earnings {
    title
    description
    observations(dimensions: {gender: ALL
                              population_group: WORKPLACE_BASED
                              measure_type: MEDIAN}) {
      total_matches
    }}}}
@lkitching I tried using a dimension whose values are URIs but which has no codelist defined. CubiQL does not work properly:
- when requesting the dimension values I get an empty list
- I get an "Internal server error: exception" when requesting the observations.
You can try at the endpoint: http://195.251.218.39:8893/sparql
The dimension http://example.gr/hello/def/dimension/station_id has no codelist.
The configuration I use:
{:geo-dimension-uri nil
 :time-dimension-uri nil
 :codelist-source "http://purl.org/linked-data/cube#ComponentSpecification"
 :codelist-predicate "http://publishmydata.com/def/qb/codesUsed"
 :codelist-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :dataset-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :schema-label-language en
 :max-observations-page-size 2000}
@zeginis - I've pushed a fix for the exception to master. Could you check that the latest version fixes the issue for you?
@lkitching yes this works fine. Thank you
@lkitching I understand that it is not mandatory for the dimensions to have a codelist. However, if a codelist for the usedCodes is defined, then it should contain all and only the codes used in the cube.
This is a common error we need to catch. The error occurs when transforming data with Table2qb, because the URIs generated by the cube-pipeline and the codelist-pipeline do not match.
@lkitching I removed the query that checks whether each dimension has a codelist. I kept the other query, which checks, for the dimensions that do have a codelist, that the codelist contains all and only the codes used in the cube.
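The "all and only" requirement amounts to the set of codes used by observations being equal to the set of codes in the codelist. A minimal sketch of that comparison, outside SPARQL, is shown below. The `codelist_mismatches` function and the example URIs are hypothetical and are not part of the queries in this PR.

```python
def codelist_mismatches(used_codes, codelist_codes):
    """Compare the codes used in a cube with the codes in its codelist.

    Returns the codes that break the "all and only" requirement:
    codes used by observations but absent from the codelist, and
    codes present in the codelist but never used.
    """
    used = set(used_codes)
    listed = set(codelist_codes)
    return {
        "missing_from_codelist": sorted(used - listed),
        "unused_in_codelist": sorted(listed - used),
    }


# Hypothetical URIs: the two pipelines disagree on the spelling of one code.
used = [
    "http://example.org/def/concept/gender/female",
    "http://example.org/def/concept/gender/male",
]
listed = [
    "http://example.org/def/concept/gender/Female",
    "http://example.org/def/concept/gender/male",
]

result = codelist_mismatches(used, listed)
print(result)
```

A URI mismatch between the two pipelines shows up on both sides at once: the observation's code is missing from the codelist, and the codelist's differently spelled code is unused.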