Add queries for validating data against CubiQL's expectations.

Open lkitching opened this issue 6 years ago • 12 comments

Issue #127 - Add validation queries for use with rdf-validator. The queries check a data source against the requirements CubiQL places on data cubes.

The queries are templates which expect the relevant configuration values to be provided when executed.

Add instructions to the README on running these validations.

lkitching avatar Sep 14 '18 13:09 lkitching

@zeginis - Please could you try running these validation queries against your data and let me know if you have any problems?

lkitching avatar Sep 14 '18 13:09 lkitching

@lkitching I checked the PR. The queries are ok. I have tested them on some data created by Table2qb and they pass the tests.

However I think we need some more tests to cover other CubiQL requirements:

  • There is a code list that contains ONLY the concepts used in the cube
  • There is a code list for each qb:DimensionProperty including qb:MeasureType
  • Maybe we also need a test for the language tags (e.g. @en, nil)

What do you think?

zeginis avatar Sep 25 '18 15:09 zeginis

@lkitching I added 2 new SPARQL queries to support the CubiQL requirements I mentioned.

When I run them directly against the SPARQL endpoint they return no results, i.e. they succeed. But when I run them through the validator they fail. Any idea why this happens?

The config I use:

{:geo-dimension-uri nil
 :time-dimension-uri nil
 :codelist-source "http://purl.org/linked-data/cube#ComponentSpecification"
 :codelist-predicate "http://publishmydata.com/def/qb/codesUsed"
 :codelist-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :dataset-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :schema-label-language nil
 :max-observations-page-size 2000}

I run the validator against: http://195.251.218.39:8893/sparql

zeginis avatar Sep 26 '18 15:09 zeginis

@zeginis - Dimensions no longer need to specify a codelist - any dimensions which do not specify one and which are not ref area, ref period, string or decimal types are mapped to a String type in the schema and are submitted as typed literals within generated SPARQL queries.

lkitching avatar Oct 01 '18 13:10 lkitching

@zeginis - I've pushed a fix to the new queries to allow them to run as expected in rdf-validator. Comments currently need to go after the query so that Sesame infers the correct query type. This is effectively a bug in rdf-validator that we need to fix.
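
To illustrate why a leading comment can matter, here is a minimal Python sketch. It is not the rdf-validator or Sesame code, and how those libraries actually detect the query form is not shown in this thread; the function names `naive_query_form` and `query_form` are hypothetical. The sketch assumes a detector that only looks at the start of the query text, and shows how dropping whole-line comments and skipping the `PREFIX`/`BASE` prologue first gives the expected result.

```python
import re

# Matches one PREFIX or BASE declaration at the current position.
_PROLOGUE = re.compile(
    r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s+<[^>]*>)", re.IGNORECASE
)
# Matches the query form keyword at the current position.
_FORM = re.compile(r"\s*(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def naive_query_form(query):
    """Assumed-naive detection: only inspects the very start of the text."""
    match = _FORM.match(query)
    return match.group(1).upper() if match else None


def query_form(query):
    """Drop whole-line comments and the prologue, then read the form keyword.

    Simplification: only lines whose first non-blank character is '#'
    are treated as comments, so '#' inside an IRI is left alone.
    """
    lines = [ln for ln in query.splitlines() if not ln.lstrip().startswith("#")]
    text = "\n".join(lines)
    pos = 0
    while True:
        match = _PROLOGUE.match(text, pos)
        if match is None:
            break
        pos = match.end()
    match = _FORM.match(text, pos)
    return match.group(1).upper() if match else None


# Hypothetical query with a comment before the query body.
commented_first = (
    "# Every dataset must have a label\n"
    "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
    "SELECT ?ds WHERE { ?ds a qb:DataSet }"
)
# The same query with the comment moved to the end (the current workaround).
commented_last = (
    "SELECT ?ds WHERE { ?ds a <http://purl.org/linked-data/cube#DataSet> }\n"
    "# Every dataset must have a label"
)
```

With these inputs the naive detector finds no form for `commented_first` but does for `commented_last`, which matches the workaround of placing comments after the query.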

lkitching avatar Oct 01 '18 13:10 lkitching

@lkitching what do you mean by "they are mapped to a String"?

Can we use such dimensions, which have no codelist, to lock dimensions?

e.g.

{
  cubiql {
    dataset_earnings {
      title
      description
      observations(dimensions: {gender: ALL
                                population_group: WORKPLACE_BASED
                                measure_type: MEDIAN}) {
        total_matches
      }
    }
  }
}

zeginis avatar Oct 01 '18 15:10 zeginis

@lkitching I tried using a dimension whose values are URIs but which has no codelist defined. CubiQL does not work properly:

  • when requesting the dimension values I get an empty list
  • I get an "Internal server error: exception" when requesting the observations.

zeginis avatar Oct 02 '18 12:10 zeginis

You can try at the endpoint: http://195.251.218.39:8893/sparql

The dimension http://example.gr/hello/def/dimension/station_id has no codelist.

The configuration I use:

{:geo-dimension-uri nil
 :time-dimension-uri nil
 :codelist-source "http://purl.org/linked-data/cube#ComponentSpecification"
 :codelist-predicate "http://publishmydata.com/def/qb/codesUsed"
 :codelist-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :dataset-label-uri "http://www.w3.org/2000/01/rdf-schema#label"
 :schema-label-language en
 :max-observations-page-size 2000}

zeginis avatar Oct 02 '18 15:10 zeginis

@zeginis - I've pushed a fix for the exception to master, could you check the latest version fixes the issue for you?

lkitching avatar Oct 08 '18 11:10 lkitching

@lkitching yes this works fine. Thank you

zeginis avatar Oct 08 '18 15:10 zeginis

@lkitching I understand that it is not mandatory for dimensions to have a codelist. However, if a codelist of used codes is defined for a dimension, then it should contain all and only the codes used in the cube.

This is a common error we need to catch. The error occurs when transforming data with Table2qb, because the URIs produced by the cube-pipeline and the codelist-pipeline do not match.

zeginis avatar Oct 09 '18 10:10 zeginis

@lkitching I removed the query that checks whether each dimension has a codelist. I kept the other query, which checks, for the dimensions that do have a codelist, that the codelist contains all and only the codes used in the cube.

zeginis avatar Oct 09 '18 15:10 zeginis
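
The "all and only the used codes" requirement discussed above amounts to a set equality between the codes that observations use for a dimension and the members of that dimension's codelist. The following Python sketch shows the check outside SPARQL; it is not the validation query from this PR, the function name `check_codelist` is hypothetical, and the URIs are made-up examples of the kind of mismatch described.

```python
def check_codelist(used_codes, codelist_members):
    """Compare the codes used in a cube with the members of a codelist.

    Returns the codes that break the "all and only" requirement:
    - missing_from_codelist: used in observations but not in the codelist
    - unused_in_cube: listed in the codelist but never used
    Both lists are empty when the codelist matches the cube exactly.
    """
    used = set(used_codes)
    listed = set(codelist_members)
    return {
        "missing_from_codelist": sorted(used - listed),
        "unused_in_cube": sorted(listed - used),
    }


# Hypothetical example: the two sources disagree on one URI
# (capitalisation differs), so the code shows up in both result lists.
used = [
    "http://example.org/def/concept/gender/all",
    "http://example.org/def/concept/gender/female",
]
codelist = [
    "http://example.org/def/concept/gender/all",
    "http://example.org/def/concept/gender/Female",
]
result = check_codelist(used, codelist)
is_valid = not result["missing_from_codelist"] and not result["unused_in_cube"]
```

A mismatched URI typically appears on both sides at once, which makes this kind of error easy to tell apart from a codelist that is simply too large.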