consider support for W3C Data Cube (QB) RDF

Open VladimirAlexiev opened this issue 8 years ago • 7 comments

Is anyone else interested in Cubes support for the W3C Data Cube (QB) RDF data model https://www.w3.org/TR/vocab-data-cube/? I am particularly interested in supporting W3C Cube in CubesViewer, but as https://github.com/jjmontesl/cubesviewer/issues/70 shows, the preference is to add such support in Cubes, rather than the presentation layer (CubesViewer).

Cheers!

VladimirAlexiev avatar Jun 02 '17 09:06 VladimirAlexiev

Are you still interested in this? I'm thinking of adding multiple backend support to CubesViewer.

jjmontesl avatar Dec 23 '18 21:12 jjmontesl

Still interested, and we may be able to contribute some development through the BigDataGrapes project. It's about agricultural observations data that will use both QB and Geospatial components (similar to QB for Earth Observation, https://www.w3.org/TR/eo-qb/).

VladimirAlexiev avatar Dec 27 '18 07:12 VladimirAlexiev

@jjmontesl wrote: I have similar projects: ETL tools and stuff built to import Spanish and Eurostat data. It'd be nice to talk about that at some point.

QB has been widely used to represent statistical data. Excerpt from a report I wrote: QB incorporates an OLAP data model and statistical classifications following SDMX. There are a number of statistical datasets available as RDF, including:

  • Linked SDMX Data developed by Sarven Capadisli: International Monetary Fund IMF, OECD, UN Food and Agriculture Organization FAO, Swiss Federal Statistical Office BFS, European Central Bank ECB, World Bank, Transparency International.
  • Eurostat developed by the LOD Around the Clock (LATC) project (static)
  • Eurostat wrapper developed by Benedikt Kämpgen (updateable)
  • US Securities and Exchange Commission SEC Edgar Wrapper developed by Benedikt Kämpgen
  • UN ComTrade developed by the Multisensor project

It lists two QB viewers, and then CubesViewer, which is much more powerful than those.

VladimirAlexiev avatar Dec 28 '18 09:12 VladimirAlexiev

QB https://www.w3.org/TR/vocab-data-cube/ includes powerful OLAP metadata called qb:DataStructureDefinition and qb:SliceKey. The overall structure of the ontology is this:

VladimirAlexiev avatar Dec 28 '18 09:12 VladimirAlexiev

@jjmontesl thinking on adding multiple backend support to CubesViewer

From the other discussion it seems that Cubes has multiple backend support but maybe it's not the best way forward? Seems you've already determined that the better engineering approach is to add such support to CubesViewer instead?

VladimirAlexiev avatar Dec 28 '18 09:12 VladimirAlexiev

The reason I mentioned that is that CubesViewer needs multiple backend support anyway, as I'd wish to add a local CSV/Tabular backend in any case, and perhaps MDX in the future. From that point of view, it eases the path to integrating W3C Cube RDF.

But I'm not quite sure which is the better engineering approach to this. For one thing, I'm not sure I understand exactly what you have in mind in terms of supporting QB. I have no experience with it and I'm not even sure what the possibilities are (I understand CubesViewer can consume W3C Cube schemas, but I'm not sure how data is published/consumed).

jjmontesl avatar Dec 31 '18 17:12 jjmontesl

@jjmontesl http://estatwrap.ontologycentral.com/page/ei_bsco_m is an example RDF QB dataset. At the bottom there are links for viewing it as a table, and downloading it as:

  • RDF (data and DSD)
  • SDMX (XML data and XSD)
  • TSV

If you convert the RDF DSD to Turtle (using Jena RIOT or http://rdf-translator.appspot.com/) and look at it, you'll see the structure:
<ei_bsco_m.rdf#dsd>
        a             qb:DataStructureDefinition ;
        qb:component  [ qb:attribute  estat:freq ] ;
        qb:component  [ qb:attribute  estat:geo ] ;
        qb:component  [ qb:attribute  estat:obs_status ] ;
        qb:component  [ qb:attribute  estat:timeformat ] ;
        qb:component  [ qb:dimension  dcterms:date ] ;
        qb:component  [ qb:dimension  estat:indic ] ;
        qb:component  [ qb:dimension  estat:s_adj ] ;
        qb:component  [ qb:dimension  estat:unit ] ;
        qb:component  [ qb:measure    sdmx-measure:obsValue ] ;
        foaf:page     <ei_bsco_m.rdf> .

Note: I'm not sure why estat:geo is declared as qb:attribute. I think that should be qb:dimension, since the geo (refArea) is surely a dimension of the observation.
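To make the idea of mapping a DSD onto an OLAP model more concrete, here is a minimal sketch in plain Python. It is not how Cubes or CubesViewer work today: the component list is hand-transcribed from the Turtle above, `dsd_to_model` and `force_dimensions` are hypothetical names, and the output only loosely follows the shape of a Cubes model dictionary (the key names are an assumption and should be checked against the Cubes documentation). The `force_dimensions` argument is there to treat estat:geo as a dimension even though the DSD declares it as an attribute.

```python
# DSD components hand-transcribed from the Turtle above as (kind, property) pairs.
DSD_COMPONENTS = [
    ("attribute", "estat:freq"),
    ("attribute", "estat:geo"),
    ("attribute", "estat:obs_status"),
    ("attribute", "estat:timeformat"),
    ("dimension", "dcterms:date"),
    ("dimension", "estat:indic"),
    ("dimension", "estat:s_adj"),
    ("dimension", "estat:unit"),
    ("measure", "sdmx-measure:obsValue"),
]


def local_name(prefixed):
    """Return the part after the prefix, e.g. 'estat:geo' -> 'geo'."""
    return prefixed.split(":", 1)[1]


def dsd_to_model(cube_name, components, force_dimensions=()):
    """Hypothetical helper: sort DSD components into an OLAP-style model dict.

    Properties listed in force_dimensions are treated as dimensions
    regardless of how the DSD declares them.
    """
    dimensions, measures, details = [], [], []
    for kind, prop in components:
        name = local_name(prop)
        if kind == "dimension" or prop in force_dimensions:
            dimensions.append(name)
        elif kind == "measure":
            measures.append(name)
        else:
            # Remaining qb:attribute components become non-aggregated details.
            details.append(name)
    return {
        "dimensions": [{"name": d} for d in dimensions],
        "cubes": [{
            "name": cube_name,
            "dimensions": dimensions,
            "measures": [{"name": m} for m in measures],
            "details": [{"name": a} for a in details],
        }],
    }


model = dsd_to_model("ei_bsco_m", DSD_COMPONENTS,
                     force_dimensions={"estat:geo"})
```

A real integration would presumably read the DSD with an RDF library rather than from a hard-coded list, and would also need to pull dimension members from the code lists (dic/geo, dic/indic, and so on).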

If you look at the data, it consists of a bunch of nodes like this:

[ a                      qb:Observation ;
  estat:geo              <dic/geo#SE> ;
  estat:indic            <dic/indic#BS-PT-NY> ;
  estat:s_adj            <dic/s_adj#SA> ;
  estat:unit             <dic/unit#BAL> ;
  dcterms:date           "2016-11" ;
  qb:dataSet             <id/ei_bsco_m#ds> ;
  sdmx-measure:obsValue  27.7
] .

All dimensions and the measure are present in each observation. On the other hand, the attributes freq, obs_status and timeformat are optional, and are not present here.
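As a rough illustration of how such an observation could be consumed, the sketch below flattens one observation into a fact-table-style row. It assumes the observation has already been read into a plain dict keyed by prefixed property name (the values are copied from the node above); `to_fact_row` and `code_of` are hypothetical helpers, and geo is treated as a dimension. Dimensions and the measure are required, while the optional attributes default to `None`.

```python
DIMENSIONS = ["estat:geo", "estat:indic", "estat:s_adj", "estat:unit",
              "dcterms:date"]
MEASURE = "sdmx-measure:obsValue"
ATTRIBUTES = ["estat:freq", "estat:obs_status", "estat:timeformat"]

# The observation shown above, transcribed by hand into a dict.
observation = {
    "estat:geo": "dic/geo#SE",
    "estat:indic": "dic/indic#BS-PT-NY",
    "estat:s_adj": "dic/s_adj#SA",
    "estat:unit": "dic/unit#BAL",
    "dcterms:date": "2016-11",
    "qb:dataSet": "id/ei_bsco_m#ds",
    "sdmx-measure:obsValue": 27.7,
}


def code_of(value):
    """Return the fragment after '#' for code-list URIs, else the value itself."""
    if isinstance(value, str) and "#" in value:
        return value.rsplit("#", 1)[1]
    return value


def to_fact_row(obs):
    """Flatten one observation; fail if a dimension or the measure is missing."""
    required = DIMENSIONS + [MEASURE]
    missing = [p for p in required if p not in obs]
    if missing:
        raise ValueError("observation lacks required components: %s" % missing)
    row = {p.split(":", 1)[1]: code_of(obs[p]) for p in required}
    for p in ATTRIBUTES:
        # Attributes are optional, so absent ones simply become None.
        row[p.split(":", 1)[1]] = code_of(obs.get(p))
    return row


row = to_fact_row(observation)
```

Rows of this shape are close to what a tabular or SQL-backed OLAP store expects, which is one reason loading QB data into an existing backend may be simpler than querying the RDF directly.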

VladimirAlexiev avatar Jan 02 '19 06:01 VladimirAlexiev