Search datasets

Open zeginis opened this issue 7 years ago • 6 comments

It would be good to have functionality for searching datasets:

  • filter datasets by metadata, e.g. title, publisher, license, issued, modified, category. For example, find datasets that are published by DCLG, or find datasets issued in the last month.
  • filter datasets by structure, e.g. dimensions/attributes/measures/hierarchical levels. For example, find datasets that measure unemployment at the electoral ward level.
  • filter datasets by data, e.g. find datasets that contain the time period 2016. We can use the same filtering as #12.
  • support boolean AND/OR across the above options
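
As a rough illustration of how the AND/OR combination above might behave, here is a minimal in-memory sketch in Python. It is not the CubiQL implementation: the `Dataset` record, the predicate helpers (`publisher_is`, `issued_since`, `has_dimension`), the combinators (`all_of`, `any_of`) and the sample data are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Dataset:
    # Hypothetical, simplified view of a dataset's metadata and structure.
    title: str
    publisher: str
    issued: date
    dimensions: FrozenSet[str]


Predicate = Callable[[Dataset], bool]


def publisher_is(name: str) -> Predicate:
    """Metadata filter: dataset published by the given organisation."""
    return lambda d: d.publisher == name


def issued_since(day: date) -> Predicate:
    """Metadata filter: dataset issued on or after the given day."""
    return lambda d: d.issued >= day


def has_dimension(label: str) -> Predicate:
    """Structure filter: dataset uses a dimension with the given label."""
    return lambda d: label in d.dimensions


def all_of(*preds: Predicate) -> Predicate:
    """Boolean AND over any number of filters."""
    return lambda d: all(p(d) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """Boolean OR over any number of filters."""
    return lambda d: any(p(d) for p in preds)


def search(datasets: Iterable[Dataset], pred: Predicate) -> List[Dataset]:
    return [d for d in datasets if pred(d)]


# Made-up sample data, for illustration only.
SAMPLE = [
    Dataset("Unemployment", "DCLG", date(2017, 9, 1), frozenset({"ward", "refPeriod"})),
    Dataset("Poverty", "Other", date(2016, 1, 15), frozenset({"populationGroup"})),
    Dataset("Housing", "DCLG", date(2015, 3, 2), frozenset({"refPeriod"})),
]

# "Published by DCLG AND (issued since 2017-09-01 OR has a ward dimension)"
query = all_of(
    publisher_is("DCLG"),
    any_of(issued_since(date(2017, 9, 1)), has_dimension("ward")),
)
titles = [d.title for d in search(SAMPLE, query)]
```

Because each filter is just a predicate, metadata, structure and data filters can be nested to any depth without special cases; a real implementation would presumably translate the same tree into a query rather than filter in memory.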

zeginis avatar Sep 18 '17 12:09 zeginis

I have created a Google Doc to document the search functionality we want to implement. It contains the GraphQL queries that should be supported.

https://docs.google.com/document/d/1Trw9NM_gUM_qA6aM7t_NaWUCfuQ1ZQeyH7Q8DvcIAy4/edit?usp=sharing

zeginis avatar Sep 26 '17 08:09 zeginis

We could consider using Lucene search operations over literals, where supported (e.g. http://www.stardog.com/docs/#_search)

ricroberts avatar Sep 26 '17 16:09 ricroberts

I have added separate issues for each piece of functionality; see: #28, #29, #30, #31, #32

zeginis avatar Sep 28 '17 12:09 zeginis

I think a Lucene (text) search against title, description and theme would be very useful - though, as Ric says, not every SPARQL endpoint will necessarily support it, and different databases might have different SPARQL extensions for this purpose, so it would not be very 'standard'.

Searching datasets by a text search on the labels of dimensions and/or dimension values is something Robin previously identified as useful, e.g. find me datasets that have information about 'Manchester' or 'working age' or 'ethnicity' - often the user won't know the specific URI.

BillSwirrl avatar Oct 04 '17 16:10 BillSwirrl

I agree with you that a free text search is also required, e.g. find datasets about 'working age adults'. In this case a literal search will return datasets that contain 'working age' in the title, comment or a dimension label (e.g. http://statistics.gov.scot/data/qualifications-working-age-people)

This would complement a more structured type of search where the user or the client program knows the specific URI. For example, get datasets that have the value 'working age adults' for the dimension 'Population Group' (e.g. http://statistics.gov.scot/data/poverty).

This structured type of search is required in order to get datasets with 'similar' structure that can be processed together, e.g. by a machine learning component.
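
To make the structured case concrete, here is a hedged Python sketch that builds a SPARQL query for "datasets that have a given value for a given dimension". It assumes the data is modelled with the RDF Data Cube vocabulary, where each observation points to its dataset via `qb:dataSet`; the helper name `datasets_with_value_query` and the example URIs are hypothetical, and this is not the query CubiQL actually generates.

```python
QB = "http://purl.org/linked-data/cube#"

# Characters that must not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>" {}|\\^`')


def datasets_with_value_query(dimension_uri: str, value_uri: str) -> str:
    """Build a SPARQL query listing datasets with an observation whose
    `dimension_uri` property has the value `value_uri`.

    Assumes RDF Data Cube modelling (observation --qb:dataSet--> dataset).
    """
    for uri in (dimension_uri, value_uri):
        if not uri or any(c in _FORBIDDEN for c in uri):
            raise ValueError(f"not a usable IRI: {uri!r}")
    return (
        f"PREFIX qb: <{QB}>\n"
        "SELECT DISTINCT ?dataset WHERE {\n"
        "  ?obs qb:dataSet ?dataset ;\n"
        f"       <{dimension_uri}> <{value_uri}> .\n"
        "}"
    )


# Hypothetical URIs, for illustration only.
query = datasets_with_value_query(
    "http://example.org/def/dimension/populationGroup",
    "http://example.org/def/concept/working-age-adults",
)
```

Unlike the free text search, this needs no text index on the endpoint, but it only works when the caller already knows both URIs.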

zeginis avatar Oct 05 '17 08:10 zeginis

The following case requires a combination of free text search and structured search.

Search for datasets that contain the year 2013. This can be translated to a query: "Give me the datasets that have the value 2013 for the dimension refPeriod".

However, 2013 can be represented in different ways in a dataset, e.g.:

  • 2013
  • 2013-Q1
  • "2013 - 2014"
  • etc
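
One possible way to handle the representations listed above is to normalise each reference period label to the range of years it mentions and then test whether the requested year falls inside that range. The Python sketch below does this with a regular expression; the function name `covers_year` is hypothetical, and treating a label such as "2013 - 2014" as an inclusive range is an assumption rather than something the thread settles.

```python
import re

# A run of exactly four digits, not embedded in a longer digit string.
_YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def covers_year(label: str, year: int) -> bool:
    """Return True if the period label mentions `year`, or spans it.

    A label with one year (e.g. a quarter of that year) matches only that
    year; a label with several years is treated as an inclusive range.
    """
    years = [int(y) for y in _YEAR.findall(label)]
    if not years:
        return False
    return min(years) <= year <= max(years)


labels = ["2013", "2013-Q1", "2013 - 2014", "2012 - 2014", "2014-Q2", "Q1"]
matching = [label for label in labels if covers_year(label, 2013)]
```

With these sample labels, `matching` keeps the first four entries and drops "2014-Q2" and "Q1". Matching on labels like this is closer to the free text side of the search; a stricter approach would rely on the URIs of the reference periods, where those are available.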

zeginis avatar Oct 06 '17 12:10 zeginis