
Performance issue when retrieving all the observations

Open zeginis opened this issue 7 years ago • 1 comments

I need to get ALL the observations to run some statistical analysis on top of them.

For example I run:

{
  dataset_births {
    observations(dimensions: {reference_period: "2012"}) {
      total_matches
      page(first: "1000") {
        result {
          count
          gender
          reference_area
          reference_period
        }
      }
    }
  }
}

The total matches are 44952. Since the max limit is 1000, I have to run 45 queries, which takes a long time.

I also tried locally increasing the max limit to get all the results in a single query (e.g. max limit = 50000). However, that query still takes about 30 seconds.
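For reference, the page-by-page retrieval described above could be sketched roughly as follows. This is only an illustration: `fetch_page` is a hypothetical stand-in for whatever function actually sends the query and returns one page of `result` rows, and the `(offset, limit)` signature is an assumption, not part of the CubiQL API.

```python
import math
from typing import Callable, Dict, Iterator, List


def pages_needed(total_matches: int, page_size: int) -> int:
    """Number of requests needed to cover every match, rounding up."""
    return math.ceil(total_matches / page_size)


def fetch_all(
    fetch_page: Callable[[int, int], List[Dict]],
    total_matches: int,
    page_size: int,
) -> Iterator[Dict]:
    """Yield every observation by calling fetch_page once per page.

    fetch_page(offset, limit) is hypothetical: it stands in for the code
    that issues one query and returns the rows of that page.
    """
    for page_index in range(pages_needed(total_matches, page_size)):
        offset = page_index * page_size
        yield from fetch_page(offset, page_size)


if __name__ == "__main__":
    # Fake in-memory data source, used only to exercise the loop.
    data = [{"count": i} for i in range(44952)]

    def fake_fetch(offset: int, limit: int) -> List[Dict]:
        return data[offset:offset + limit]

    rows = list(fetch_all(fake_fetch, total_matches=len(data), page_size=1000))
    print(pages_needed(len(data), 1000), len(rows))
```

With 44952 matches and a page size of 1000, the last page is only partly filled, so the loop issues one more request than a plain integer division would suggest.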

zeginis avatar Nov 29 '17 13:11 zeginis

We'll review the SPARQL queries for efficiency, but this is a fundamental issue with using LIMIT/OFFSET. We can look at farming out whole-slice downloads to a dedicated download service, and at providing better HTTP cache-control headers so that clients don't necessarily need to repeat queries.
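To illustrate why LIMIT/OFFSET is the limiting factor, here is a rough cost model. It assumes, as a simplification, that the store has to enumerate and discard every row before the offset on each request; real query engines may behave differently, so treat the numbers as an indication only. The function name is made up for this sketch.

```python
def rows_scanned_with_offset(total_rows: int, page_size: int) -> int:
    """Estimate rows touched when paging through a result with LIMIT/OFFSET.

    Assumption (simplified model): each request enumerates the `offset`
    skipped rows plus the rows it actually returns.
    """
    scanned = 0
    for offset in range(0, total_rows, page_size):
        limit = min(page_size, total_rows - offset)
        scanned += offset + limit
    return scanned


if __name__ == "__main__":
    total = 44952
    paged = rows_scanned_with_offset(total, 1000)
    single = rows_scanned_with_offset(total, total)  # one request, no offset
    print(paged, single, round(paged / single, 1))
```

Under this model the paged approach touches roughly 23 times as many rows as a single full scan for the 2012 slice, and the gap grows with the number of pages, which is why a dedicated download path looks attractive.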

RickMoynihan avatar Dec 01 '17 10:12 RickMoynihan