cubiql
Performance issue to get all the observations
I need to get ALL the observations to run some statistical analysis on top of them.
For example I run:
{
  dataset_births {
    observations(dimensions: {reference_period: "2012"}) {
      total_matches
      page(first: "1000") {
        result {
          count
          gender
          reference_area
          reference_period
        }
      }
    }
  }
}
The total matches are 44952. Since the max limit is 1000, I have to run 45 queries, which takes a long time.
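For reference, a minimal Python sketch of the paging arithmetic and the client-side loop it implies. `fetch_page` is a hypothetical stand-in for whatever actually sends the GraphQL request (the cursor or offset argument the endpoint expects is not shown in this thread), and `fake_fetch_page` simply serves slices of an in-memory list so the example runs on its own.

```python
import math
from typing import Callable, List


def pages_needed(total_matches: int, page_size: int) -> int:
    """Number of requests needed to cover total_matches rows."""
    return math.ceil(total_matches / page_size)


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              total_matches: int, page_size: int) -> List[dict]:
    """Call fetch_page(page_index, page_size) once per page and collect the rows."""
    rows: List[dict] = []
    for page_index in range(pages_needed(total_matches, page_size)):
        rows.extend(fetch_page(page_index, page_size))
    return rows


# Hypothetical stand-in for the real endpoint: 44952 dummy observations.
FAKE_ROWS = [{"count": i} for i in range(44952)]


def fake_fetch_page(page_index: int, page_size: int) -> List[dict]:
    """Return one slice of FAKE_ROWS, mimicking a single paged response."""
    start = page_index * page_size
    return FAKE_ROWS[start:start + page_size]


if __name__ == "__main__":
    print(pages_needed(44952, 1000))                      # 45
    print(len(fetch_all(fake_fetch_page, 44952, 1000)))   # 44952
```

The last page is only partly full (952 rows), which is why a ceiling division is needed rather than an integer division.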
I also tried locally to increase the max limit in order to get all the results in a single query (e.g. max limit=50000). However, the time required is ~30 seconds.
We'll review the SPARQL queries for efficiency, but this is a fundamental issue with using LIMIT/OFFSET. We can look at farming out whole slice/downloads to a specific download service, and at providing better HTTP cache-control headers, so clients don't necessarily need to repeat queries.
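To illustrate why LIMIT/OFFSET paging scales badly, here is a rough Python sketch of the work involved. It assumes the store has to produce and discard every row before the offset on each request. That is common for query engines but has not been verified for the triple store behind this service. The function name is made up for the example.

```python
def rows_touched_with_offset(total_rows: int, page_size: int) -> int:
    """Rows the store would walk through across all pages, under the
    assumption that each query scans from row 0 up to offset + page_size."""
    touched = 0
    for offset in range(0, total_rows, page_size):
        # Each request re-reads the skipped rows, then the page itself.
        touched += min(offset + page_size, total_rows)
    return touched


if __name__ == "__main__":
    total = 44952
    paged = rows_touched_with_offset(total, 1000)
    print(paged)            # 1034952
    print(paged / total)    # about 23x the work of one full scan
```

Under that assumption, fetching 44952 observations in pages of 1000 touches roughly 23 times as many rows as a single pass, which is the kind of overhead a dedicated download service would avoid.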