visualization-tool optimize slow queries

optimize slow queries

Open Rdataflow opened this issue 1 year ago • 1 comment

Is your enhancement related to a problem? Please describe.
Loading the editing view for some cubes can take up to 1 min :-(

Describe the solution you'd like
A much quicker loading time: <5s for the overall edit view, and thus <2s for single queries.

Describe alternatives you've considered
Send the editors to get themselves a coffee? :-)

List of observed slow queries on cubes (still growing...)

[ ] ~60s on loading https://int.visualize.admin.ch/create/new?cube=https://environment.ld.admin.ch/foen/nfi/nfi_C-96/cube/2023-1

time curl 'https://int.visualize.admin.ch/api/graphql' -H 'x-visualize-cache-control: no-cache'   --data-raw $'{"query":"query DataCubePreview($iri: String\u0021, $sourceType: String\u0021, $sourceUrl: String\u0021, $locale: String\u0021, $latest: Boolean, $filters: Filters) {\\n  dataCubeByIri(\\n    iri: $iri\\n    sourceType: $sourceType\\n    sourceUrl: $sourceUrl\\n    locale: $locale\\n    latest: $latest\\n  ) {\\n    iri\\n    title\\n    description\\n    publicationStatus\\n    dimensions(sourceType: $sourceType, sourceUrl: $sourceUrl) {\\n      ...dimensionMetadata\\n      __typename\\n    }\\n    measures(sourceType: $sourceType, sourceUrl: $sourceUrl) {\\n      ...dimensionMetadata\\n      __typename\\n    }\\n    observations(sourceType: $sourceType, sourceUrl: $sourceUrl, limit: 10) {\\n      data\\n      sparql\\n      sparqlEditorUrl\\n      __typename\\n    }\\n    __typename\\n  }\\n}\\n\\nfragment dimensionMetadata on Dimension {\\n  iri\\n  label\\n  description\\n  isNumerical\\n  isKeyDimension\\n  dataType\\n  order\\n  values(sourceType: $sourceType, sourceUrl: $sourceUrl, filters: $filters)\\n  unit\\n  related {\\n    iri\\n    type\\n    __typename\\n  }\\n  ... on TemporalDimension {\\n    timeUnit\\n    timeFormat\\n    __typename\\n  }\\n  ... on NumericalMeasure {\\n    isCurrency\\n    currencyExponent\\n    resolution\\n    isDecimal\\n    __typename\\n  }\\n}\\n","operationName":"DataCubePreview","variables":{"iri":"https://environment.ld.admin.ch/foen/nfi/nfi_C-96/cube/2023-1","sourceType":"sparql","sourceUrl":"https://int.lindas.admin.ch/query","locale":"de"}}'  -o /dev/null

@bprusinowski @adintegra cc @zellersabine

Jun 15 '23 08:06 Rdataflow

Hi @Rdataflow, I investigated the performance of the queries and it seems that the main bottleneck is the CubeObservations query, which you mentioned in the issue. It's a problem especially for larger cubes, such as the NFI ones (and it really seems to be caused by the query itself, as it also takes around 60s when run directly against LINDAS).

We also plan to share examples of smaller queries that are fired in large numbers simultaneously (e.g. the ones that fetch the dimension values) and ask for confirmation of whether merging them into one big query would improve performance (and if so, how to achieve that programmatically; maybe there is already a library for that).
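To make the merging idea concrete, here is a minimal sketch of one way several per-dimension value queries could be collapsed into a single SPARQL query using a `VALUES` clause. Everything in it is an assumption for illustration: the helper names `buildMergedDimensionValuesQuery` and `groupByDimension` are hypothetical, the graph pattern assumes the cube.link vocabulary (`cube:observationSet` / `cube:observation`), and the queries Visualize actually fires may look different. Whether a merged query is really faster on LINDAS is exactly the open question above.

```typescript
// Hypothetical helper (not part of Visualize): builds ONE SPARQL query that
// returns the distinct values of several dimensions, instead of firing one
// query per dimension. The graph pattern assumes the cube.link vocabulary.
function buildMergedDimensionValuesQuery(
  cubeIri: string,
  dimensionIris: string[]
): string {
  if (dimensionIris.length === 0) {
    throw new Error("at least one dimension IRI is required");
  }
  // Reject characters that would break out of the <...> IRI syntax.
  for (const iri of [cubeIri, ...dimensionIris]) {
    if (/[\s<>"]/.test(iri)) {
      throw new Error(`invalid IRI: ${iri}`);
    }
  }
  const values = dimensionIris.map((iri) => `<${iri}>`).join(" ");
  return [
    "PREFIX cube: <https://cube.link/>",
    "SELECT DISTINCT ?dimension ?value WHERE {",
    `  VALUES ?dimension { ${values} }`,
    `  <${cubeIri}> cube:observationSet/cube:observation ?observation .`,
    "  ?observation ?dimension ?value .",
    "}",
  ].join("\n");
}

type DimensionValueRow = { dimension: string; value: string };

// Splits the flat (dimension, value) result rows back into one list per
// dimension, so callers get the same shape as with separate queries.
function groupByDimension(rows: DimensionValueRow[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const { dimension, value } of rows) {
    const list = grouped.get(dimension) ?? [];
    list.push(value);
    grouped.set(dimension, list);
  }
  return grouped;
}
```

The trade-off is that one merged query replaces many round trips but returns a single, larger result set that has to be split up again on the client, so it would need to be measured against the current behaviour.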

We will also mention https://github.com/visualize-admin/visualization-tool/pull/1043 to see if something could be done there, and ask about the cached LINDAS endpoint and whether there is anything to consider before switching to it, as this should also improve performance.

Slightly off-topic, as it's not related to LINDAS or to reaching out to Zazuko, but we also thought about a performance optimisation that goes in another direction (https://github.com/visualize-admin/visualization-tool/issues/1073). I think this could be related to https://gitlab.ldbar.ch/bafu/umweltdatenkiosk-planning/-/issues/524 (if there is / will be a notification when a dataset is updated, there should be a way to trigger an API call to Visualize so we could remove the cache) – maybe this would solve most of the performance problems related to published / embedded charts.
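As a rough illustration of the "notification removes the cache" idea, the sketch below keeps cached query results grouped by cube IRI so that a single call can drop everything belonging to an updated dataset. The class `CubeQueryCache` and its method names are hypothetical and not taken from the Visualize codebase; how the update notification would reach Visualize (e.g. an API route) is left open here, as it is in the linked issues.

```typescript
// Hypothetical in-memory cache (not the Visualize implementation): results
// are grouped by cube IRI so one update notification can evict them all.
class CubeQueryCache<T> {
  private entries = new Map<string, Map<string, T>>();

  // Returns the cached result for (cubeIri, queryKey), or calls `load` once
  // and stores what it returns. T may itself be a Promise, in which case
  // concurrent callers share the same in-flight request.
  getOrLoad(cubeIri: string, queryKey: string, load: () => T): T {
    let perCube = this.entries.get(cubeIri);
    if (!perCube) {
      perCube = new Map<string, T>();
      this.entries.set(cubeIri, perCube);
    }
    if (perCube.has(queryKey)) {
      return perCube.get(queryKey) as T;
    }
    const result = load();
    perCube.set(queryKey, result);
    return result;
  }

  // Intended to be called when a "dataset updated" notification arrives.
  // Returns true if anything was cached for that cube.
  invalidate(cubeIri: string): boolean {
    return this.entries.delete(cubeIri);
  }
}
```

With something like this, published charts could be served from the cache indefinitely and only hit the SPARQL endpoint again after `invalidate` has been called for their cube.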

Jun 21 '23 07:06 bprusinowski
