loki icon indicating copy to clipboard operation
loki copied to clipboard

feat: improve performance of `first_over_time` and `last_over_time` queries by sharding them

Open jeschkies opened this issue 1 year ago • 3 comments

What this PR does / why we need it: With the introduction of sending query plans with custom nodes to the querier we can now shard first_over_time queries.

Checklist

  • [ ] Reviewed the CONTRIBUTING.md guide (required)
  • [ ] Documentation added
  • [x] Tests updated
  • [ ] CHANGELOG.md updated
    • [ ] If the change is worth mentioning in the release notes, add add-to-release-notes label
  • [ ] Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • [ ] For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • [ ] If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

jeschkies avatar Jan 08 '24 11:01 jeschkies

Trivy scan found the following vulnerabilities:

  • HIGH, Target: docker.io/grafana/loki:main-61a4205 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
  • HIGH, Target: docker.io/grafana/loki:main-61a4205 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0 \nTo see more details on these vulnerabilities, and how/where to fix them, please run docker build -t grafana/loki:main-61a4205 -f cmd/loki/Dockerfile . trivy i grafana/loki:main-61a4205 on your branch. If these were not introduced by your PR, please considering fixing them in via a subsequent PR. Thanks!

github-actions[bot] avatar Jan 08 '24 11:01 github-actions[bot]

hey @kavirajk, actually something fun I learned this past week when finally finding the cause of the issues with our quantile_over_time, TestMappingEquivalence is actually testing the results equivalency for each query as a range query. What we don't have (AFAICT) is tests for the results equivalency for instant queries. The test names here are actually a bit misleading; TestMappingEquivalence is testing for equivalency with sharded queries while TestRangeMappingEquivalence is testing for equivalency with time range split queries.

cstyan avatar Feb 20 '24 00:02 cstyan

@jeschkies any opinions on merging this as is vs as behind a feature flag? given that it's not being executed via a probabilistic data structure I don't think we need a feature flag

cstyan avatar Apr 23 '24 22:04 cstyan