thanos
Consider query function when auto downsample
- [ ] I added CHANGELOG entry for this change.
- [ ] Change is not relevant to the end user.
Changes
When auto downsampling is enabled, try to infer the max resolution parameter from the query string itself rather than using step / 5.
This is useful to help with https://github.com/thanos-io/thanos/issues/6929.
The idea is simple. For example, take a query using `rate` with a `[2m]` range selector. Because the `rate` function requires at least 2 samples to run, the max source resolution should be 1m. This should avoid picking a coarser resolution like 5m or 1h.
Verification
Tests WIP. Need to add the same logic in QFE, too.
Does it make sense to solve this at the store level? Since the range is passed in hints, we can decide whether to send downsampled or aggregated chunks to queriers.
Interesting idea. Yeah, I think getting the range from the matrix selector would work.
I don't have a strong preference, but I wonder how we would do it at the store level. Do we ignore `max_source_resolution` from `SeriesRequest` and only infer it from the query range?
I think we still have the use case where the query range is quite large but we only want to query raw blocks.
Doing it at the query API level seems more flexible.
Looking at the code, I think it would be best to set max resolution here based on the selected range. We can still query raw blocks over a large range, but we should cap the max resolution for small ranges to make sure downsampled blocks cannot be requested.
Otherwise we would have to make changes in many places in the codebase, and it would be hard to handle cases with different ranges in a binary expression.
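The capping idea could be a small helper along these lines. This is a sketch with a hypothetical name (`capMaxResolution`), not code from the Thanos repo: the requested max resolution is honored for large ranges, but clamped so a short range can never select downsampled blocks whose samples would be too sparse.

```go
package main

import "fmt"

// capMaxResolution (hypothetical helper) clamps the requested max
// source resolution so that a select over rangeMillis still yields at
// least two samples. Raw blocks over large ranges remain possible,
// because a requested resolution of 0 (raw only) is always kept.
func capMaxResolution(requestedMs, rangeMillis int64) int64 {
	limit := rangeMillis / 2
	if requestedMs > limit {
		return limit
	}
	return requestedMs
}

func main() {
	// 1h resolution requested, but the selector range is only 2m:
	// cap to 60000ms (1m) so downsampled blocks cannot be requested.
	fmt.Println(capMaxResolution(3600_000, 120_000)) // 60000
	// Raw-only request (0) over a 1d range stays raw.
	fmt.Println(capMaxResolution(0, 86_400_000)) // 0
}
```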
Can we make the engine set a hint since it already analyzes the query and then stores can just act on the hint that the engine provides?
> Looking at the code, I think it would be best to set max resolution here based on the selected range. We can still query raw blocks over a large range, but we should cap the max resolution for small ranges to make sure downsampled blocks cannot be requested.
Hmm, do we want to set a different max resolution per select, or use one max resolution value for the whole query? My current implementation is the latter. Having different max resolutions might also work, but I wonder if it would cause issues with our current results cache, since it assumes a single resolution for the whole query.
Ideally we should remove the `max_source_resolution` param entirely and query downsampled blocks seamlessly. Users shouldn't have to care about this param, and it would be more compatible with the Prometheus API, so Cortex could benefit as well. But that is a big change.
> I think we still have the use case where the query range is quite large but we only want to query raw blocks.
I am wondering if this is a valid use case. I can only think of people who want to run `predict_linear` over raw blocks for better precision, because downsampled blocks are not downsampled with this function in mind. But then downsampling would seem pointless anyway.
> Having different max resolution might also work, but I am wondering if it has some issues with our current results cache as it assumes a single resolution for the whole query.
It sounds fine to have different resolutions per vector selector. Ideally max source resolution should be removed from the cache key since the resolution is inferred from the query.
Now I am leaning more towards the solution mentioned by @fpetkovski: use Select hints and Range to figure out the resolution. I will try to finish the implementation and tests soon.
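That direction could look roughly like the sketch below, assuming the engine fills the `Range` field of its select hints (in milliseconds, as Prometheus's `storage.SelectHints` does) for matrix selectors. The struct here mirrors only the relevant fields, and the helper name `maxResolutionFromHints` is made up for illustration.

```go
package main

import "fmt"

// SelectHints mirrors the relevant fields of Prometheus's
// storage.SelectHints (all values in milliseconds).
type SelectHints struct {
	Step  int64 // query step
	Range int64 // matrix selector range, 0 for instant selectors
}

// maxResolutionFromHints (hypothetical helper) derives the max source
// resolution per select: half the selector range when one is present,
// otherwise the step/5 fallback auto downsampling uses today.
func maxResolutionFromHints(h SelectHints) int64 {
	if h.Range > 0 {
		return h.Range / 2
	}
	return h.Step / 5
}

func main() {
	// rate(x[2m]) with a 1m step: resolution capped at 1m (60000ms).
	fmt.Println(maxResolutionFromHints(SelectHints{Step: 60_000, Range: 120_000})) // 60000
	// Instant selector: falls back to step / 5.
	fmt.Println(maxResolutionFromHints(SelectHints{Step: 300_000})) // 60000
}
```

A per-select value like this would also need the results cache change discussed above, since the cache key could no longer assume one resolution per query.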