
Thanos Query Frontend: Caching is inefficient for the queries which have aggregate operators

Open anoop2503 opened this issue 4 years ago • 19 comments

Thanos, Prometheus and Golang version used: Thanos v0.18.0, Prometheus v2.24.1

What happened:

Caching on Thanos Query Frontend is not working effectively for queries that use an aggregate function with "by". Sample query: sum(metric_name{label1="", label2="", label=""}) by (label1)

This kind of query always takes the same amount of time to respond when it goes through the Thanos Query Frontend. I can see that the downstream Querier also takes the same amount of time when we hit it directly. So we see no difference in performance for this kind of query even after deploying the Thanos Query Frontend.

However, if we remove the "by" part (aggregate over all label dimensions), then caching works very well. Sample query: sum(metric_name{label1="", label2="", label=""})

What you expected to happen:

Thanos Query Frontend should cache the results of queries that aggregate over all label dimensions as well as queries that preserve distinct dimensions with a by or without clause.

How to reproduce it (as minimally and precisely as possible): Sample query: sum(metric_name{label1="", label2="", label=""}) by (label1)

Use a complex query that takes some time to load. The difference in loading time is not visible with simple/short-range queries.

Anything else we need to know:

  • Thanos Architecture/components: Thanos Query Frontend + Memcached -> Thanos Querier -> Thanos Store Gateway

Environment:

  • All thanos and prometheus related components are deployed as docker containers (Kubernetes)

anoop2503 avatar May 19 '21 13:05 anoop2503

@anoop2503 Thanks for raising this. However, the query frontend caches everything returned by the downstream querier, regardless of whether your PromQL query has aggregate operators or not.

I am not sure why you are seeing totally different latencies with and without by. Are you running range queries in both cases? Only range query results are cached.

yeya24 avatar May 19 '21 15:05 yeya24

I'm not sure that I am able to reproduce this. Also, I have checked the source code and it looks like the whole query gets used as part of the key: https://github.com/thanos-io/thanos/blob/main/pkg/queryfrontend/cache.go#L37
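For reference, a minimal sketch of that idea (simplified and hypothetical, not the actual code from the linked file): the raw PromQL string ends up in the cache key together with the step and the split-interval-aligned start, and nothing parses the query, so a by (...) clause only produces a different key rather than an uncacheable one.

```go
package main

import "fmt"

// cacheKey is a simplified stand-in for the query-frontend key generator:
// the query string goes into the key verbatim, alongside the step and the
// split-interval-aligned start time.
func cacheKey(userID, query string, startMillis, stepMillis, splitMillis int64) string {
	return fmt.Sprintf("fe:%s:%s:%d:%d", userID, query, stepMillis, startMillis/splitMillis)
}

func main() {
	const (
		start = int64(1621276500000)    // ~2021-05-17 18:35 UTC, in milliseconds
		step  = int64(30000)            // 30s step
		split = int64(12 * 3600 * 1000) // --query-range.split-interval=12h
	)
	// The same query text always maps to the same key; adding "by (pod_name)"
	// simply yields a different, equally cacheable key.
	fmt.Println(cacheKey("anonymous", `sum(rate(metric[1m])) by (pod_name)`, start, step, split))
	fmt.Println(cacheKey("anonymous", `sum(rate(metric[1m]))`, start, step, split))
}
```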

GiedriusS avatar May 19 '21 15:05 GiedriusS

Yes @yeya24, I am using range queries. Please see the queries which I used for checking this.

First Query: http://test.thanos-frontend.com/graph?g0.range_input=1800s&g0.end_input=2021-05-17%2018%3A35&g0.step_input=30&g0.max_source_resolution=0s&g0.expr=sum%20(rate%20(container_network_receive_bytes_total%7Bimage!%3D%22%22%2Cname%3D~%22%5Ek8s_.*%22%2Cinstance%3D~%22%5E.*%24%22%2Cpod_name%3D~%22.*%22%2Cnamespace%3D~%22%5Eabc-pbb-acd-pbb-w2%7Cads-pbb-acd-pbb-w2%7Cast-pbb-acd-pbb-w2%24%22%7D%5B1m%5D))%20by%20(pod_name)&g0.tab=0

Second Query: http://test.thanos-frontend.com/graph?g0.range_input=1800s&g0.end_input=2021-05-17%2018%3A35&g0.step_input=30&g0.max_source_resolution=0s&g0.expr=sum%20(rate%20(container_network_receive_bytes_total%7Bimage!%3D%22%22%2Cname%3D~%22%5Ek8s_.*%22%2Cinstance%3D~%22%5E.*%24%22%2Cpod_name%3D~%22.*%22%2Cnamespace%3D~%22%5Eabc-pbb-acd-pbb-w2%7Cads-pbb-acd-pbb-w2%7Cast-pbb-acd-pbb-w2%24%22%7D%5B1m%5D))&g0.tab=0

I can see the caching issue when executing the first query. It takes around 10s to respond every time, and subsequent requests sometimes take more than 15-20 seconds.

But the second query takes around 4s to complete the first time and only 350ms for subsequent requests.

The only difference between these queries is that the first one has "by (pod_name)" with the sum function.

@GiedriusS you may not be able to reproduce the issue if you are executing small queries that take only a short time to complete. I found this issue when I executed long-running queries that take more than 4s.

Thanks

anoop2503 avatar May 21 '21 14:05 anoop2503

Again, this is not related to aggregate operators: the query frontend doesn't parse the PromQL query, so it doesn't know about operators, exprs, etc. But I suspect the first query's results are not cached for some reason. Could you please share the cache configuration you are using?

yeya24 avatar May 21 '21 15:05 yeya24

Thanks @yeya24 for looking into this. Here are the configuration details.

Deployment:

spec:
      containers:
      - args:
        - query-frontend
        - --http-address=0.0.0.0:9009
        - --query-frontend.downstream-url=http://thanos.my-namespace.svc.cluster.local:10902
        - --query-frontend.compress-responses
        - --query-range.split-interval=12h
        - --labels.split-interval=12h
        - --query-range.max-retries-per-request=3
        - --labels.max-retries-per-request=3
        - --query-range.max-query-parallelism=28
        - --labels.max-query-parallelism=28
        - --query-range.response-cache-max-freshness=1m
        - --labels.response-cache-max-freshness=1m
        - --query-range.response-cache-config-file=/etc/thanos/config.yaml
        - --labels.response-cache-config-file=/etc/thanos/config.yaml

Configmap:

apiVersion: v1
data:
  config.yaml: |
    type: memcached
    config:
      addresses: [thanos-memcached.my-namespace.svc.cluster.local:11211]
      timeout: 500ms
      max_idle_connections: 200
      max_item_size: 1MiB
      max_async_concurrency: 20
      max_async_buffer_size: 10000
      max_get_multi_concurrency: 200
      max_get_multi_batch_size: 0
      dns_provider_update_interval: 10s
      expiration: 24h
kind: ConfigMap
metadata:
  name: thanos-frontend

anoop2503 avatar May 21 '21 15:05 anoop2503

@anoop2503 Looks like the only way the results would not be cached is if the returned response is larger than 1M (max_item_size). I feel that is unlikely for your first query, though?

yeya24 avatar May 21 '21 15:05 yeya24

@yeya24 Is there any easy way to check the exact size of the returned response?
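(One rough way to check, sketched below, would be to send the same range query directly and count the bytes of the JSON body; the URL here is just a placeholder for the real /api/v1/query_range request with the same expr, start, end and step as the slow query.)

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Placeholder URL: substitute the real query-frontend query_range request.
	url := "http://test.thanos-frontend.com/api/v1/query_range?query=up&start=1621276500&end=1621278300&step=30"

	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// If this approaches max_item_size (1MiB in the config above),
	// Memcached will refuse to store the entry.
	fmt.Printf("status=%s size=%d bytes\n", resp.Status, len(body))
}
```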

I don't think the returned response is larger than 1M. Anyway, I tried increasing the values in the configuration as shown below. However, I don't see any difference in performance even with these configurations.

Updated configmap:

apiVersion: v1
data:
  config.yaml: |
    type: memcached
    config:
      addresses: [thanos-memcached.my-namespace.svc.cluster.local:11211]
      timeout: 500ms
      max_idle_connections: 400
      max_item_size: 250MiB
      max_async_concurrency: 40
      max_async_buffer_size: 10000
      max_get_multi_concurrency: 400
      max_get_multi_batch_size: 0
      dns_provider_update_interval: 10s
      expiration: 24h
kind: ConfigMap
metadata:
  name: thanos-frontend

Memcached configuration:

containers:
      - args:
        - -m 512
        - -p 11211
        image: memcached:1.6.6
        name: thanos-memcached
        ports:
        - containerPort: 11211
          name: clients
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 512Mi

anoop2503 avatar May 21 '21 16:05 anoop2503

Any idea about this issue?

anoop2503 avatar May 24 '21 14:05 anoop2503

Can you please check the cache metrics, such as cortex_cache_hits, to see whether your requests really hit the cache and whether the query result is actually cached? You can also check whether your Memcached has the cached query entry using a Memcached tool.

Besides, a query can also hit the cache only partially, e.g. when only part of the requested time range is cached. In this case, the frontend sends another request downstream to fetch the remaining results.
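For illustration only (a toy model, not the actual results-cache code), the partial-hit idea looks roughly like this: cached extents cover part of the requested range, and only the uncovered gaps are fetched from the downstream querier.

```go
package main

import "fmt"

// extent is a cached [start, end) slice of a query's result range.
type extent struct{ start, end int64 }

// missingRanges returns the sub-ranges of [start, end) not covered by the
// (sorted, non-overlapping) cached extents; these still go downstream.
func missingRanges(start, end int64, cached []extent) []extent {
	var gaps []extent
	cursor := start
	for _, e := range cached {
		if e.end <= cursor || e.start >= end {
			continue
		}
		if e.start > cursor {
			gaps = append(gaps, extent{cursor, e.start})
		}
		cursor = e.end
	}
	if cursor < end {
		gaps = append(gaps, extent{cursor, end})
	}
	return gaps
}

func main() {
	// Cached: 18:00-18:20 UTC. Requested: 18:05-18:35 UTC.
	// Only 18:20-18:35 needs to be fetched from the downstream querier.
	cached := []extent{{start: 1621274400, end: 1621275600}}
	fmt.Println(missingRanges(1621274700, 1621276500, cached))
}
```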

yeya24 avatar May 24 '21 15:05 yeya24

@yeya24 I did load testing using both queries mentioned above and monitored the metrics for set and get requests. The metric queries I used for checking set and get requests are given below.

Get:

sum (irate (memcached_commands_total{status="hit",command="get"}[5m])) / sum (irate (memcached_commands_total{command="get"}[5m]))

Set:

sum (irate (memcached_commands_total{status="hit",command="set"}[5m])) / sum (irate (memcached_commands_total{command="set"}[5m]))

Seems like the first query with the "by" operator (which has the caching issue) does not hit the cache as expected during the load test. [screenshot: not_caching]

However, the second query sets the cache on the first request itself and gets this cached data for all subsequent requests (the cache hit rate for get requests is 100%). [screenshot: cached]

Based on this, I believe it does not cache the response of queries that have an aggregate function with the "by" operator.

anoop2503 avatar May 26 '21 13:05 anoop2503

Any other suggestions on this?

anoop2503 avatar May 31 '21 15:05 anoop2503

@anoop2503 I don't think the caching logic itself is wrong, as the query frontend only caches responses and doesn't know what PromQL is being run.

I am not sure why the first query is not cached, but it looks like a configuration issue or something else. Please take a look at the metrics exposed by the query frontend first, as they should tell you why this is happening in the first place. If possible, you can also run the query frontend in your IDE and use a debugger to find the root cause.

I don't really have time for this recently, so help wanted. If anyone else sees the same thing, please comment on the issue.

yeya24 avatar May 31 '21 15:05 yeya24

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Jul 30 '21 16:07 stale[bot]

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale[bot] avatar Aug 17 '21 02:08 stale[bot]

Just noticed this too, reopening.

  • avg (rate(node_cpu_seconds_total{mode="user",hostname=~".*foo.*"}[1m])) * 100 -> cached
  • avg without (cpu) (rate(node_cpu_seconds_total{mode="user",hostname=~".*foo.*"}[1m])) * 100 -> not cached

GiedriusS avatar Sep 23 '21 11:09 GiedriusS

Actually, I take those words back. I cannot reproduce this now. Probably the items got evicted from Memcached when I originally sent such heavy queries, and/or there were intermittent cache connectivity problems. That's probably what happened in your case as well. Plus, I started writing tests and there is a cache hit even with the by/without operators.

GiedriusS avatar Oct 01 '21 09:10 GiedriusS

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Jan 09 '22 09:01 stale[bot]

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Apr 17 '22 04:04 stale[bot]

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] avatar Sep 21 '22 06:09 stale[bot]