Cross-tenant queries produce cardinality estimation cache keys that are too long
Describe the bug
When running cross-tenant queries with the query-frontend results cache enabled, the query caching middleware attempts to concatenate all tenant IDs into the cache key (code). Memcached enforces a 250-byte limit on keys (code), so with enough tenants (or long enough tenant names) the key can exceed that limit, at which point cache gets fail.
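For illustration, here's a minimal sketch of the failure mode; the function and constant names are hypothetical, not Mimir's actual code, but it shows how joining tenant IDs into a single key blows past memcached's 250-byte limit:

```go
package main

import (
	"fmt"
	"strings"
)

// memcachedMaxKeyLen is the memcached protocol limit on key length.
const memcachedMaxKeyLen = 250

// buildCacheKey mimics (hypothetically) concatenating all tenant IDs into a
// single cache key, the pattern described in this issue.
func buildCacheKey(prefix string, tenantIDs []string) string {
	return prefix + strings.Join(tenantIDs, "|")
}

func main() {
	// With enough tenants (or long enough names), the key exceeds the limit.
	tenants := make([]string, 30)
	for i := range tenants {
		tenants[i] = fmt.Sprintf("long-tenant-name-%02d", i) // ~19 bytes each
	}
	key := buildCacheKey("QS:", tenants)
	fmt.Printf("key length = %d, exceeds limit = %v\n", len(key), len(key) > memcachedMaxKeyLen)
}
```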
To Reproduce
Steps to reproduce the behavior:
- Create multiple tenants whose names, concatenated, exceed 250 bytes in length.
- Query across all tenants using the wildcard tenant.
- See malformed key logs on memcached fetches.
Expected behavior
Keys should be short enough that they don't exceed the 250-byte limit. Perhaps that means that cache keys should be created and looked up on a per-tenant basis?
One potential solution to this is to sort the tenant IDs (the parsing logic might already do this) and hash them and use the hash in the cache key instead of the tenant IDs directly.
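As a rough sketch of that idea (assuming a SHA-256 hash; this is illustrative, not Mimir's implementation), sorting the tenant IDs first makes the key order-independent, and hashing keeps the memcached key at a fixed length regardless of how many tenants are involved:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// tenantHashKey returns a fixed-length cache key component derived from the
// tenant IDs: sort them so ordering doesn't matter, then hash the joined string.
func tenantHashKey(tenantIDs []string) string {
	ids := append([]string(nil), tenantIDs...)
	sort.Strings(ids)
	sum := sha256.Sum256([]byte(strings.Join(ids, "|")))
	return hex.EncodeToString(sum[:])
}

func main() {
	// The key component is always 64 hex characters, regardless of tenant count.
	fmt.Println(tenantHashKey([]string{"tenant-b", "tenant-a", "tenant-c"}))
}
```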
The cache key is then hashed, and the hash is what's used to key memcached. The full key is stored as part of the cached item to check for hash collisions, but it isn't used verbatim as the memcached key.
https://github.com/grafana/mimir/blob/efa26e743027bf495c340a893939b2671ba05959/pkg/frontend/querymiddleware/split_and_cache.go#L458
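For reference, a hedged sketch of the pattern described above (hypothetical names, not the linked Mimir code): the logical key is hashed for use as the memcached key, and the full key is stored inside the item so a hash collision can be detected on read:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type cachedItem struct {
	Key  string // full, un-hashed key, kept to detect hash collisions
	Data []byte
}

// hashKey produces the fixed-length key actually sent to memcached.
func hashKey(key string) string {
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// lookup returns the cached data only if the stored full key matches the
// requested one, guarding against hash collisions.
func lookup(cache map[string]cachedItem, fullKey string) ([]byte, bool) {
	item, ok := cache[hashKey(fullKey)]
	if !ok || item.Key != fullKey {
		return nil, false
	}
	return item.Data, true
}

func main() {
	cache := map[string]cachedItem{}
	full := "QS:tenant-a|tenant-b:up:3600" // hypothetical logical key
	cache[hashKey(full)] = cachedItem{Key: full, Data: []byte("cached response")}

	if data, ok := lookup(cache, full); ok {
		fmt.Println(string(data))
	}
}
```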
Ah, you're right, and the hashed key gets read back out here; I missed that. So this isn't a problem for the results cache, but it does seem to be a problem for a cache somewhere, most likely the cardinality estimation cache, since I do see the QS prefix in the logs.
It seems we have been hitting the same issue since adding new tenants two days ago.
We have the following in the query-frontend logs:
{"time":"2023-12-19T17:27:04.952051693Z","message":"ts=2023-12-19T17:27:04.950734513Z caller=memcached_client.go:335 level=warn name=frontend-cache msg="failed to fetch items from memcached" numKeys=1 firstKey=1@QS:
When I check the combined length of all the tenant IDs, it exceeds 250 characters. @chencs Are you seeing similar log entries?
@rnerkar yes, except my logs don't include that tenant key. My understanding is that the cardinality estimation cache is used to limit the number of shards a query can be broken into for parallelization (code).
From reading this issue describing the goals for implementing cardinality estimation for query sharding, I believe this means that the failure mode (not being able to fetch cardinality estimation cache keys) potentially results in unnecessary over-sharding (a nuisance with respect to overhead), but not unnecessary under-sharding (OOMs resulting in query failure).
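To make that failure mode concrete, here's a purely illustrative sketch (hypothetical constants and logic, not Mimir's actual sharding code) of how a cardinality estimate could cap the shard count, and why a failed cache fetch only leads to over-sharding:

```go
package main

import "fmt"

const (
	maxShards            = 64    // hypothetical upper bound on query shards
	targetSeriesPerShard = 10000 // hypothetical target load per shard
)

// shardCount picks a shard count from an estimated series cardinality.
// When the estimate is unavailable (e.g. the cache fetch failed), it falls
// back to the maximum, so the query is over-sharded rather than under-sharded.
func shardCount(estimatedSeries int64, haveEstimate bool) int {
	if !haveEstimate {
		return maxShards
	}
	n := int(estimatedSeries)/targetSeriesPerShard + 1
	if n > maxShards {
		return maxShards
	}
	return n
}

func main() {
	fmt.Println(shardCount(25000, true)) // small query: few shards
	fmt.Println(shardCount(0, false))    // cache miss: max shards (over-sharding)
}
```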
Thanks for explaining, @chencs. So, with the understanding that these errors have no adverse effects on Mimir's performance, other than small queries getting over-sharded: do we have a fix coming in, or is there something to be done from our end? We have had talks about using Redis instead of Memcached to get around this issue, but that's still not finalized. Will wait till I hear back from you guys. Thanks in advance!
Do we have a fix coming in, or something to be done from our end?
We apply a hash to keys to make them a consistent length elsewhere; we'll do the same here. No ETA, but it should be a simple fix.
We have had talks about using Redis instead of Memcached to get around this issue, but that's still not finalized. Will wait till I hear back from you guys. Thanks in advance!
We don't recommend using Redis since we (Grafana) don't use it in production and it's not part of the Helm chart or Jsonnet.
We're hitting this issue as well.
Note that it also seems to increment thanos_memcached_operation_failures_total, so we actually get alerted on this.