Large queries cause large memcached delays, which then cause 2-3x resource consumption and 2-3x worse query performance
Intro
I'm trying to debug (and reproduce) some performance issues we are having. Eventually I managed to reproduce the behaviour, which comes down to how memcached behaves and how its timeouts are handled.
Summary is:
- Do an expensive query, like:
sum by(your-label) (rate({your-label="foo"} |= `` [$__interval]))
- The rate above should return roughly 4-5k/s, to have a "larger" load to test against
- Do the query over the last hour
- Have a memcached setup
- Notice Queriers getting logs like:
level=warn ts=2022-12-12T10:01:57.938104237Z caller=memcached.go:220 msg="failed to put to memcached" name=chunks err="server=100.98.56.68:11211: read tcp 100.97.243.247:60382->100.98.56.68:11211: i/o timeout"
level=warn ts=2022-12-12T10:01:52.340184048Z caller=memcached.go:220 msg="failed to put to memcached" name=chunks err="server=100.98.56.68:11211: read tcp 100.98.113.236:38376->100.98.56.68:11211: i/o timeout"
level=warn ts=2022-12-12T10:01:52.345175685Z caller=memcached.go:220 msg="failed to put to memcached" name=chunks err="server=100.98.56.68:11211: read tcp 100.98.113.236:38382->100.98.56.68:11211: i/o timeout"
level=warn ts=2022-12-12T10:01:57.946012867Z caller=background.go:127 msg="backgroundCache writeBackLoop Cache.Store fail" err="server=100.98.56.68:11211: read tcp 100.97.243.247:60444->100.98.56.68:11211: i/o timeout"
- The timeouts cause performance issues and increased CPU/memory load
Memcached related config:
memcached:
  batch_size: 256
  parallelism: 10
memcached_client:
  timeout: 500ms
  max_idle_conns: 60
  consistent_hash: true
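(For reference, a sketch of where a block like this typically sits in the Loki config; the chunk_store_config / chunk_cache_config placement below is an assumption on my part, based on the name=chunks cache in the log lines above:)

chunk_store_config:
  chunk_cache_config:
    memcached:
      batch_size: 256
      parallelism: 10
    memcached_client:
      # memcached host/addresses omitted here
      timeout: 500ms
      max_idle_conns: 60
      consistent_hash: true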
Why not increase timeout?
As you can see, I've already increased it from the default 100ms to 500ms, which IMHO is already quite substantial. I even tried setting it to 1500ms (1.5 seconds for memcached?..), and while that does "help" a little, depending on the size of the query I would only need to keep increasing it further and further.
Scale memcached horizontally?
With the above query and settings, I've already scaled memcached to 12 instances. The problem is that while this helps spread the load, it does not scale properly: any larger query needs even more replicas, to the point that I might need 40 or even 100 memcached instances for it not to error.
More importantly, the effects:
With 50000ms timeout

Stats
Total request time 8.23 s
Number of queries 1
Total number rows 1801
Data source stats
Summary: bytes processed per second 5.45 GB/s
Summary: lines processed per second 5340548
Summary: total bytes processed 38.9 GB
Summary: total lines processed 38110633
Summary: exec time 7.14 s
Ingester: total reached 144
Ingester: total chunks matched 1279
Ingester: total batches 7553
Ingester: total lines sent 3859683
Ingester: head chunk bytes 0 B
Ingester: head chunk lines 0
Ingester: decompressed bytes 0 B
Ingester: decompressed lines 0
Ingester: compressed bytes 0 B
Ingester: total duplicates 0
With 500ms timeout

Stats
Total request time 19.5 s
Number of queries 1
Total number rows 1801
Data source stats
Summary: bytes processed per second 2.20 GB/s
Summary: lines processed per second 2157152
Summary: total bytes processed 42.3 GB
Summary: total lines processed 41504927
Summary: exec time 19.2 s
Ingester: total reached 144
Ingester: total chunks matched 1736
Ingester: total batches 10065
Ingester: total lines sent 5139368
Ingester: head chunk bytes 0 B
Ingester: head chunk lines 0
Ingester: decompressed bytes 0 B
Ingester: decompressed lines 0
Ingester: compressed bytes 0 B
Ingester: total duplicates 0
Extra info
The example actually shows only one querier "going out of bounds", basically. Yet, if the query is larger, you can see the effect on every querier, which increases CPU and memory usage by 2-3x for all queriers. It also impacts query performance by 2-3x.
The bug?
IMO, if the timeout is 'low' it should not impact performance and resources this much. Furthermore, why would memcached hit such high latency on a query like this? IMO memcached should be capable of handling a large number of requests, and it's not as if we are pushing it with thousands of concurrent requests.
Furthermore, it seems that increasing the timeout to absurdly high values only improves performance further. I.e. it almost looks as if the timeout should be equal to the full query time rather than just the individual memcached call.
Versions
memcached:1.6.17-alpine, loki:2.6.1. Runs on K8s, but also reproduced locally.
We are seeing this symptom also.
Same issue..
same problem
same problem
Same problem with v2.7.3
Stumbled upon this issue too. Same with loki 2.8.2 and memcached 1.6.20.
Same with loki 2.8.2 and memcached 1.6.17 on AWS EC2 with ElastiCache.
same here.
same problem with Loki 2.8.3 and memcached 1.6.21
same problem Chart version: 5.11.0 Loki version: 2.8.4
We were discussing this in the memcached discord: https://discord.com/channels/634311773750689794/1144428486107336724/1144501450282639430
It's difficult to understand what's going on here without your memcached config, and an understanding of your memcached setup. Is it one server or hundreds? Was there network contention? etc
bump @wiardvanrij
Hi, is there a takeaway from this discussion on Discord? We are experiencing poor cache hit ratios when doing large queries on data that is (supposedly) in the extended cache, and I was wondering whether the timeout errors we see in the Loki logs explain the cache misses. Also, should we set the timeout to an incredibly high value?
Hi folks. I'm leaving this here for the thread and googlers who come in here: https://github.com/memcached/memcached/wiki/ConfiguringLokiExtstore
TL;DR: Set your memcached timeout very high and lower the batch_size and parallelism down to very low numbers.
This is specifically about loki and extstore, but it discusses the timeout issue a lot of you seem to be running into with vanilla memcached.
To reiterate here: the timeout value is actually the amount of time to read and parse the entire response batch from each individual server. If you have a small number of memcached servers (i.e. 1) and a high batch_size (like the default 1024), you would be trying to fetch up to 1024 1.5-megabyte chunks over the network from a single host and process the results within the timeout.
Larger clusters probably don't notice this, since each individual memcached gets only a slice of the batch_size keys at once.
Hints as to tuning this are in the linked wiki page. There are some things Loki can do to improve the situation so we'll see how that shakes out.
have fun.
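(A minimal sketch of what that guidance could look like in the Loki cache config, using the same assumed chunk_store_config placement as earlier; the concrete numbers are illustrative assumptions, not recommendations, and need validating against your own chunk sizes and network:)

chunk_store_config:
  chunk_cache_config:
    memcached:
      # Back-of-envelope: with the default batch_size of 1024 and ~1.5 MB chunks,
      # a single memcached server may have to return on the order of 1.5 GB per
      # batch, which a 100-500ms timeout cannot cover on most networks.
      batch_size: 64     # keep per-server batches small
      parallelism: 3     # few concurrent batches per client
    memcached_client:
      timeout: 10s       # must cover reading and parsing an entire batch response
      max_idle_conns: 60
      consistent_hash: true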
I've created #11345 to address this issue now that we have a better handle on the problem, courtesy of @dormando's detective work.
The PR has been merged; I'll be vendoring this code into Loki soon™️ and we'll cut a release after validating this in production
@dannykopping Any update on the Loki Release...
@Hitesh-Agrawal probably the next minor release, which would be sometime in the next few weeks.
@dannykopping Any update on the fix..
Hi! Any update on this?