zstd compression support between distributors and ingesters
Is your feature request related to a problem? Please describe.
Are there plans for zstd support between distributors and ingesters using the grpc_compression flag?
Enabling zstd compression would help significantly reduce the amount of cross-AZ traffic our org pays for running Mimir. gzip gets some wins, at the expense of CPU.
Now that there is a pure Go zstd implementation, are there any major blockers for introducing it to Mimir?
Describe the solution you'd like
Support for zstd when using the grpc_compression setting.
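For context, the setting in question is the gRPC client compression option that Mimir exposes both as a CLI flag and in YAML. A sketch of where it lives today (the flag name is from this thread; the YAML path follows dskit's gRPC client config; currently only gzip and snappy are accepted):

```yaml
# Compression for the distributor -> ingester gRPC client.
# CLI equivalent: -ingester.client.grpc-compression=snappy
# This feature request asks for "zstd" as an additional accepted value.
ingester_client:
  grpc_client_config:
    grpc_compression: snappy
```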
Are you specifically asking for the -ingester.client.grpc-compression flag to also support zstd?
I'm specifically looking for compression support on the ingester-client, yes. That's our most expensive client.
Though, judging by how the config is laid out, I suspect it is a shared compressor for all the gRPC clients?
In effect yes. I just needed more clarification, as your description was relatively vague. I'm trying to see whether the team has any opinion on supporting zstd.
So, I found out that the Mimir team already has considered zstd support, but the zstd implementation we considered is apparently not very performant.
Is that the same zstd implementation as you're referring to?
Thanks for checking in. I'm looking at https://github.com/klauspost/compress/tree/master/zstd as a possible zstd implementation. It looks like what you linked is a wrapper on top of klauspost/compress.
What were the reasons it was not performant? I'm surprised to hear it wasn't compared to the other offering, gzip. Surely zstd falls between gzip and snappy in CPU cost, but with a much higher compression ratio than snappy.
@aknuds1, checking in - are you able to share more on the reasons why that implementation was not performant?
I just know that @bboreham considered the aforementioned zstd implementation, and found performance problems with it.
When we discussed trying another compression algorithm internally for cheaper AZ traffic, the other point made was that it wasn't worth all the testing that comes along with it.
I have some new context to share that hopefully sheds enough light on why the Mimir team doesn't think investing in a new compression algorithm for distributor -> ingester traffic is worthwhile. You may already be familiar with this, but Mimir is moving towards a fundamentally revamped architecture in which distributors will no longer write directly to ingesters, but instead to a Kafka-compatible back-end, which ingesters in turn read from. Please see "[FEATURE] Experimental Kafka-based ingest storage" in the changelog for reference.
I remarked on the amount of memory allocations it does. Looks like this was improved recently, with the addition of a pool in https://github.com/mostynb/go-grpc-compression/commit/629c44d3acb9624993cc7de629f47d72109e2ce5.
Someone else commented https://github.com/mostynb/go-grpc-compression/issues/25
@aallawala @aknuds1 @bboreham
I decided to experiment with zstd and s2 gRPC compression from https://github.com/mostynb/go-grpc-compression. So, I created a patched dskit with zstd and s2 compression, and then built Mimir 2.13.0 with that patched dskit; nothing fancy, just Mimir pointed at the patched version.
Then I ran this patched version in our test cluster, first with zstd compression, then s2. Please note that the test cluster was already running Mimir 2.13.0 with snappy gRPC compression enabled before the test, so the comparison below is to snappy and NOT to uncompressed:
Results:
Zstd results compared to snappy:
- traffic dropped: incoming from 4MB/s per writer to 2.2MB/s, outgoing from 3MB/s to 1MB/s
- i.e. a drop from 7MB/s to 3.2MB/s total - 54% less traffic than snappy
- CPU didn't change significantly
- memory increased a lot, from 6.5GB per writer to 15GB per writer; I think that's a showstopper
- 99p write latency dropped from 90 to 50 ms
So, the results are good, but memory usage is tremendous. I checked the heap; the majority of the memory was consumed by zstd.EnsureBlock:
But I have good news too: I then tried switching to S2 compression, which is an improved version of snappy.
S2 results compared to snappy:
- traffic almost similar to zstd: receive 2.4MB/s per writer, send 1.4MB/s per writer
- i.e. a drop from 7MB/s to 3.8MB/s total - 45% less traffic than snappy
- CPU didn't change significantly
- memory even dropped compared to snappy, from 6.5GB per writer to 5.8GB per writer
- 99p write latency dropped from 90 to 60 ms
(left is snappy, middle is zstd, right is S2)
So, after that I decided to drop zstd (at first I wanted to test the Datadog wrapper or another one, but then I realized that Mimir is built without CGO), and S2 looks really promising.
I know that the Mimir architecture is being revamped and will migrate to Kafka, but it will take some time to implement that properly, and the S2 patch can shave 50% off cross-AZ data costs right now.
And I don't like running a patched version; I'd prefer to port my changes upstream. I can clean up and submit a PR for experimental S2 support in dskit. Should I, or does it make no sense?
@deniszh, thanks so much for trying it out and posting your results. The memory increase seems to match up with what @bboreham said earlier in his findings too.
Thanks for also trying out s2. The results seem much more favorable and it would be something I can also help drive in order to get this ported upstream. Do you have a PR available on the dskit side for s2?
@aallawala : https://github.com/grafana/dskit/pull/582
But please note that the latest Mimir is not yet compatible with the latest dskit. So, if you want to test, you can build Mimir from this branch: https://github.com/deniszh/mimir/tree/2.13.0-grpc-s2
Tried S2 on a bigger cluster - it's less impressive, but I see 70MB/s instead of 105MB/s, roughly a 30% decrease.
Thanks for your testing @deniszh! The results look very promising. I'm asking the Mimir team what they think.
@deniszh the Mimir team is positive about using S2 for gRPC compression in Mimir. I guess we should start by reviewing your dskit PR.
👍
Will S2 be better than gzip compression? 🤔
I didn't compare to gzip. Maybe it is by ratio, but not by resources and speed.
FWIW I am considering a change to the zstd wrapper in https://github.com/mostynb/go-grpc-compression/pull/34
Ok, I decided to try it one more time. First, I applied @mostynb's patch above, but for some reason I got a panic somewhere in zstd.go when calling close() in my tests. So, I decided to simply take the S2 compressor (which works perfectly) and replace the S2 readers and writers with the zstd decoder and encoder. I implemented that in the https://github.com/deniszh/mimir/tree/2.14.2-grpc-s2-zstd branch (see this commit; the rest is backports of 2.14.x fixes). After that I enabled the zstd gRPC compressor in the same cluster where I had the S2 compressor before. Results are below:
- no changes in write latency, read latency, or rule evaluation latency (compared to S2)
- a very slight improvement in write bandwidth: 65MB/s S2 vs 54MB/s zstd (17% better); 45MB/s S2 vs 35MB/s zstd (22% better)
- no significant changes in read bandwidth
- no significant changes in CPU for read and write components
- no significant changes in memory for mimir-write
- significant memory increase for mimir-read (300% worse for zstd; a similar effect for mimir-backend)
Conclusion:
- not sure if saving 15-20% of network bandwidth is worth a 3x increase in RAM for the read components. OTOH, it's much better than the previous implementation (now mimir-read RAM consumption is not increased)
- we would stay on S2, though
PS: OTOH, we mainly need zstd compression between distributors and ingesters; I'll try that, using S2 for the rest.
OK, I applied the patch on top of 2.15.0, then overrode the ingester compression with "-ingester.client.grpc-compression=zstd", keeping s2 for the rest of the clients. Results:
- 99p write latency is the same
- CPU / memory of the write component are mostly the same (memory even improved a bit, but probably because of 2.15.0)
- write bandwidth improved: RX 63MB/s to 50MB/s, TX 43MB/s to 32MB/s, total 106MB/s to 82MB/s, i.e. 22% less traffic (compared to S2, which is kinda impressive):
Will do more tests, give it some time for a stability check, and then open a PR for zstd in the ingester client only.
OK, on the bigger cluster I definitely see the 22% bandwidth savings, but at the same time a 40% increase in distributor CPU. Ingester resources are the same, so it's probably still feasible in some use cases, but generally speaking S2 is probably still the better compromise.
Latest update: I compared zstd with gzip. gzip's compression ratio is the same as zstd's (17% better than S2), but CPU-wise zstd is 15% better than gzip; see the graph of distributor CPU, normalized to S2 consumption:
Left is S2 (100), middle is zstd (140), right is gzip (165). So, it may still be worth having zstd in the ingester client for anyone who wants to save more traffic than S2 allows, in exchange for CPU (but less CPU than gzip).
So, for anyone who is curious or watching this topic, here's the current state of things. We are still running the zstd patch on our Mimir, and I hope Grafana is still considering merging it. I tried various optimizations and other things:
- I tried replacing the pure-Go zstd with the C-linked version in gozstd. Good news: it works. Bad news: it works no better than the pure-Go version, not in speed, not in resource consumption, not in ratio! So, kudos to @klauspost for such good optimization.
- I tried implementing dictionary support for zstd, but the problem here is that zstd does not support building a dictionary from a stream, so I tried to mimic it. My idea was that the majority of the data in distributor-ingester traffic is key-value tags, so I collected a subset of tags and trained zstd on it. Result: some CPU increase, no significant compression ratio increase. We could probably try again with real gRPC dumps; not sure.
- I tried MinLZ compression (the successor of S2). I didn't compare it with S2, but it looks similar: worse ratio, but less CPU than zstd. Not sure it's worth replacing S2, though.
- I tried brotli compression. The good part is that it does indeed give the best compression ratio, but the CPU consumption is terrible. :( Maybe the pure-Go brotli implementation is not as optimized as the zstd one. The CPU difference was so bad that I decided not to try brotli with dictionaries. I may try the C-linked version, though.
- I also tried increasing the block size in zstd from 512K to 2MB: no change in compression ratio.
- I added the lowmem flag to zstd; it looks like it decreased memory consumption a bit, 3-5% globally.
So, conclusion: S2 is still the best option for generic gRPC compression, and zstd still gives you the best compression ratio in exchange for a sane amount of CPU.
@deniszh Thanks for your insights!
MinLZ is written as a straight replacement for S2 (and based on the same code), with similar CPU for compression, less for decompression, and pretty much always better compression ratios. The improvements are biggest for streams. The only exception is dictionary encodes, which we postponed past v1.