
nsqd: Use Klaus Post's compression libraries

Open philpearl opened this issue 1 year ago • 5 comments

We would quite like to use compression with NSQ to save on data transfer costs, but the CPU impact is higher than we'd like. Our experiments have shown that Klaus Post's compression libraries perform much better than the standard library Deflate and Google's Snappy, with the sweet spot appearing to be level 3 flate compressing our traffic to about 25% of its original size, but only incurring a CPU cost equivalent to Snappy.

Would there be any interest in taking a PR that makes this change?

philpearl avatar Apr 15 '24 13:04 philpearl

At a glance, I don't see any fundamental problem with improving performance by swapping out the dependency. Should we also expose the other compression algorithms, too?

mreiferson avatar Apr 28 '24 18:04 mreiferson

I think that's an excellent idea. zstd in particular holds strong appeal.

It's especially appealing with NSQ as it seems like a common pattern is for a topic to have messages with a single, well-defined schema—variations on a theme. Dictionaries (as generated by zstd --train) could be very useful.
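
For illustration (hypothetical file names; assumes the `zstd` CLI is installed), training a dictionary from a corpus of sample messages for one topic and then using it looks like:

```
# Train a dictionary from sample messages sharing a schema
zstd --train samples/*.json -o topic.dict

# Compress a single message with that dictionary
zstd -D topic.dict message.json -o message.json.zst
```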

adamroyjones avatar Apr 29 '24 07:04 adamroyjones

Fabulous. I'll put together a PR for the dependency swap.

philpearl avatar Apr 29 '24 13:04 philpearl

Hmm, I think our original testing must have been flawed when looking at Snappy. The Klaus Post Snappy implementation seems to be slower than the Google version, and the Klaus Post Deflate doesn't reach the speed of Snappy at any level.

This is what I'm getting when replacing Snappy and Deflate in both nsqd and the go-nsq library:

name                  old time/op    new time/op    delta
Compress/snappy-16       272µs ± 2%     310µs ± 7%  +13.96%  (p=0.000 n=10+10)
Compress/deflate3-16     746µs ± 1%     612µs ± 2%  -17.91%  (p=0.000 n=10+10)
Compress/deflate5-16    1.06ms ± 1%    0.66ms ± 1%  -37.60%  (p=0.000 n=10+9)
Compress/deflate6-16    1.28ms ± 2%    0.73ms ± 5%  -43.46%  (p=0.000 n=9+10)
Compress/deflate9-16    1.47ms ± 4%    1.72ms ± 9%  +16.33%  (p=0.000 n=10+9)

There's also an added complication: the Klaus Post Deflate compresses slightly less effectively at most levels.

=== RUN   TestCompareDeflate
    protocol_v2_test.go:2056: deflate level 1: compress to 19.304255% - 98.069603% of Go deflate
    protocol_v2_test.go:2056: deflate level 2: compress to 18.701276% - 104.756670% of Go deflate
    protocol_v2_test.go:2056: deflate level 3: compress to 18.185710% - 103.829701% of Go deflate
    protocol_v2_test.go:2056: deflate level 4: compress to 16.835251% - 105.500279% of Go deflate
    protocol_v2_test.go:2056: deflate level 5: compress to 16.005709% - 103.914756% of Go deflate
    protocol_v2_test.go:2056: deflate level 6: compress to 15.584694% - 103.055326% of Go deflate
    protocol_v2_test.go:2056: deflate level 7: compress to 15.575774% - 103.398863% of Go deflate
    protocol_v2_test.go:2056: deflate level 8: compress to 15.233253% - 101.558040% of Go deflate
    protocol_v2_test.go:2056: deflate level 9: compress to 14.929979% - 99.571684% of Go deflate
--- PASS: TestCompareDeflate (0.02s)

I still think it's worth replacing the Deflate library, but the motivation is much weaker than I previously believed. WDYT?

philpearl avatar Apr 29 '24 17:04 philpearl

Meh, doesn't seem worth it? It sounds like we're saying "just use snappy"?

We should land all the benchmark code improvements (I've pushed a few more up to your PR), and https://github.com/nsqio/go-nsq/pull/362 though.

mreiferson avatar May 12 '24 13:05 mreiferson