Slow compress/uncompress performance compared to C and Cgo version lz4
Performance is slow compared to the C and cgo versions of lz4.
Benchmarks compressing/uncompressing the file pg1661.txt:
https://github.com/xin3liang/golz4/blob/master-test/lz4_test.go#L705-L707
https://github.com/xin3liang/lz4/blob/v4-test/bench_test.go#L59-L61
https://github.com/inikep/lzbench master
[root@server5 lzbench]# ./lzbench -elz4 ../pg1661.txt
[root@server5 golz4]# go test -bench="Long" -run=^$
[root@server5 lz4]# go test -bench="Long" -run=^$
Thank you for filing this. I am aware of it but when I looked at it a while back, I could not see how to improve it. A contributor implemented asm versions for uncompress which dramatically improved the decoding, but I would rather have a more performant version without it. Note that Go will always be slower than C, which may or may not be acceptable for your use case. Any idea on improving the performance is welcomed though!
@pierrec Thank you for the reply. I have no idea how to improve it either; I just raised the bug here to record the issue.
FYI, look at the lz4hc test results below, run on the same x86 and arm64 machines.
Arm64 CPU: Kunpeng 920-4826 96 cores @ 2.6GHz 4 numa
[root@server5 lzbench]# ./lzbench -elz4hc ../pg1661.txt
lzbench 2.1 | GCC 10.3.1 | 64-bit Linux |
Compressor name  Compress.  Decompress.  Compr. size  Ratio  Filename
lz4hc 1.10.0 -1  125 MB/s   790 MB/s     314466       52.86  ../pg1661.txt
lz4hc 1.10.0 -2  125 MB/s   790 MB/s     314466       52.86  ../pg1661.txt
lz4hc 1.10.0 -3  47.0 MB/s  800 MB/s     280384       47.13  ../pg1661.txt
lz4hc 1.10.0 -4  35.3 MB/s  795 MB/s     270572       45.48  ../pg1661.txt
lz4hc 1.10.0 -5  27.5 MB/s  793 MB/s     264987       44.54  ../pg1661.txt
lz4hc 1.10.0 -6  22.2 MB/s  792 MB/s     262404       44.11  ../pg1661.txt
lz4hc 1.10.0 -7  18.9 MB/s  796 MB/s     261329       43.93  ../pg1661.txt
lz4hc 1.10.0 -8  16.8 MB/s  795 MB/s     260900       43.85  ../pg1661.txt
...
x86 CPU: Intel(R) Xeon(R) Platinum 8180 111 cores @ 2.50GHz 2 numa
[root@client5 lzbench]# ./lzbench -elz4hc ../pg1661.txt
lzbench 2.1 | GCC 10.3.1 | 64-bit Linux | Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
Compressor name  Compress.  Decompress.  Compr. size  Ratio  Filename
lz4hc 1.10.0 -1  106 MB/s   629 MB/s     314466       52.86  ../pg1661.txt
lz4hc 1.10.0 -2  106 MB/s   629 MB/s     314466       52.86  ../pg1661.txt
lz4hc 1.10.0 -3  35.6 MB/s  639 MB/s     280384       47.13  ../pg1661.txt
lz4hc 1.10.0 -4  27.2 MB/s  636 MB/s     270572       45.48  ../pg1661.txt
lz4hc 1.10.0 -5  21.4 MB/s  635 MB/s     264987       44.54  ../pg1661.txt
lz4hc 1.10.0 -6  17.7 MB/s  634 MB/s     262404       44.11  ../pg1661.txt
lz4hc 1.10.0 -7  15.1 MB/s  634 MB/s     261329       43.93  ../pg1661.txt
...
I notice that the compress/decompress speed may be related to the compression ratio: the smaller the ratio (that is, the smaller the output), the slower the compress/decompress speed. By that rule, pure Go lz4 (ratio 55.33) should compress and decompress faster than the first lz4hc line (ratio 52.86). Perhaps the worst part is the pure Go lz4 decompress speed on arm64 (321.25 MB/s), which is much slower than lz4hc (790 MB/s at ratio 52.86).
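To put numbers on that comparison, here is a small sketch that only redoes the arithmetic on the figures quoted in this thread. It assumes the lzbench "Ratio" column is the compressed size as a percentage of the original, which matches the table (smaller "Compr. size" goes with smaller ratio). The helper name `relative` is made up for illustration.

```go
package main

import "fmt"

// relative reports how many times larger a is than b.
// Hypothetical helper, not part of any lz4 package.
func relative(a, b float64) float64 {
	return a / b
}

func main() {
	// Figures quoted in this thread for pg1661.txt.
	const (
		lz4hcDecompMBps  = 790.0  // lz4hc -1 on arm64, from lzbench
		pureGoDecompMBps = 321.25 // pure Go lz4 on arm64
		lz4hcRatioPct    = 52.86  // assumed: compressed size as % of original
		pureGoRatioPct   = 55.33
	)

	// How much faster the C lz4hc decoder is than the pure Go decoder.
	fmt.Printf("decompress: lz4hc is %.2fx the pure Go speed\n",
		relative(lz4hcDecompMBps, pureGoDecompMBps))

	// How much larger the pure Go output is than the lz4hc -1 output.
	fmt.Printf("output size: pure Go is %.1f%% larger than lz4hc -1\n",
		(relative(pureGoRatioPct, lz4hcRatioPct)-1)*100)
}
```

So the pure Go decoder is handling slightly larger output, yet it runs at well under half the lz4hc decode speed on the same arm64 machine, which is the gap described above.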