Switch from DataDog/zstd to valyala/gozstd in tests and benchmarks
DataDog/zstd has poor support for streaming, has potential bugs in streaming, and is less optimized than valyala/gozstd, so I'd recommend switching to valyala/gozstd in tests and benchmarks.

Also, valyala/gozstd vendors the latest upstream zstd release without any modifications.
I wouldn't mind, especially since it performs measurably worse than DataDog's - both in terms of compression and speed ;)
Before I add it, I would just like to make sure that this is expected. I have:
```
file                  out   level  insize       outsize     millis  mb/s
enwik9                zskp  1      1000000000   348027537   7499    127.16
enwik9                zstd  1      1000000000   358512784   5527    172.55
enwik9                zstd  2      1000000000   332265719   6497    146.79
gob-stream            zskp  1      1911399616   272529084   6956    262.02
gob-stream            zstd  1      1911399616   249054924   5069    359.54
gob-stream            zstd  2      1911399616   236984664   5297    344.06
adresser.001          zskp  1      1073741824   18235536    1211    845.58
adresser.001          zstd  1      1073741824   16758988    885     1155.76
adresser.001          zstd  2      1073741824   16920671    1092    937.73
rawstudio-mint14.tar  zskp  1      8558382592   3800966951  43398   188.07
rawstudio-mint14.tar  zstd  1      8558382592   3609530535  31977   255.24
rawstudio-mint14.tar  zstd  2      8558382592   3504208748  36672   222.57
10gb.tar              zskp  1      10065157632  5001038195  58193   164.95
10gb.tar              zstd  1      10065157632  4941154808  43283   221.77
10gb.tar              zstd  2      10065157632  4856098868  45106   212.81
```
I think only gob-stream level 1 is smaller; the rest are larger, and the speed is quite a bit lower than https://github.com/klauspost/compress/tree/master/zstd#performance. Is that the cost of adding CRC (in terms of speed)?
Using master/f50fb7adf06ebfa200c0146f4de069f9f42b693d
In terms of benchmarking I am not doing much more than io.Copy from file -> readahead -> compressor -> io.Discard. The file is loaded in RAM.
Oh yeah, the DataDog version I am using is v1.4.0 / 809b919c325d7887bff7bd876162af73db53e878.
The performance difference looks unexpected.
> Is that the cost of adding CRC (in terms of speed)?
Probably this is related to the different API functions used for stream compression:

- valyala/gozstd puts the written data into an internal buffer and then calls C.ZSTD_compressStream when the buffer is full or when Flush is explicitly called.
- DataDog/zstd just calls C.ZSTD_compressContinue for each Write call. This may result in sub-optimal performance when small chunks are passed to Write. Additionally, this may corrupt RAM - see comments starting from here.
I'll try avoiding the copy into the internal buffer in valyala/gozstd if the buffer passed to Write exceeds the internal buffer size. This will probably improve performance for the use case where the data is fully loaded into RAM before being passed to Write.
I think I found the main issue that explains the performance and compressed-size differences: readahead implements the WriteTo method, which is short-circuited inside io.Copy, so it tends to pass big chunks of read-ahead data (up to 1MB by default) to compressor.Write. DataDog/zstd passes these chunks directly to C.ZSTD_compressContinue on each Write call, while valyala/gozstd copies them into a smaller internal buffer (its size is defined in the zstd sources as ZSTD_CStreamInSize() => ZSTD_BLOCKSIZE_MAX => 1<<17, i.e. 8 times smaller than the buffer passed to Write) and calls C.ZSTD_compressStream on the internal buffer. This results in 8x more cgo calls.
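The 8x figure can be checked with a small back-of-the-envelope simulation. This is pure Go with no cgo; the function and parameter names are illustrative, not the real wrapper APIs, and only the chunk and buffer sizes mirror the numbers above.

```go
package main

import "fmt"

// compressCalls simulates cgo call counts for the two wrappers: DataDog/zstd
// makes one C call per Write, while gozstd (at the time) copied each Write
// through a fixed internal buffer and made one C call per full buffer.
func compressCalls(total, chunk, internal int) (buffered, direct int) {
	filled := 0
	for written := 0; written < total; written += chunk {
		direct++ // one C.ZSTD_compressContinue per Write
		remaining := chunk
		for remaining > 0 {
			n := internal - filled
			if n > remaining {
				n = remaining
			}
			filled += n
			remaining -= n
			if filled == internal {
				buffered++ // one C.ZSTD_compressStream per full internal buffer
				filled = 0
			}
		}
	}
	return buffered, direct
}

func main() {
	// 64 MiB written in 1 MiB chunks (readahead's default), against
	// gozstd's ZSTD_BLOCKSIZE_MAX = 1<<17 internal buffer.
	buffered, direct := compressCalls(64<<20, 1<<20, 1<<17)
	fmt.Println("gozstd cgo calls:", buffered)     // 512
	fmt.Println("DataDog/zstd cgo calls:", direct) // 64: 8x fewer
}
```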
I'm unsure about changing the internal buffer size inside valyala/gozstd, since the zstd docs say that ZSTD_CStreamInSize() returns the recommended size for the input buffer:

ZSTDLIB_API size_t ZSTD_CStreamInSize(void); /**< recommended size for input buffer */

Probably ZSTD_CStreamInSize should return a bigger value, since that results in better compression speed and ratio? cc'ing @Cyan4973 for input.
ZSTD_compressStream() does all the necessary buffering internally. It will try to consume as much as it can and produce as much as possible, then report these quantities to the caller. Whatever doesn't qualify as a full block just gets buffered internally and waits for more input (or for flush()).

This is very different from ZSTD_compressContinue(), which acts immediately, consuming all the input, forcing block limits at the immediate end of input, and failing with an error code if it can't find enough space for the output. It also depends on the caller buffering past data correctly, which is a major contract complexity. ZSTD_compressContinue() is designed for memory-constrained devices and systems, and will likely never reach "stable" API status.
ZSTD_compressStream() is the officially supported streaming API of Zstandard. It's compatible with advanced parameters.

In this context, ZSTD_CStreamInSize() is just a recommended size. It's in no way required to follow it; the streaming API ZSTD_compressStream() accepts any input size, including very large ones.

The "recommended" size should be understood in the context of a C caller. It doesn't consider the cost of crossing dynamic language interfaces, such as cgo or jni. For such cases, a major performance rule is to reduce the number of domain traversals to a minimum, hence favor very large input and output buffers, as much as practical.
Yann, thanks for the explanation! I'll add the ability to configure the internal buffer size for streaming compression by providing a new API in valyala/gozstd for this use case.
@klauspost, could you run fuzz tests for valyala/gozstd like you did for DataDog/zstd before starting to use it?
I seem to be getting this error when I try to fuzz it:

```
e:\gopath\src\github.com\klauspost\compress\zstd>go-fuzz-build -tags=decompress github.com/klauspost/compress/zstd
failed to execute go build: exit status 2
# github.com/valyala/gozstd
gcc: error: e:/temp/wintemp/go-fuzz-build292968055/gopath/src/github.com/valyala/gozstd/libzstd_windows_amd64.a: No such file or directory
```

Before I start digging - do you know how I can fix this?
It looks like Windows support is broken, since nobody has used it since it was added :( @zhuyie, could you look into Windows support for github.com/valyala/gozstd?

In the meantime, try running make clean libzstd.a from the root of the repo. It should build the missing file. See the FAQ for details.
I think it is specific to go-fuzz (notice how it moves the sources). The ordinary build process works fine.

It is basically running this script: https://github.com/klauspost/compress/blob/master/zstd/testdata/fuzz-decompress.cmd

Here is the fuzzing function: https://github.com/klauspost/compress/blob/master/zstd/fuzz_decompress.go

Just replace the import and you should be going. You can add additional fuzz inputs: the corpora from https://github.com/facebook/zstd/releases/tag/fuzz-corpora go in here: https://github.com/klauspost/compress/tree/master/zstd/testdata/fuzz-decompress/corpus
Testing the compressor is kinda pointless, as you might as well do that from your own package (and go-fuzz doesn't help much, since it is C code and coverage cannot be tracked). Comparing decompressors, there will be differences. Some values are up to the implementation, and I am not as strict on some other values (which I should fix, but it's not super important).
@klauspost, could you re-run the benchmarks against gozstd v1.5.1? It contains the libzstd_windows_amd64.a file updated for zstd 1.4.0. Previously this file was built from year-old zstd code, which may explain the performance difference.

Additionally, could you provide a guideline on how to run these benchmarks, so that next time I can run them myself?
@valyala With 1.5.1 I see this package mostly delivering the same performance as valyala/gozstd, except on highly repetitive loads for streams.

zskp: this package. Level 1 = Fastest, Level 2 = Default.
zstd: valyala/gozstd. Level passed as-is.
```
file        out   level  insize      outsize    millis  mb/s
enwik9      zskp  1      1000000000  343833033  5968    159.80
enwik9      zskp  2      1000000000  317838745  8252    115.57
enwik9      zstd  1      1000000000  358072021  5646    168.91
enwik9      zstd  3      1000000000  313734672  8297    114.94

file        out   level  insize      outsize    millis  mb/s
gob-stream  zskp  1      1911399616  234981983  5275    345.50
gob-stream  zskp  2      1911399616  208669457  6829    266.89
gob-stream  zstd  1      1911399616  249810424  4968    366.85
gob-stream  zstd  3      1911399616  208192146  6234    292.40

file               out   level  insize       outsize     millis  mb/s
consensus.db.10gb  zskp  1      10737418240  4076120264  43277   236.62
consensus.db.10gb  zskp  2      10737418240  3992225762  71119   143.98
consensus.db.10gb  zstd  1      10737418240  4167622462  43778   233.90
consensus.db.10gb  zstd  3      10737418240  3962744141  74509   137.43

file           out   level  insize      outsize    millis  mb/s
adresser.json  zskp  1      7983034785  228519697  12536   607.26
adresser.json  zskp  2      7983034785  253567362  15446   492.89
adresser.json  zstd  1      7983034785  218077590  8297    917.59
adresser.json  zstd  3      7983034785  239094031  10702   711.38
```
I am using this for testing (no warranty) https://gist.github.com/klauspost/d98da4c469ac30c47c3b0405d84e3828
I use it in a script like this:
```
go build compress.go
compress -in=%1 -out=* -stats -header=false -w="gzkp" -l=0
compress -in=%1 -out=* -stats -header=true -w="zskp" -l=1 >>results.txt
compress -in=%1 -out=* -stats -header=false -w="zskp" -l=2 >>results.txt
compress -in=%1 -out=* -stats -header=false -w="zstd" -l=1 >>results.txt
compress -in=%1 -out=* -stats -header=false -w="zstd" -l=3 >>results.txt
```
The first line is to load the file into cache.
Actually, testing with DataDog/zstd also yielded similar (lower) results:
```
file    out   level  insize      outsize    millis  mb/s
enwik9  zskp  1      1000000000  343833033  5972    159.69
enwik9  zskp  2      1000000000  317838745  8185    116.51
enwik9  zstd  1      1000000000  392429215  6403    148.94
enwik9  zstd  3      1000000000  368760828  7754    122.99
```
This seems to be down to this addition, which removes the io.WriterTo interface from the input. So it seems that when the input doesn't supply io.WriterTo and io.Copy is used, the performance of the DataDog package takes a big hit. So in most cases this package performs similarly to the cgo packages. I will re-run the benchmarks on your package and post them below.
Edit: It doesn't seem like your package gains the same speedup. Re-adding the interface doesn't provide any speedup:
```
file        out   level  insize      outsize    millis  mb/s
enwik9      zskp  1      1000000000  343833033  6008    158.71
enwik9      zskp  2      1000000000  317838745  8242    115.71
enwik9      zstd  1      1000000000  358072021  5661    168.46
enwik9      zstd  3      1000000000  313734672  8533    111.76

file        out   level  insize      outsize    millis  mb/s
gob-stream  zskp  1      1911399616  234981983  5329    342.06
gob-stream  zskp  2      1911399616  208669457  6865    265.53
gob-stream  zstd  1      1911399616  249810424  5014    363.48
gob-stream  zstd  3      1911399616  208192146  6421    283.89

file               out   level  insize       outsize     millis  mb/s
consensus.db.10gb  zskp  1      10737418240  4076120261  43303   236.47
consensus.db.10gb  zskp  2      10737418240  3992225759  70478   145.29
consensus.db.10gb  zstd  1      10737418240  4167622462  43689   234.38
consensus.db.10gb  zstd  3      10737418240  3962744141  90530   113.11
```
Links to the test files are here: https://github.com/klauspost/compress/pull/134
If you have a test-set (preferably a few GB) either as a single stream or as a collection of files I could test against your specific payload type. You can upload files here: https://goobox.io/#/upload
Here's another reason not to use cgo by default for builds: the playground won't work, since gcc isn't installed:
https://play.golang.org/p/AYmVJDSWqUV
```
# github.com/DataDog/zstd
exec: "gcc": executable file not found in $PATH
```
This is unfortunate, since the playground now has proper module support, so it could be used to share example programs with badger.