Incorrect decompression with WithDecoderConcurrency(1) or DecodeAll
I'm using your zstd library in my squashfs library and have run into a weird edge case. It seems that sometimes when I decompress some data, I'm not getting the correct output. The weird thing is that the issue only arises when I pass WithDecoderConcurrency(1) or use DecodeAll. If I use WithDecoderConcurrency with any number other than 1 (and don't use DecodeAll), the problem goes away.
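In other words, roughly (a sketch of the call sites, not the actual project code):

// Produces bad output for some blocks:
r, err := zstd.NewReader(src, zstd.WithDecoderConcurrency(1))

// Works fine:
r, err := zstd.NewReader(src)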
For some context, I'm having the issue particularly when trying to read squashfs metadata blocks, which are pretty small (they have a max size of 8KiB), and not all of them are having issues. When I have some time, I'll try to capture some of the blocks that are causing the issue.
Also, fantastic work on this library.
Could you share the code used to decompress and/or a sample input?
The important code is essentially
decompress/interface.go
type Decompressor interface {
	Reader(src io.Reader) (io.ReadCloser, error)
}
squashfs archives can use six different types of compression, so I need an interface I can use.
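As a hypothetical illustration (not from the actual project), the standard library's zlib reader already fits this shape, since it returns an io.ReadCloser directly:

type Zlib struct{}

func (z Zlib) Reader(src io.Reader) (io.ReadCloser, error) {
	// zlib.NewReader already returns an io.ReadCloser, so it can be
	// handed back as-is.
	return zlib.NewReader(src)
}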
decompress/zstd.go
type Zstd struct{}

func (z Zstd) Reader(src io.Reader) (io.ReadCloser, error) {
	r, err := zstd.NewReader(src)
	if err != nil {
		// Don't call IOReadCloser on a nil reader.
		return nil, err
	}
	return r.IOReadCloser(), nil
}
metadata/reader.go
type Reader struct {
	master io.Reader
	cur    io.Reader
	d      decompress.Decompressor
}

func NewReader(master io.Reader, d decompress.Decompressor) *Reader {
	return &Reader{
		master: master,
		d:      d,
	}
}
// realSize clears the top bit, which squashfs uses to flag an
// uncompressed metadata block.
func realSize(siz uint16) uint16 {
	return siz &^ 0x8000
}
func (r *Reader) advance() (err error) {
	// Close the previous block's reader if it needs closing.
	if clr, ok := r.cur.(io.Closer); ok {
		clr.Close()
	}
	var raw uint16
	err = binary.Read(r.master, binary.LittleEndian, &raw)
	if err != nil {
		return
	}
	size := realSize(raw)
	r.cur = io.LimitReader(r.master, int64(size))
	// If the uncompressed flag (0x8000) is not set, the block is
	// compressed, so wrap it in the decompressor.
	if size == raw {
		r.cur, err = r.d.Reader(r.cur)
	}
	return
}
func (r *Reader) Read(p []byte) (n int, err error) {
	if r.cur == nil {
		if err = r.advance(); err != nil {
			return
		}
	}
	n, err = r.cur.Read(p)
	if err == io.EOF {
		// The current block is exhausted; advance to the next block
		// and fill the rest of p from it.
		if err = r.advance(); err != nil {
			return
		}
		var tmpN int
		tmpN, err = r.Read(p[n:])
		n += tmpN
	}
	return
}
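To make the block framing concrete, here is a self-contained sketch (the import paths for my packages are placeholders) that frames one compressed metadata block and reads it back through the Reader above:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"

	"github.com/klauspost/compress/zstd"

	// Placeholder import paths for the packages shown above.
	"example.com/squashfs/decompress"
	"example.com/squashfs/metadata"
)

func main() {
	payload := []byte("some metadata")

	// One-shot compression of the payload.
	enc, _ := zstd.NewWriter(nil)
	blob := enc.EncodeAll(payload, nil)

	// A metadata block is a little-endian uint16 size header followed
	// by the block data; the 0x8000 flag stays clear because the block
	// is compressed.
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, uint16(len(blob)))
	buf.Write(blob)

	r := metadata.NewReader(&buf, decompress.Zstd{})
	out, err := io.ReadAll(r)
	fmt.Printf("%q %v\n", out, err)
}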
I am currently running into some performance issues, which I'm guessing are due to creating a new Decoder every time I need to decompress a block. I ran into the decompression issue when I tried switching it to this:
zstd.go
type Zstd struct {
	rdr *zstd.Decoder
}

func (z *Zstd) Reader(src io.Reader) (io.ReadCloser, error) {
	if z.rdr == nil {
		// Lazily create one shared decoder for DecodeAll use.
		z.rdr, _ = zstd.NewReader(nil)
	}
	data, err := io.ReadAll(src)
	if err != nil {
		return nil, err
	}
	out, err := z.rdr.DecodeAll(data, nil)
	return io.NopCloser(bytes.NewReader(out)), err
}
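A sketch of an alternative that keeps the streaming path but still reuses a single decoder (assuming blocks are only read from one goroutine at a time, since a streaming Decoder can't be shared concurrently):

func (z *Zstd) Reader(src io.Reader) (io.ReadCloser, error) {
	if z.rdr == nil {
		var err error
		if z.rdr, err = zstd.NewReader(nil); err != nil {
			return nil, err
		}
	}
	// Reset points the existing decoder at the new block instead of
	// allocating a fresh decoder per block.
	if err := z.rdr.Reset(src); err != nil {
		return nil, err
	}
	// NopCloser so a caller's Close doesn't tear down the shared decoder.
	return io.NopCloser(z.rdr), nil
}

Note that each reader returned this way is only valid until the next call to Reader, since they all wrap the same decoder.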
I'm working on capturing the problem data, but it's a bit of a pain to capture just a few blocks and I'm a bit busy today.
I'm looking into it a bit more, but I'm having trouble figuring out what's actually going on. I had my library dump out each block, along with the result both from DecodeAll and from using the Decoder directly without any options, and... no difference. I compared the dumps with meld (a file comparison tool) and there was no difference. Very weird.
@CalebQ42 Thanks a lot for the detailed info. Sounds a lot like some sort of data race. I will take a look and see what could be the possible reason for it.
I have been completely unable to reproduce.
Please see if you can create a clean reproducer for me.
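Something along these lines would do, comparing the two decode paths on the same captured block (the file name is a placeholder):

package main

import (
	"bytes"
	"fmt"
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// Placeholder: one captured compressed metadata block.
	compressed, err := os.ReadFile("block.zst")
	if err != nil {
		panic(err)
	}

	// Path 1: one-shot DecodeAll.
	dec, _ := zstd.NewReader(nil)
	a, err1 := dec.DecodeAll(compressed, nil)

	// Path 2: streaming decode with default options.
	sr, _ := zstd.NewReader(bytes.NewReader(compressed))
	b, err2 := io.ReadAll(sr)
	sr.Close()

	fmt.Println("equal:", bytes.Equal(a, b), err1, err2)
}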