"error in roaringArray.readFrom: did not find expected serialCookie in header" when reading a bitmap written by roaring64
We've occasionally seen the following error when using roaring64.Bitmap.ReadFrom to read data written by roaring64.Bitmap.WriteTo:
error in roaringArray.readFrom: did not find expected serialCookie in header
I was able to find random data that replicates the error. To replicate:
- Download the data that replicates the error (5.4 MB of random uint64s): roaring_error_items.txt
- Create main.go containing the following code:
package main

import (
	"bufio"
	"bytes"
	"os"
	"strconv"

	"github.com/RoaringBitmap/roaring/roaring64"
)

func main() {
	var items []uint64
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()
		item, err := strconv.ParseUint(line, 10, 64)
		if err != nil {
			panic(err)
		}
		items = append(items, item)
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}

	bitmap := roaring64.NewBitmap()
	for _, item := range items {
		bitmap.Add(item)
	}

	var bitmapBuf bytes.Buffer
	if _, err := bitmap.WriteTo(&bitmapBuf); err != nil {
		panic(err)
	}

	readBitmap := roaring64.NewBitmap()
	if _, err := readBitmap.ReadFrom(&bitmapBuf); err != nil {
		panic(err)
	}
}
- Create go.mod with the following contents:
module roaring-replication

go 1.20

require github.com/RoaringBitmap/roaring v1.7.0

require (
	github.com/bits-and-blooms/bitset v1.12.0 // indirect
	github.com/mschoch/smat v0.2.0 // indirect
)
- Run go mod tidy
- Run with: cat roaring_error_items.txt | go run main.go
- Observe error from ReadFrom:
panic: error in roaringArray.readFrom: did not find expected serialCookie in header
goroutine 1 [running]:
main.main()
main.go:42 +0x27d
Note: sorting the input file seems to make replication much faster. Also, here's a smaller test case (also found by @wjohnson-aurora).
The bug is absolutely real.
The issue is that when deserializing a 64-bit roaring bitmap, the code first tries, for some reason, to deserialize it as a 32-bit bitmap. I don't know why this is done, but with the instances you have created, it gets confused: it thinks it is dealing with a 32-bit bitmap, and everything breaks after that.
(Of course, it is not; you serialized a 64-bit roaring bitmap.)
Feel free to review my potential fix at https://github.com/RoaringBitmap/roaring/pull/410
Note that the data is not corrupted or any such thing. It is just that the code gets confused at the deserialization stage.
In fact, a review would be much appreciated.
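The confusion described above can be illustrated with a small standalone sketch. This is not the library's code. The helpers header64 and looksLike32Bit are hypothetical, and the sketch assumes that a serialized 64-bit bitmap begins with an 8-byte little-endian bucket count, which should be checked against the roaring64 source. The two cookie values are the ones defined by the 32-bit Roaring format specification. The point is only that a reader which sniffs the first bytes for a 32-bit cookie can be fooled when the leading bytes of a valid 64-bit stream happen to match.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Cookie values from the 32-bit Roaring format specification.
const (
	serialCookieNoRunContainer = 12346 // whole 32-bit word must equal this
	serialCookie               = 12347 // only the low 16 bits must equal this
)

// header64 builds the ASSUMED first 8 bytes of a 64-bit bitmap stream:
// a little-endian count of 32-bit buckets. Hypothetical helper.
func header64(buckets uint64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, buckets)
	return buf
}

// looksLike32Bit is a hypothetical sniffing check: it reports whether the
// first 4 bytes would pass as a 32-bit Roaring cookie.
func looksLike32Bit(header []byte) bool {
	cookie := binary.LittleEndian.Uint32(header[:4])
	return cookie == serialCookieNoRunContainer || cookie&0xFFFF == serialCookie
}

func main() {
	// Bucket counts chosen so that some collide with a 32-bit cookie.
	for _, n := range []uint64{1000, 12346, 12347, 65536 + 12347} {
		fmt.Printf("buckets=%d misread as 32-bit: %v\n", n, looksLike32Bit(header64(n)))
	}
}
```

Under these assumptions, only the first bucket count (1000) is left alone; the other three would be taken for a 32-bit header even though the data is a valid 64-bit stream.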
Thank you for fixing this so quickly! I can confirm that commit 94aeb2b resolves this issue.