"error in roaringArray.readFrom: did not find expected serialCookie in header" when reading a bitmap written by roaring64

Open wjohnson-aurora opened this issue 1 year ago • 5 comments

We've occasionally seen the following error when using roaring64.Bitmap.ReadFrom to read data written by roaring64.Bitmap.WriteTo:

error in roaringArray.readFrom: did not find expected serialCookie in header

I was able to find random data that replicates the error. To replicate:

  1. Download the data that replicates the error (5.4 MB of random uint64s): roaring_error_items.txt
  2. Create main.go containing the following code:
package main

import (
	"bufio"
	"bytes"
	"os"
	"strconv"

	"github.com/RoaringBitmap/roaring/roaring64"
)

func main() {
	var items []uint64

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := scanner.Text()

		item, err := strconv.ParseUint(line, 10, 64)
		if err != nil {
			panic(err)
		}

		items = append(items, item)
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}

	bitmap := roaring64.NewBitmap()
	for _, item := range items {
		bitmap.Add(item)
	}

	var bitmapBuf bytes.Buffer
	if _, err := bitmap.WriteTo(&bitmapBuf); err != nil {
		panic(err)
	}

	readBitmap := roaring64.NewBitmap()
	if _, err := readBitmap.ReadFrom(&bitmapBuf); err != nil {
		panic(err)
	}
}
  3. Create go.mod with the following contents:
module roaring-replication

go 1.20

require github.com/RoaringBitmap/roaring v1.7.0

require (
	github.com/bits-and-blooms/bitset v1.12.0 // indirect
	github.com/mschoch/smat v0.2.0 // indirect
)
  4. Run go mod tidy
  5. Run with:
cat roaring_error_items.txt | go run main.go
  6. Observe error from ReadFrom:
panic: error in roaringArray.readFrom: did not find expected serialCookie in header

goroutine 1 [running]:
main.main()
	main.go:42 +0x27d

wjohnson-aurora, Jan 23 '24 16:01

Note: sorting the input file seems to make replication way faster. Also, here's a smaller test case (also found by @wjohnson-aurora):

roaring_error_items_2_sorted.txt

grantwwu, Jan 23 '24 19:01

The bug is absolutely real.

The issue is that when deserializing a 64-bit roaring bitmap, for some reason, the code first tries to deserialize a 32-bit version. I don't know why this is done, but with the instances you have created, the code gets confused. It thinks it is dealing with a 32-bit bitmap, and then everything breaks after that.

(Of course, it is not: you serialized a 64-bit roaring bitmap.)
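
As a rough illustration of how this kind of confusion could arise, here is a minimal sketch. It assumes that a 64-bit serialization begins with a little-endian uint64 count of 32-bit buckets, and that the reader decides "this is a 32-bit bitmap" by comparing the low 16 bits of the header with the 32-bit format's cookie values (12346 and 12347). The helper names encodeBucketCount and looksLike32BitHeader are hypothetical and are not the library's code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Cookie values defined by the 32-bit Roaring serialization format.
const (
	serialCookieNoRunContainer = 12346
	serialCookie               = 12347
)

// encodeBucketCount mimics the assumed start of a 64-bit serialization:
// a little-endian uint64 holding the number of 32-bit buckets.
// (Hypothetical helper, for illustration only.)
func encodeBucketCount(n uint64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, n)
	return buf
}

// looksLike32BitHeader is a hypothetical stand-in for a "try 32-bit first"
// check: it only inspects the low 16 bits of the header, so an unlucky
// bucket count is indistinguishable from a 32-bit cookie.
func looksLike32BitHeader(header []byte) bool {
	magic := binary.LittleEndian.Uint16(header[:2])
	return magic == serialCookieNoRunContainer || magic == serialCookie
}

func main() {
	// 12347 + 4*65536 has the same low 16 bits as serialCookie.
	for _, n := range []uint64{1000, 12346, 12347 + 4*65536} {
		fmt.Printf("bucket count %d -> mistaken for 32-bit: %v\n",
			n, looksLike32BitHeader(encodeBucketCount(n)))
	}
}
```

Under these assumptions, only bucket counts whose low 16 bits happen to match a cookie are misread, which would be consistent with the error showing up only occasionally and with the data itself not being corrupted.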

lemire, Jan 23 '24 20:01

Feel free to review my potential fix at https://github.com/RoaringBitmap/roaring/pull/410

Note that the data is not corrupted or any such thing. It is just that the code gets confused at the deserialization stage.

lemire, Jan 23 '24 20:01

In fact, a review would be much appreciated.

lemire, Jan 23 '24 20:01

Thank you for fixing this so quickly! I can confirm that commit 94aeb2b resolves this issue.

wjohnson-aurora, Jan 23 '24 20:01