roaring
roaring copied to clipboard
WriteTo() optimization ideas?
It appears the current implementation of Bitmap.WriteTo() isn't very much different than the user instead calling ToBytes() and invoking io.Writer.Write() itself... https://github.com/RoaringBitmap/roaring/blob/master/roaringarray.go#L526
Was wondering if there are approaches or ideas that might help avoid the "extra" toBytes() allocation?
I'd be willing to give an improvement an attempt if it's pretty straightforward. In my usage of roaring in a golang full-text engine (blevesearch), the io.Writer that we're using is one of those bufio buffered writers, so the guess would be that a bunch of Write()'s ought to be ok.
cheers, steve
The bulk of the time in the toBytes
function is going to be spent there:
stream := &bytes.Buffer{}
(...)
for i, c := range ra.containers {
_ = i
_, err := c.writeTo(stream)
if err != nil {
return nil, err
}
}
https://github.com/RoaringBitmap/roaring/blob/master/roaringarray.go#L512-L518
(Of course, one should do actual profiling... results could vary depending on your application.)
I don't think it is hard to pull out this code from the toBytes
function and do something more direct with it.
It is less obvious that it is a good path to speed things up. Buffering is not necessarily a negative performance wise. It can be an optimization too!
A better strategy might be to make it possible to reuse the same memory buffer several times. Or something.
Anyhow, writing the code is not hard (I expect you will agree). What is hard is to benchmark and tune.
A better strategy might be to make it possible to reuse the same memory buffer several times.
Indeed -- that's the pattern we've been following in blevesearch, and it's been quite a beneficial trick to play again and again.
What is hard is to benchmark and tune.
So true!!
I marked this "help wanted". I think that this is an accessible problem for contributions.