New index driver based on https://github.com/RoaringBitmap/roaring
Pilosa uses https://github.com/RoaringBitmap/roaring to implement its bitmap index, but in our case pilosa comes with a huge overhead (we've already got rid of the server part).
Moreover, the large number of syscalls in the pilosa implementation caused some portability problems, e.g. for mounted volumes in docker.
Last but not least, pilosa comes with a long hierarchy of directories:
/index/field/view/fragment/storage,cache which all have to be opened/closed/synced.
Maybe we can go down to the lower level and implement our own bitmaps using https://github.com/RoaringBitmap/roaring
We don't use many pilosa features (which are mainly server-oriented).
If we call roaring directly we can get even better performance, control parallel index creation and also make all operations (And, Or, ...) parallel (something pilosa doesn't give us, e.g. roaring.ParAnd(nworkers, bmp1, bmp2))
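To make the "parallel And" idea concrete, here is a rough, stdlib-only sketch. It is not roaring's implementation and does not use roaring's API: it assumes a plain uncompressed bitset stored as `[]uint64`, and the names `parAnd`, `fromValues` and `cardinality` are made up for illustration. It only shows the general shape: split the words into chunks and intersect each chunk in its own goroutine.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

// fromValues builds an uncompressed bitset able to hold values in [0, size).
// Hypothetical helper, not part of roaring or pilosa.
func fromValues(size int, vals ...int) []uint64 {
	words := make([]uint64, (size+63)/64)
	for _, v := range vals {
		words[v/64] |= 1 << uint(v%64)
	}
	return words
}

// cardinality counts the set bits across all words.
func cardinality(words []uint64) int {
	n := 0
	for _, w := range words {
		n += bits.OnesCount64(w)
	}
	return n
}

// parAnd intersects a and b word by word, splitting the work into
// at most nworkers contiguous chunks, one goroutine per chunk.
func parAnd(nworkers int, a, b []uint64) []uint64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]uint64, n)
	if nworkers < 1 {
		nworkers = 1
	}
	chunk := (n + nworkers - 1) / nworkers

	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		// Each goroutine writes to a disjoint range of out, so no locking is needed.
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = a[i] & b[i]
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	var evens, threes []int
	for i := 0; i < 1000; i++ {
		if i%2 == 0 {
			evens = append(evens, i)
		}
		if i%3 == 0 {
			threes = append(threes, i)
		}
	}
	a := fromValues(1000, evens...)
	b := fromValues(1000, threes...)

	// Multiples of 6 in [0, 1000): 0, 6, ..., 996.
	fmt.Println(cardinality(parAnd(4, a, b))) // prints 167
}
```

With roaring itself the chunking would presumably happen per container rather than per raw word, so treat this only as an illustration of why And/Or are easy to parallelize once we own the bitmaps.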
@smola WDYT? I'm totally in favor of simplifying our bitmap index implementation.
@ajnavarro I'm all for it, but let's get the priority of index improvements first. Do we have any size estimation of this task? (e.g. 1, 2, 4, 8 weeks?)
I would say 2 weeks. But because I always multiply by 1.4 (my error factor) I would say 2.8 ;)
Do we still want to do this given how little gitbase indexes have been used?
We can leave it here for someone who is interested in contributing.
Personally I would rephrase the issue, because originally the idea was to get rid of all server leftovers in pilosa and directly use the underlying bitmap implementation. But over time it turned out that we don't benefit from the fast merging feature (which bitmaps give us) as much as we could. Moreover, the index mappings consume a lot of space (and under the hood use boltdb b-trees). To recap, it could be even better if we replaced bitmap indexes with b-trees (which is what tidb did).