go-mysql-server

New index driver based on https://github.com/RoaringBitmap/roaring

Open kuba-- opened this issue 7 years ago • 6 comments

Pilosa uses https://github.com/RoaringBitmap/roaring to implement its bitmap index, but in our case pilosa comes with a huge overhead (we've already got rid of the server part). Moreover, the large number of syscalls in the pilosa implementation caused some portability problems, e.g. for mounted volumes in docker. Last but not least, pilosa comes with a long hierarchy of directories: /index/field/view/fragment/storage,cache which has to be opened/closed/synced. Maybe we can go down to a lower level and implement our own bitmaps using https://github.com/RoaringBitmap/roaring. We don't use many pilosa features (which are mainly server oriented). If we call roaring directly we may even get better performance, control parallel index creation and make all operations (And, Or, ...) parallel as well (something that pilosa doesn't give us - roaring.ParAnd(nworkers, bmp1, bmp2)).
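To illustrate the kind of parallel And mentioned above, here is a minimal sketch using only the Go standard library. It does not use roaring itself: the bitmap type, set, ids and parAnd are hypothetical names, and a flat []uint64 stands in for roaring's compressed containers. The idea is only to show how an intersection can be split across workers, roughly what a call like roaring.ParAnd(nworkers, bmp1, bmp2) would do for us.

```go
package main

import (
	"fmt"
	"sync"
)

// bitmap is a hypothetical stand-in for a roaring bitmap: a flat slice of
// 64-bit words where bit i of word w represents row ID w*64+i.
type bitmap []uint64

// set marks the given row ID as present, growing the slice if needed.
func (b *bitmap) set(id uint32) {
	w := int(id / 64)
	for len(*b) <= w {
		*b = append(*b, 0)
	}
	(*b)[w] |= 1 << (id % 64)
}

// ids returns the row IDs that are set, in ascending order.
func (b bitmap) ids() []uint32 {
	var out []uint32
	for w, word := range b {
		for i := 0; i < 64; i++ {
			if word&(1<<uint(i)) != 0 {
				out = append(out, uint32(w*64+i))
			}
		}
	}
	return out
}

// parAnd intersects a and b with up to nworkers goroutines. Each goroutine
// handles its own contiguous range of words, so no locking is needed.
func parAnd(nworkers int, a, b bitmap) bitmap {
	n := len(a)
	if len(b) < n {
		n = len(b) // words beyond the shorter bitmap cannot intersect
	}
	out := make(bitmap, n)
	if nworkers < 1 {
		nworkers = 1
	}
	chunk := (n + nworkers - 1) / nworkers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = a[i] & b[i]
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	var a, b bitmap
	for _, id := range []uint32{1, 5, 70, 200} {
		a.set(id)
	}
	for _, id := range []uint32{5, 70, 130, 200, 300} {
		b.set(id)
	}
	fmt.Println(parAnd(4, a, b).ids()) // prints [5 70 200]
}
```

With real roaring bitmaps the per-worker unit would be a container rather than a raw word, but the shape of the work split is the same.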

kuba-- avatar Mar 26 '19 00:03 kuba--

@smola WDYT? I'm all in on simplifying our bitmap index implementation.

ajnavarro avatar Apr 02 '19 12:04 ajnavarro

@ajnavarro I'm all for it, but let's get the priority of index improvements first. Do we have any size estimation of this task? (e.g. 1, 2, 4, 8 weeks?)

smola avatar Apr 03 '19 12:04 smola

I would say 2 weeks. But because I always multiply by 1.4 (my error factor) I would say 2.8 ;)

kuba-- avatar Apr 03 '19 14:04 kuba--

Do we still want to do this given how little gitbase indexes have been used?

erizocosmico avatar Oct 09 '19 08:10 erizocosmico

We can leave it here for anyone who is interested in contributing.

ajnavarro avatar Oct 09 '19 09:10 ajnavarro

Personally, I would rephrase the issue. Originally the idea was to get rid of all the server leftovers in pilosa and use the underlying bitmap implementation directly. But over time, we don't benefit from the fast merging that bitmaps give us as much as we could. Moreover, the index mappings consume a lot of space (and use boltdb B-trees under the hood). To recap, it could be even better to replace the bitmap indexes with B-trees (as tidb did).

kuba-- avatar Oct 09 '19 10:10 kuba--