roaring icon indicating copy to clipboard operation
roaring copied to clipboard

add a new compare func CompareValueONeil for bsi, which faster than CompareValue

Open ChicChip opened this issue 1 year ago • 8 comments

According to Java's compare function , I code a new function CompareValueONeil for go version ,and it only support number which is >= 0 , it is faster than CompareValue. When bsi include 200 million columnIds, using same range query,CompareValueONeil is cost about 1s when CompareValue cost about 12s(https://github.com/RoaringBitmap/RoaringBitmap/blob/cca90c986d5c0096bbeabb5f968833bf12c28c0e/bsi/src/main/java/org/roaringbitmap/bsi/longlong/Roaring64BitmapSliceIndex.java#L456)

ChicChip avatar Nov 26 '24 02:11 ChicChip

Test code read my file "total.bin" file, _ := os.OpenFile(path+"/total.bin", os.O_RDWR, 0755) defer file.Close() total := roaring64.NewDefaultBSI() total.ReadFrom(file) t := time.Now().UnixNano() res := total.CompareValue(0, roaring64.RANGE, 25, 30, nil) fmt.Printf("totalCount[%v], count[%v], CompareValue cost %v [ms]\n", total.GetCardinality(), res.GetCardinality(), (time.Now().UnixNano()-t)/1000000) t1 := time.Now().UnixNano() res = total.CompareValueONeil(roaring64.RANGE, 25, 30, nil) fmt.Printf("totalCount[%v], count[%v], CompareValueONeil cost %v [ms]\n", total.GetCardinality(), res.GetCardinality(), (time.Now().UnixNano()-t1)/1000000)

result totalCount[207701480], count[234], CompareValue cost 14051 [ms] totalCount[207701480], count[234], CompareValueONeil cost 882 [ms]

ChicChip avatar Nov 26 '24 03:11 ChicChip

I will run some tests in my environment. In the meantime, could we come up with a better more descriptive name for this function?

guymolinari avatar Nov 26 '24 14:11 guymolinari

Also (and better yet), when constructing the BSI we could potentially set min/max value providing some context about the nature of the data being stored in the BSI. Then we could choose to apply the optimization without creating an entirely new API.

guymolinari avatar Nov 26 '24 14:11 guymolinari

I will run some tests in my environment. In the meantime, could we come up with a better more descriptive name for this function?

yes, more descriptive name is better

ChicChip avatar Nov 27 '24 01:11 ChicChip

Also (and better yet), when constructing the BSI we could potentially set min/max value providing some context about the nature of the data being stored in the BSI. Then we could choose to apply the optimization without creating an entirely new API.

Yes, util now, BSI's min/max doesn't be refreshed when setValues or deserialization. If min/max exist in BSI, new API can cancel.

ChicChip avatar Nov 27 '24 01:11 ChicChip

Is there some news on this PR?

lemire avatar Jul 10 '25 12:07 lemire

I'm also interested in the progress of this.

Are you still working on this @ChicChip?

ofpiyush avatar Aug 07 '25 06:08 ofpiyush

I'm also interested in the progress of this.

Are you still working on this @ChicChip?

bsi32 and bsi64 are supported in commit, but this request(https://github.com/RoaringBitmap/roaring/pull/461#issuecomment-2500974153) about set min/max i haven't done yet.

ChicChip avatar Aug 07 '25 06:08 ChicChip