roaring-rs

Create bench comparison to CRoaring

Open saik0 opened this issue 2 years ago • 3 comments

saik0 avatar Feb 11 '22 12:02 saik0

I ran some tests comparing roaring-rs and croaring-rs in DataFusion, doing a count distinct on one column (same logic for both: insert one value at a time). Just FYI 😊

1million_rows_10thousand_distinct.parquet

bitmap distinct

roaring-rs
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 10000                           |
+---------------------------------+
1 row in set. Query took 0.052 seconds.

croaring-rs
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 10000                           |
+---------------------------------+
1 row in set. Query took 0.038 seconds.

1million_1million.parquet

roaring-rs
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 631504                          |
+---------------------------------+
1 row in set. Query took 0.175 seconds.

croaring-rs
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 631504                          |
+---------------------------------+
1 row in set. Query took 0.052 seconds.
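
For anyone who wants to reproduce the "insert one value at a time" logic outside DataFusion, here is a minimal sketch using only the standard library. The `DistinctAcc` trait and the `count_distinct` helper are hypothetical names made up for illustration, and `BTreeSet` is only a stand-in: in a real comparison you would implement the trait for the bitmap types from roaring-rs and croaring-rs instead. The synthetic column is an assumption that mimics the shape of the first file (1 million rows, 10 thousand distinct values), not the actual parquet data.

```rust
use std::collections::BTreeSet;
use std::time::{Duration, Instant};

/// Hypothetical minimal interface for a distinct-value accumulator.
/// A real comparison would implement this for each bitmap type under test.
trait DistinctAcc {
    fn insert_one(&mut self, value: u32);
    fn distinct_count(&self) -> u64;
}

/// Stand-in implementation backed by the standard library's BTreeSet.
impl DistinctAcc for BTreeSet<u32> {
    fn insert_one(&mut self, value: u32) {
        self.insert(value);
    }

    fn distinct_count(&self) -> u64 {
        self.len() as u64
    }
}

/// Feeds every value into the accumulator one at a time, then returns
/// the distinct count together with the elapsed wall-clock time.
fn count_distinct<A: DistinctAcc>(mut acc: A, values: &[u32]) -> (u64, Duration) {
    let start = Instant::now();
    for &v in values {
        acc.insert_one(v);
    }
    (acc.distinct_count(), start.elapsed())
}

fn main() {
    // Synthetic column (assumption): 1,000,000 rows cycling through 10,000 values.
    let values: Vec<u32> = (0..1_000_000u32).map(|i| i % 10_000).collect();
    let (count, elapsed) = count_distinct(BTreeSet::new(), &values);
    println!("distinct = {}, took {:?}", count, elapsed);
}
```

A single wall-clock run like this is noisy, so timings from it are only a rough indication.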

Ted-Jiang avatar Feb 22 '22 03:02 Ted-Jiang

Thanks for making this! Can I conclude that croaring-rs is empirically faster?

tonyabracadabra avatar Jun 27 '22 05:06 tonyabracadabra

In my test case, yes. Going through FFI gave better performance, but that was with last year's version; the Rust version may have improved a lot since then!

Ted-Jiang avatar Jun 27 '22 07:06 Ted-Jiang