roaring-rs
Create bench comparison to CRoaring
I ran a quick test using roaring-rs and croaring-rs in DataFusion, doing a count_distinct on one column (same logic for both: insert one value at a time). Just FYI 😊
1million_rows_10thousand_distinct.parquet
bitmap distinct
(roaring-rs)
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 10000                           |
+---------------------------------+
1 row in set. Query took 0.052 seconds
(croaring-rs)
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 10000                           |
+---------------------------------+
1 row in set. Query took 0.038 seconds.
1million_1million.parquet
roaring-rs
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 631504                          |
+---------------------------------+
1 row in set. Query took 0.175 seconds (roaring-rs).
croaring-rs
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 631504                          |
+---------------------------------+
1 row in set. Query took 0.052 seconds (croaring-rs).
Thanks for making this! Can I conclude that croaring-rs is empirically faster?
In my test case, yes. Going through FFI gave better performance. But that was with last year's version, so the Rust version may have improved a lot by now!
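For anyone who wants to reproduce the "insert one value at a time" comparison outside DataFusion, here is a rough sketch of the loop being timed. It is not the code used for the numbers above. The `DistinctSet` trait, the `count_distinct` helper, and the generated input are all made up for illustration, and `HashSet<u32>` is only a stand-in so the example compiles with std alone. To compare the real crates you would implement the trait for `roaring::RoaringBitmap` (via `insert` and `len`) and for `croaring::Bitmap` (via `add` and `cardinality`).

```rust
use std::collections::HashSet;
use std::time::Instant;

/// Hypothetical minimal interface the benchmark needs from a set of u32 values.
trait DistinctSet: Default {
    fn insert_one(&mut self, value: u32);
    fn cardinality(&self) -> u64;
}

// Stand-in implementation so the sketch builds with the standard library only.
// A real comparison would add impls for the two bitmap types instead.
impl DistinctSet for HashSet<u32> {
    fn insert_one(&mut self, value: u32) {
        self.insert(value);
    }

    fn cardinality(&self) -> u64 {
        self.len() as u64
    }
}

/// Count distinct values by inserting them one at a time (no bulk insert).
fn count_distinct<S: DistinctSet>(values: &[u32]) -> u64 {
    let mut set = S::default();
    for &v in values {
        set.insert_one(v);
    }
    set.cardinality()
}

fn main() {
    // Assumed input shape: 1,000,000 rows cycling through 10,000 distinct values.
    let values: Vec<u32> = (0..1_000_000u32).map(|i| i % 10_000).collect();

    let start = Instant::now();
    let distinct = count_distinct::<HashSet<u32>>(&values);
    let elapsed = start.elapsed();

    println!("distinct = {}, took {:?}", distinct, elapsed);
}
```

Build with `--release` and run each implementation several times. A single wall-clock measurement like the query times above also includes Parquet reading and query planning, so it only gives a rough signal.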