roaring-rs
roaring-rs copied to clipboard
Feature Request: `RoaringBitmap::from_lsb0_bytes`
The Feature Explanation
- Adding new creation function for
RoaringBitmap
pub fn from_lsb0_bytes(offset: u32, bytes: &[u8]) -> RoaringBitmap
Function Behavior
- Interpret bytes as little endian bytes bitmap, and construct
RoaringBitmap offsetcan be used to offset the passing bitmap's index- If
offsetis not aligned to # of bits ofContainer'sStore::Bitmap(# of bits ofBox<[u64; 1024]>), this function panics
use roaring::RoaringBitmap;
let bytes = [0b00000101, 0b00000010, 0b00000000, 0b10000000];
// ^^^^^^^^ ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
// 76543210 98
let rb = RoaringBitmap::from_lsb0_bytes(0, &bytes);
assert!(rb.contains(0));
assert!(!rb.contains(1));
assert!(rb.contains(2));
assert!(rb.contains(9));
assert!(rb.contains(31));
let rb = RoaringBitmap::from_lsb0_bytes(8, &bytes);
assert!(rb.contains(8));
assert!(!rb.contains(9));
assert!(rb.contains(10));
assert!(rb.contains(17));
assert!(rb.contains(39));
Motivation
Sometimes bitmap is calculated by SIMD instructions. The result of SIMD instruction is likely to be already bitmask, not the series of bitmap indicies.
Under current implementation, when you intend use RoaringBitmap with bitmask produced by SIMD instruction, you have to use RoaringBitmap::sorted_iter or just insert one by one.
To solve this problem, I implemented RoaringBitmap::from_bitmap_bytes, which can be used to construct directly from bitmask.
Example of Production of Bitmask by SIMD instructions
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=40ddd13554c171be31fe53893401d40f
use std::arch::x86_64::*;
#[target_feature(enable = "avx2")]
unsafe fn compare_u8_avx2(a: &[u8], b: &[u8]) -> u32 {
assert!(
a.len() == 32 && b.len() == 32,
"Inputs must have a length of 32."
);
// Load the data into 256-bit AVX2 registers
let a_vec = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
let b_vec = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
// Perform comparison (a == b)
let cmp_result = _mm256_cmpeq_epi8(a_vec, b_vec);
// Extract the comparison result as a bitmask
let mask = _mm256_movemask_epi8(cmp_result);
mask as u32
}
fn main() {
let a: [u8; 32] = [
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32,
];
let b: [u8; 32] = [
1, 0, 3, 4, 5, 6, 7, 8, 0, 10, 11, 12, 13, 0, 15, 16, 0, 18, 19, 20, 21, 0, 23, 24, 0, 26,
27, 0, 29, 0, 31, 32,
];
let mask = unsafe {
compare_u8_avx2(&a, &b)
};
println!("Bitmask: {:#034b}", mask);
// Bitmask: 0b11010110110111101101111011111101
print!("Bitmask (little endian u8): ");
for b in mask.to_le_bytes() {
print!("{:08b} ", b);
}
println!();
// Bitmask (little endian u8): 11111101 11011110 11011110 11010110
let n = 2;
println!("Bitmask at {n}: {}", mask & (1 << n) != 0);
// Bitmask at 2: true
}
Benchmark Result
On my laptop (Apple M3 MacBook Air Sonoma14.3 Memory 16 GB), in most cases from_lsb0_bytes is much faster than from_sorted_iter.
Part of Results
creation/from_bitmap_bytes/census-income_srt
time: [984.25 µs 987.00 µs 990.37 µs]
thrpt: [6.1521 Gelem/s 6.1731 Gelem/s 6.1904 Gelem/s]
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
creation/from_sorted_iter/census-income_srt
time: [23.383 ms 23.397 ms 23.413 ms]
thrpt: [260.24 Melem/s 260.41 Melem/s 260.57 Melem/s]
Hi @Kerollmops 👋
Thank you for your quick reply. I have just made changes based on your review.
Basically I did:
- Moved
RoaringBitmap::from_bitmap_bytes, and its tests toserializationmodule. - Added detailed decument to the
RoaringBitmap::from_bitmap_bytes, includingoffset,bytesexplanations. - Relaxed the alignment requirement for
offset.- Thanks to
#[inline], I belive the compiler can easily optimize ifoffsetis actually aligned to 8, or even 64Ki
- Thanks to
@Dr-Emann I've just fixed the documentation and made the implementation endian-aware.
You can try big endian system by running cargo +nightly miri test --target s390x-unknown-linux-gnu --package roaring --lib -- bitmap::serialization::test::test_from_bitmap_bytes.
I also added this big endian test to CI.
I have just merged patch from @Dr-Emann (Thank you so much.) If merge commit https://github.com/RoaringBitmap/roaring-rs/pull/288/commits/aace6b87ae9cb3fdcb930b41b82b707c4e21d18f is unnecessary, I'll remove it by force push.
I applied cargo fmt, and rename from_bitmap_bytes to from_lsb0_bytes.
@Kerollmops I would like you to review this pull requests, then I can see the CI result.
Hey @lemolatoon 👋
Sorry for the very late reply. I just approved your changes so that you can see the CI.
Thank you @Kerollmops !
CI seems to be failing because of the change of cargo clippy on the nightly channel. I think this should be fixed on another pull request.
@lemolatoon The CI errors should be fixed by pulling in main, thanks to #293.
I git rebaseed to the main branch. It seems to be required @Kerollmops approval to re-run the CI :pray:
It seems to have the clippy warnings appeared only in 1.66.0 not in stable. I fixed the warnings. Could you re-run CI again please? @Kerollmops
Hey @lemolatoon 👋 Would you mind rebasing on main, please? I just merged #295. Have a nice one 🐿️
@Kerollmops Sorry for the late fix. I have just pushed with two changes:
- git rebased upstream/main
- Fix the CI error regarding to the miri
I am afraid to bother you multiple times, but I'd like to see the CI result again :pray:
I accidentally skipped one commit fixing the cargo clippy warning by the last rebase. I fixed the warning again and pushed.
I added the implementation of the work around for non-8-multiple offset for RoaringBitmap::from_lsb0_bytes. I can technically avoid an additional allocation for this case, but in this time I implemented it simply because it would be complicated if I didn't so.
Thank you for merging, releasing a new version, and the advice for this pull request!