vortex icon indicating copy to clipboard operation
vortex copied to clipboard

feat: basic BoolBuffer / BoolBufferMut

Open a10y opened this issue 10 months ago • 6 comments

It's very annoying to do random in-place setting of bits in Arrow BooleanBufferBuilder. It's very append focused. For something like SparseArray in particular, you want to create a BoolArray against uninitialized memory and only set the bits corresponding to positions of non-null values.

a10y avatar Feb 21 '25 15:02 a10y

Can we call this BitBuffer?

gatesn avatar Feb 21 '25 15:02 gatesn

Sadly new_uninit leads to UB when you try and do inplace updates on it. I can't think of a great way to work around that without doing something like having a second bitset, which sort of defeats the point of avoiding the initialization cost anyway, so for now I'm just removing it.

a10y avatar Feb 25 '25 02:02 a10y

Where are we at with this? Need someone to takeover while you poke at FFI bindings?

gatesn avatar Mar 01 '25 09:03 gatesn

Maybe? I'm no longer sure there's a strong path forward that gives us something better than Arrow's BooleanBufferBuilder.

The big issue is that mapping a bitset on top of uninit memory causes UB when you try and set bits in-place. While it generally seems harmless it is technically a violation of Rust's memory model so not guaranteed to work across platforms.

a10y avatar Mar 02 '25 20:03 a10y

Why does the memory have to be uninitialized? Even the ability to into_mut and update bits is valuable from an API perspective.

gatesn avatar Mar 02 '25 22:03 gatesn

I will take over this pr

robert3005 avatar Mar 20 '25 17:03 robert3005

This now compiles but there's still a lot of tests to fix

robert3005 avatar Aug 14 '25 13:08 robert3005

CodSpeed Performance Report

Merging #2456 will degrade performances by 62.42%

Comparing aduffy/bool-buffer (d93320a) with develop (d37444a)

Summary

⚡ 15 improvements
❌ 140 regressions
✅ 1127 untouched

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
decompress_rd[f32, 100000] 942.6 µs 854.6 µs +10.29%
decompress_rd[f64, 100000] 1.2 ms 1.3 ms -10.16%
decompress_bitpacking_early_filter[i16, 0.02] 129 µs 149.3 µs -13.55%
decompress_bitpacking_early_filter[i32, 0.02] 138.3 µs 159.2 µs -13.1%
decompress_bitpacking_early_filter[i32, 0.03] 157 µs 187.1 µs -16.09%
decompress_bitpacking_early_filter[i32, 0.04] 174.6 µs 214.2 µs -18.49%
decompress_bitpacking_early_filter[i32, 0.05] 192.5 µs 242.1 µs -20.48%
decompress_bitpacking_early_filter[i64, 0.005] 77.4 µs 86.2 µs -10.12%
decompress_bitpacking_early_filter[i64, 0.02] 158.3 µs 179.9 µs -12.04%
decompress_bitpacking_early_filter[i64, 0.03] 178.8 µs 209.8 µs -14.77%
decompress_bitpacking_early_filter[i64, 0.04] 198.6 µs 239.1 µs -16.93%
decompress_bitpacking_early_filter[i64, 0.05] 218.7 µs 269.2 µs -18.73%
decompress_bitpacking_early_filter[i8, 0.02] 126.2 µs 146.2 µs -13.64%
decompress_bitpacking_early_filter[i8, 0.03] 180.7 µs 209.9 µs -13.9%
decompress_bitpacking_early_filter[i8, 0.04] 193.3 µs 231.9 µs -16.66%
decompress_bitpacking_early_filter[i8, 0.05] 205.7 µs 254.4 µs -19.14%
decompress_bitpacking_late_filter[i16, 0.03] 225.4 µs 254 µs -11.26%
decompress_bitpacking_late_filter[i16, 0.04] 238.6 µs 276.8 µs -13.78%
decompress_bitpacking_late_filter[i16, 0.05] 251.8 µs 299.9 µs -16.04%
decompress_bitpacking_late_filter[i32, 0.0105] 296.4 µs 221 µs +34.13%
... ... ... ... ...

:information_source: Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

codspeed-hq[bot] avatar Aug 14 '25 23:08 codspeed-hq[bot]

I need to figure out all the slow downs - seems like arrows append_packed_range is faster than bitvec implementaion and forcing iterators to be owned adds a bunch of drops that could be avoided

robert3005 avatar Aug 15 '25 01:08 robert3005

Coverage Status

coverage: 87.875% (+0.2%) from 87.72% when pulling 2d540e578180288f0ce9c84663d71c59f46e0d83 on aduffy/bool-buffer into aaf3e36ad46a8e3269b1c7f7b8a86e5d3fef4025 on develop.

coveralls avatar Aug 15 '25 01:08 coveralls

Codecov Report

:x: Patch coverage is 89.21907% with 156 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 88.05%. Comparing base (87aa18b) to head (5a48fba).

Files with missing lines Patch % Lines
vortex-buffer/src/bit/buf_mut.rs 82.14% 35 Missing :warning:
vortex-buffer/src/bit/unaligned.rs 87.43% 26 Missing :warning:
vortex-buffer/src/bit/buf.rs 89.78% 24 Missing :warning:
vortex-buffer/src/bit/ops.rs 81.48% 15 Missing :warning:
encodings/runend/src/compress.rs 61.53% 10 Missing :warning:
vortex-buffer/src/buffer.rs 73.52% 9 Missing :warning:
vortex-buffer/src/bit/aligned.rs 88.52% 7 Missing :warning:
vortex-array/src/arrays/bool/compute/is_sorted.rs 16.66% 5 Missing :warning:
vortex-array/src/validity.rs 84.61% 4 Missing :warning:
vortex-array/src/pipeline/canonical.rs 40.00% 3 Missing :warning:
... and 15 more

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Aug 26 '25 11:08 codecov[bot]

Will make a new pr

robert3005 avatar Oct 14 '25 13:10 robert3005