polars
feat(rust,python): Faster bitpacking for Parquet writer
I replaced the Parquet bit-packing algorithm with a modified version of the scalar algorithm from https://github.com/quickwit-oss/bitpacking (unsure where/how to give credit). I ensured the new algorithm works by adding unit tests that encode/decode random data.
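For readers unfamiliar with the technique, here is a minimal sketch of what a bit-packing round-trip test on random data can look like. It is a naive bit-by-bit reference, not the algorithm from this PR or from quickwit-oss/bitpacking, and the names `pack`, `unpack`, `xorshift64`, and `roundtrip_ok` are hypothetical. It assumes LSB-first packing and uses a small xorshift generator so that no external crate is needed.

```rust
/// Pack each value into `num_bits` bits, LSB-first, into a byte buffer.
/// Naive reference implementation for illustration only.
fn pack(values: &[u64], num_bits: usize) -> Vec<u8> {
    assert!(num_bits <= 64);
    let mut out = vec![0u8; (values.len() * num_bits + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        for b in 0..num_bits {
            if (v >> b) & 1 == 1 {
                let pos = i * num_bits + b;
                out[pos / 8] |= 1u8 << (pos % 8);
            }
        }
    }
    out
}

/// Inverse of `pack`: read `len` values of `num_bits` bits each.
fn unpack(packed: &[u8], num_bits: usize, len: usize) -> Vec<u64> {
    assert!(num_bits <= 64);
    let mut out = vec![0u64; len];
    for i in 0..len {
        for b in 0..num_bits {
            let pos = i * num_bits + b;
            if (packed[pos / 8] >> (pos % 8)) & 1 == 1 {
                out[i] |= 1u64 << b;
            }
        }
    }
    out
}

/// Small deterministic PRNG (xorshift64); the seed must be nonzero.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Encode 64 random values that fit in `num_bits` bits, decode them,
/// and check that the result matches the input.
fn roundtrip_ok(num_bits: usize, seed: u64) -> bool {
    let mask = if num_bits == 64 { u64::MAX } else { (1u64 << num_bits) - 1 };
    let mut state = seed;
    let values: Vec<u64> = (0..64).map(|_| xorshift64(&mut state) & mask).collect();
    let packed = pack(&values, num_bits);
    packed.len() == (64 * num_bits + 7) / 8
        && unpack(&packed, num_bits, values.len()) == values
}
```

Running `roundtrip_ok` for every bit width from 0 to 64 covers the edge cases (zero-width and full-width values) that a faster implementation is most likely to get wrong.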
I added a microbenchmark in my first commit to help measure the performance gain.
| | Before | After |
|---|---|---|
| Avg. (ns/iter) | 93.75 | 4.25 |
| Std. (ns/iter) | 1.57 | 0.04 |
I removed the microbenchmark in a later commit because it requires nightly Rust and is probably not useful outside of this PR.
Other miscellaneous fixes:
- A packed array of 64 `u64`s should have a maximum length of 64 * 8 `u8`s, i.e. 512 bytes (second commit)
- The input array length should be compared to the expected unpacked length, not the expected packed length, to avoid unnecessary allocations and copies (third commit)
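The arithmetic behind these two fixes can be sketched as follows. This is an illustration under assumptions, not the code from the commits: `BLOCK_LEN`, `packed_len_bytes`, and `as_full_block` are hypothetical names, and the padding strategy (zero-fill a short final block) is assumed.

```rust
use std::borrow::Cow;

/// Number of values in one block (assumed: 64 values of type u64).
const BLOCK_LEN: usize = 64;

/// Bytes needed to hold `num_values` values packed at `num_bits` bits each.
fn packed_len_bytes(num_values: usize, num_bits: usize) -> usize {
    (num_values * num_bits + 7) / 8
}

/// Return a full block of BLOCK_LEN values. The check is against the
/// unpacked length (a count of values), so a full input is borrowed as-is
/// and only a short input is copied into a zero-padded buffer.
fn as_full_block(input: &[u64]) -> Cow<'_, [u64]> {
    if input.len() >= BLOCK_LEN {
        Cow::Borrowed(&input[..BLOCK_LEN])
    } else {
        let mut padded = vec![0u64; BLOCK_LEN];
        padded[..input.len()].copy_from_slice(input);
        Cow::Owned(padded)
    }
}
```

At the widest setting, `packed_len_bytes(64, 64)` is 64 * 8 = 512 bytes, which is the maximum stated in the first bullet. Comparing the input length against a packed byte count instead would send full-length inputs down the copying path whenever the two numbers differ.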
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 81.34%. Comparing base (11fe9d8) to head (b8ff88d). Report is 18 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #16278 +/- ##
==========================================
+ Coverage 80.80% 81.34% +0.53%
==========================================
Files 1393 1403 +10
Lines 179406 183257 +3851
Branches 2921 2922 +1
==========================================
+ Hits 144971 149063 +4092
+ Misses 33932 33691 -241
Partials 503 503
Nice improvement! Thanks a lot @thalassemia. :raised_hands: Hope you find some on the reading side as well. ;)