feat(rust,python): Faster bitpacking for Parquet writer

Open thalassemia opened this issue 9 months ago • 1 comment

I replaced the Parquet bit-packing algorithm with a modified version of the scalar algorithm from https://github.com/quickwit-oss/bitpacking (I'm unsure where or how to give credit). To verify that the new algorithm works, I added unit tests that encode and decode random data.
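For readers unfamiliar with the technique, here is a minimal sketch of scalar bit-packing and the kind of encode/decode round trip the unit tests check. It is not the code from this PR, nor the API of the quickwit-oss/bitpacking crate: the `pack` and `unpack` functions are hypothetical helpers, and the layout (values stored back to back, least significant bit first) is an assumption made for illustration.

```rust
/// Pack the lowest `num_bits` bits of each value back to back (LSB first).
/// Hypothetical helper for illustration; not the function added in this PR.
fn pack(values: &[u32], num_bits: usize, out: &mut Vec<u8>) {
    assert!(num_bits <= 32);
    let mask = (1u64 << num_bits) - 1;
    let mut buffer: u64 = 0; // bits waiting to be flushed
    let mut filled: usize = 0; // number of valid bits in `buffer` (always < 8 here)
    for &v in values {
        buffer |= ((v as u64) & mask) << filled;
        filled += num_bits;
        // Flush every complete byte.
        while filled >= 8 {
            out.push(buffer as u8);
            buffer >>= 8;
            filled -= 8;
        }
    }
    // Flush the final partial byte, if any.
    if filled > 0 {
        out.push(buffer as u8);
    }
}

/// Inverse of `pack`: read `count` values of `num_bits` bits each.
/// Hypothetical helper for illustration.
fn unpack(packed: &[u8], num_bits: usize, count: usize) -> Vec<u32> {
    assert!(num_bits <= 32);
    let mask = (1u64 << num_bits) - 1;
    let mut out = Vec::with_capacity(count);
    let mut buffer: u64 = 0;
    let mut filled: usize = 0;
    let mut bytes = packed.iter();
    for _ in 0..count {
        // Refill until at least one full value is available.
        while filled < num_bits {
            let b = *bytes.next().expect("packed input too short");
            buffer |= (b as u64) << filled;
            filled += 8;
        }
        out.push((buffer & mask) as u32);
        buffer >>= num_bits;
        filled -= num_bits;
    }
    out
}
```

A round-trip test then packs a set of values that fit in `num_bits` bits, unpacks them, and asserts that the result equals the input. A faster implementation would typically process fixed-size blocks with the bit width known at compile time so the loops can be unrolled; this sketch favours readability over speed.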

I added a microbenchmark in my first commit to help measure the performance gain.

| | Before | After |
|---|---|---|
| Avg. (ns/iter) | 93.75 | 4.25 |
| Std. (ns/iter) | 1.57 | 0.04 |

I removed the microbenchmark in a later commit because it requires nightly Rust and is probably not useful outside of this PR.

Other miscellaneous fixes:

  • The packed array for 64 u64 values should have a maximum length of 64 * 8 bytes (u8's) (second commit)
  • The input array length should be compared to the expected unpacked length, not the expected packed length, to avoid unnecessary allocations/copies (third commit)
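The bound in the first fix follows from simple arithmetic: 64 values packed at up to 64 bits each occupy at most 64 * 64 bits, which is 64 * 8 bytes. A small sketch of that calculation, using a hypothetical `packed_len_bytes` helper that does not appear in the PR:

```rust
/// Number of bytes needed to store `count` values at `num_bits` bits each,
/// rounded up to a whole byte. Hypothetical helper for illustration.
const fn packed_len_bytes(count: usize, num_bits: usize) -> usize {
    (count * num_bits + 7) / 8
}
```

For example, `packed_len_bytes(64, 64)` evaluates to 512, which equals 64 * 8.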

thalassemia avatar May 16 '24 18:05 thalassemia

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.34%. Comparing base (11fe9d8) to head (b8ff88d). Report is 18 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16278      +/-   ##
==========================================
+ Coverage   80.80%   81.34%   +0.53%     
==========================================
  Files        1393     1403      +10     
  Lines      179406   183257    +3851     
  Branches     2921     2922       +1     
==========================================
+ Hits       144971   149063    +4092     
+ Misses      33932    33691     -241     
  Partials      503      503              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar May 16 '24 19:05 codecov[bot]

Nice improvement! Thanks a lot @thalassemia. :raised_hands: Hope you find some on the reading side as well. ;)

ritchie46 avatar May 18 '24 16:05 ritchie46