Andrew Duffy
I'm currently blocking this on some work in https://github.com/spiraldb/fsst/pull/24
Currently 59% of `fsst_compress` time is spent actually compressing; the rest goes to breaking out of the fast loop to do `push_null` and data copying. Something to improve on in a follow-up.
Initial TPC-H benchmarks comparison:
```
aduffy@DuffyProBook ~/c/vortex (aduffy/utf8view) [1]> critcmp develop-vortex utf8view-vortex
group                                develop-vortex                 utf8view-vortex
-----                                --------------                 ---------------
tpch_q1/vortex-pushdown-disabled     1.00   339.2±0.83ms  ? ?/sec   1.24   421.0±1.13ms  ? ?/sec
tpch_q10/vortex-pushdown-disabled    1.00   165.3±5.40ms...
```
Oh right, I forgot that `into_canonical` currently copies the world 🤦 Should be an easy fix.
Alright, a bit warmer now:
```
$ critcmp develop-vortex utf8view-vortex-fixed-faster
group                                develop-vortex                 utf8view-vortex-fixed-faster
-----                                --------------                 ----------------------------
tpch_q1/vortex-pushdown-disabled     1.00   339.2±0.83ms  ? ?/sec   1.13   382.8±3.05ms  ? ?/sec
tpch_q10/vortex-pushdown-disabled    1.00   165.3±5.40ms  ? ?/sec...
```
PR upstream for better take kernel: https://github.com/apache/arrow-rs/pull/6168
https://github.com/apache/arrow-rs/pull/6171
CI won't succeed until this is merged:
* https://github.com/apache/arrow-rs/pull/6171

Benches will continue to be slower than regular Utf8 until this merges:
* https://github.com/apache/arrow-rs/pull/6168

We may also want to consider how...
Alright, the above 2 PRs have merged into arrow-rs, which means we now need to wait for them to make their way into DataFusion to get the pytests passing. Looks...
Superseded by #757