Andrew Duffy

Results 52 comments of Andrew Duffy

I'm currently blocking this on some work in https://github.com/spiraldb/fsst/pull/24

![image](https://github.com/user-attachments/assets/c2419307-72b5-4aa5-a87d-98fa8df6f6bf) Currently only 59% of `fsst_compress` time is spent actually compressing, because we break out of the fast loop to do `push_null` and data copying. Something to improve on in a follow-up.

Initial TPC-H benchmarks comparison:
```
aduffy@DuffyProBook ~/c/vortex (aduffy/utf8view) [1]> critcmp develop-vortex utf8view-vortex
group                                develop-vortex                utf8view-vortex
-----                                --------------                ---------------
tpch_q1/vortex-pushdown-disabled     1.00  339.2±0.83ms ? ?/sec    1.24  421.0±1.13ms ? ?/sec
tpch_q10/vortex-pushdown-disabled    1.00  165.3±5.40ms...
```

Oh right, I forgot that `into_canonical` currently copies the world 🤦 ![image](https://github.com/user-attachments/assets/14227c86-58c5-42d9-be90-b24f5a02cd68) Should be an easy fix.

Alright, a bit warmer now:
```
$ critcmp develop-vortex utf8view-vortex-fixed-faster
group                                develop-vortex                utf8view-vortex-fixed-faster
-----                                --------------                ----------------------------
tpch_q1/vortex-pushdown-disabled     1.00  339.2±0.83ms ? ?/sec    1.13  382.8±3.05ms ? ?/sec
tpch_q10/vortex-pushdown-disabled    1.00  165.3±5.40ms ? ?/sec...
```

PR upstream for better take kernel: https://github.com/apache/arrow-rs/pull/6168

https://github.com/apache/arrow-rs/pull/6171

CI won't succeed until this is merged:

* https://github.com/apache/arrow-rs/pull/6171

Benches will continue to be slower than regular Utf8 until this merges:

* https://github.com/apache/arrow-rs/pull/6168

We may also want to consider how...

Alright, the above 2 PRs have merged into arrow-rs, which means we now need to wait for them to make their way into DataFusion to get the pytests passing. Looks...