Daniël Heres
Hm, but this seems to be enabled currently. Are you somehow running a version without `ipc_compression` set?
Is the branch compiling yet? It seems it might not be, and you might be running an older (cached) version: https://github.com/andygrove/arrow-ballista/actions/runs/7169227281/job/19519719152#step:8:443
> > Is the branch compiling already?
>
> Yes, it was just the tests that weren't compiling in CI. I pushed a fix. I ran a cargo clean locally...
It might also be better to remove files after the next stage finishes instead of waiting for the whole job to finish? That should help with disk consumption for very large jobs.
Sounds like a great idea to me
I read through it; it indeed sounds a bit simpler. A nice side effect of this optimization, by the way, is that a limit on the shuffle writer is also a bit more effective...
We cannot control the file writer yet; this depends on https://github.com/apache/arrow-datafusion/issues/4708
Maybe you can share the doc publicly so anyone can make suggestions?
> @Dandandan fyi took a first stab at group by q8.
>
> ```
> q1 took 62 ms
> q2 took 322 ms
> q3 took 1230 ms
> ...
> ```
@jangorecki what would be needed to get rust native benchmarks in here?