Adrian Ehrsam
> Let's get the other one in first

I hope it's OK to prioritize this one from my side, so that I don't have to keep both branches up to date.
How about `parallelize: bool | int` on the Python side? 🙂
I finally had the time to update this branch with the new parallel parameter in Python. Hope it's looking good now!
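To illustrate what a `bool | int` option like this could mean, here is a minimal sketch of one possible interpretation: `False` means single-threaded, `True` means "use all cores", and an integer sets an explicit worker count. The function name `resolve_parallelism` and its `cpu_count` argument are hypothetical and only for illustration; this is not the code in the branch.

```python
import os
from typing import Optional, Union


def resolve_parallelism(parallel: Union[bool, int], cpu_count: Optional[int] = None) -> int:
    """Map a bool-or-int option to a concrete worker count (hypothetical helper)."""
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(parallel, bool):
        if not parallel:
            return 1  # False: no parallelism
        # True: use all available cores (fall back to 1 if unknown)
        return cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if isinstance(parallel, int):
        if parallel < 1:
            raise ValueError("parallel must be >= 1 when given as an int")
        return parallel  # explicit worker count
    raise TypeError("parallel must be a bool or an int")
```

Checking `bool` before `int` matters here, because `isinstance(True, int)` is true and `True` would otherwise be read as a worker count of 1.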
I only did some manual tests on my own data, but could probably write some benchmarks in Python, using DuckDB or Polars as a source. Would it make sense to add...
I did some very basic benchmarking, but the results were not as I hoped :) While RAM consumption is significantly lower, the speed is not good enough yet. I think...
Pretty sure the non-async write causes issues. But object_store 0.10 will change a lot there, so maybe better to wait for that
I guess we also have to wait for a release of the arrow crate
Also, renaming directories (a kind of move with recurse=True) is a case that's currently not possible on a Datalake v2 account
I can pick it up, but I'd rather do it on the write.rs operation
Ok, I see partitioning makes this quite complicated 🙂 And MemoryExec of DataFusion is not helpful, so it might take some time