Jonas Haag
Hm, that didn't work either. I have a version that works for me here: https://github.com/jonashaag/macvim/tree/yosemite. I also set up automatic builds, which you can download from https://github.com/jonashaag/macvim/releases. Maybe I'll also start...
The regex for the PUT endpoint should be fixed as well!
Weird...

```
In [27]: %timeit -n1 -r3 q.sink_parquet("/tmp/t")
1.8 s ± 13.5 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

In [29]: %timeit -n1 -r3...
```
2-core GitHub Codespace:

```
In [1]: import polars as pl

In [2]: q = pl.scan_parquet("pl-perf-medium.parquet")

In [3]: %timeit -n1 -r3 q.sink_parquet("/tmp/t")
3.59 s ± 113 ms per loop (mean...
```
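For anyone reproducing these numbers outside IPython, here is a minimal stdlib sketch that mimics the `%timeit -n1 -r3` summary line. `timeit_like` and `format_result` are hypothetical helper names. Using the population standard deviation across runs and 3-significant-digit formatting are my assumptions about how `%timeit` reports, not IPython's actual code.

```python
import statistics
import time
from typing import Callable


def timeit_like(fn: Callable[[], object], number: int = 1, repeat: int = 3) -> tuple[float, float]:
    """Call `fn` `number` times per run, for `repeat` runs.

    Returns (mean, std_dev) of the per-loop time in seconds across runs,
    roughly what `%timeit -n{number} -r{repeat}` summarizes.
    """
    per_loop = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            fn()
        per_loop.append((time.perf_counter() - start) / number)
    # Assumption: population (not sample) standard deviation across runs.
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


def _fmt(seconds: float) -> str:
    """Format a duration with 3 significant digits and a s/ms/µs/ns unit."""
    if seconds >= 1:
        return f"{seconds:.3g} s"
    if seconds >= 1e-3:
        return f"{seconds * 1e3:.3g} ms"
    if seconds >= 1e-6:
        return f"{seconds * 1e6:.3g} µs"
    return f"{seconds * 1e9:.3g} ns"


def format_result(mean: float, std: float, number: int, repeat: int) -> str:
    """Build a %timeit-style summary line from a mean and std. dev."""
    runs = f"{repeat} run{'s' if repeat != 1 else ''}"
    loops = f"{number} loop{'s' if number != 1 else ''}"
    return f"{_fmt(mean)} ± {_fmt(std)} per loop (mean ± std. dev. of {runs}, {loops} each)"
```

Usage would be something like `mean, std = timeit_like(lambda: q.sink_parquet("/tmp/t"), number=1, repeat=3)` followed by `print(format_result(mean, std, 1, 3))`, with `q` being the lazy frame from the session above.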
This is on the production machine; it's a larger file with twice as many rows.
Similar behavior with a different Parquet file and this code:

```
pl.scan_parquet(...).filter(...).sink_parquet(...)
```

Uses only 2 cores.

```
pl.scan_parquet(...).filter(...).collect()
```

Uses all cores. But a `df.write_parquet(...)` seems bottlenecked to 1 core....
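To put a number on "uses only 2 cores" vs. "uses all cores" without eyeballing a thread graph, one rough approach is to divide the process's CPU time by its wall time. This is a minimal sketch, assuming all the work happens inside the current process (as far as I know Polars' thread pool does; child processes would not be counted). `effective_cores` is a hypothetical helper, not a Polars API, and time spent waiting on I/O will pull the ratio down.

```python
import time
from typing import Callable


def effective_cores(fn: Callable[[], object]) -> tuple[float, float, float]:
    """Return (wall_seconds, cpu_seconds, cpu_over_wall) for one call of `fn`.

    time.process_time() sums user + system CPU time across all threads of the
    current process, so cpu/wall approximates the average number of busy
    cores: ~1.0 means single-core bound, ~N means N cores were kept busy.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu, (cpu / wall if wall > 0 else 0.0)
```

Wrapping the `sink_parquet(...)` and `collect()` variants above in a `lambda` and comparing the third value would give a rough per-variant core count; a plain Python loop, by contrast, should stay at or below about 1.0 because of the GIL.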
@itamarst thank you for looking into this! Were you able to reproduce the fact that the `sink_parquet()` route uses fewer cores? I think that explains most of the performance difference
@itamarst

```
In [10]: %timeit -n1 -r1 pl.scan_parquet("tmp/pl-perf-medium.parquet").sink_parquet("/tmp/xx", compression="uncompressed")
2.29 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

In [11]: %timeit -n1...
```
That would explain the very sparse-looking thread utilization graphs for the `sink_parquet()` variant.