Florian Jetter
> I found that x.copy() ran in 2 GB/s and pq.read_table(io.BytesIO(bytes)) ran in 180 MB/s. I'm not sure if this comparison is actually fair and valid. Parquet -> Arrow has...
Just a heads up. My current working theory is that the parquet deserialization performance is roughly where it should be (though honestly I don't know), but what we're...
we're not using nightlies
I assume that the dask/distributed use of defaults is coincidental. I wouldn't expect problems switching to nodefaults.
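For reference, switching away from defaults is typically just a channels change in the environment file; a minimal sketch (environment name and dependencies are hypothetical):

```yaml
# environment.yml sketch: listing `nodefaults` tells conda to ignore
# the `defaults` channel entirely for this environment.
name: myenv            # hypothetical environment name
channels:
  - conda-forge
  - nodefaults
dependencies:
  - dask
  - distributed
```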
You should be able to define your module in cloudpickle to be pickled by value to force dask to upload this https://github.com/cloudpipe/cloudpickle?tab=readme-ov-file#overriding-pickles-serialization-mechanism-for-importable-constructs
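A minimal sketch of the pickle-by-value mechanism, using a throwaway in-memory module (the name `mylib` and its contents are hypothetical stand-ins for local code that isn't installed on the workers):

```python
import sys
import types

import cloudpickle

# Build a tiny throwaway module standing in for local code that the
# workers don't have installed.
mylib = types.ModuleType("mylib")
mylib.inc = lambda x: x + 1
sys.modules["mylib"] = mylib

# Register the module to be pickled by value: cloudpickle then embeds
# the code in the pickle stream instead of emitting a bare
# "import mylib" reference that would fail on the workers.
cloudpickle.register_pickle_by_value(mylib)

# Anything dask serializes from this module now travels with the task.
payload = cloudpickle.dumps(mylib.inc)
```

With that in place, submitting `mylib.inc` to a dask cluster should work even though the workers can't import `mylib` themselves.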
This is not a shortcoming of the plugin system. You are faced with the exact same problem if you are submitting functions as ordinary tasks so this is a problem...
If your parquet files are 150MB on disk, chances are they are easily 1GB in memory, if not more, and there are two threads per worker loading this...
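A back-of-envelope sketch of why this blows up memory (the expansion factor is an assumption and is highly data dependent; Parquet's encoding and compression commonly decode to several times the on-disk size):

```python
# Hypothetical numbers illustrating peak memory during deserialization.
on_disk_mb = 150         # size of one parquet file on disk
expansion = 7            # assumed decode expansion factor (data dependent)
threads_per_worker = 2   # both threads may be deserializing at once

in_memory_mb = on_disk_mb * expansion
peak_mb = in_memory_mb * threads_per_worker
print(in_memory_mb)  # 1050 -> ~1GB per file in memory
print(peak_mb)       # 2100 -> ~2GB transient peak per worker
```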
Apologies for the CI failures. Our CI is indeed haunted by flaky tests. The proposed changes LGTM, but maybe we'll wait for a bit to get a review on the CPython...
> If you care about keeping the tests running on ancient openssl, how ancient? (I think if CPython doesn't test more, I'm good with this)

> Hmm, reviewing the PR,...