Florian Jetter

Results: 237 comments by Florian Jetter

FWIW I agree that this is not nice, I'm just not sure if I can offer a fix. I'm looking into one thing right now but can't promise much...

> Some other workflow languages have a dedicated operator for this kind of thing, but with only the @delayed decorator in Dask, you are right --- one must call dask.compute(*results)...
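For reference, a minimal sketch of the pattern the quote describes: build a list of `@delayed` results and pass them to `dask.compute(*results)` in one call. The function `square` and the inputs are made up for illustration and do not come from the original thread.

```python
import dask
from dask import delayed


@delayed
def square(x):
    # Hypothetical task; any Python function can be wrapped the same way.
    return x * x


# Calling a delayed function only builds the task graph, nothing runs yet.
results = [square(i) for i in range(4)]

# A single compute call evaluates all tasks together and returns a tuple.
computed = dask.compute(*results)
print(computed)
```

Passing all results to one `dask.compute` call lets the scheduler see the whole graph at once, instead of computing each result separately.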

I also encountered this in https://github.com/dask/dask/pull/10722

This behavior can be controlled with the `dataframe.convert-string` option. For example, the following disables it:

```python
import dask

dask.config.set({"dataframe.convert-string": False})
```

+1 I think specializing on pyarrow will be the biggest benefit here, since it allows us to cut complexity. Additionally, I would like us to review closely how we want to deal...

> " and if the current system isn't really solid/maintained/maintainable, then maybe the advice/best practice of using Parquet

Thanks for raising this concern. I believe the API as is is...

In terms of prioritization I consider getting dask-expr into main dask the most important thing right now. There is still a lot of uncertainty so nobody is working on parquet...

I'd like to point out that the test we're dealing with here is not very sensible. It uses test data with 9 data points so the result of the quantile...

I propose to ditch `dask.dataframe.tests.test_dataframe.py::test_quantile` and replace it with a test that iterates over a couple of different random distributions and compares its results to numpy. If there are no...