Florian Jetter
FWIW, I agree that this is not nice; I'm just not sure I can offer a fix. I'm looking into one thing right now but can't promise much...
> Some other workflow languages have a dedicated operator for this kind of thing, but with only the @delayed decorator in Dask, you are right --- one must call dask.compute(*results)...
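To illustrate the pattern the quoted comment describes, here is a minimal sketch (the `inc` function is just for illustration): with the `@dask.delayed` decorator, each call returns a lazy object, and a single `dask.compute(*results)` call materializes all of them in one graph rather than one at a time.

```python
import dask


# Decorated function: calling it builds a lazy task instead of running it.
@dask.delayed
def inc(x):
    return x + 1


# A list of Delayed objects -- nothing has been computed yet.
results = [inc(i) for i in range(3)]

# One compute() call evaluates the whole collection as a single graph,
# instead of calling .compute() on each Delayed individually.
computed = dask.compute(*results)
print(computed)  # → (1, 2, 3)
```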
I also encountered this in https://github.com/dask/dask/pull/10722
This behavior can be controlled/disabled with the `dataframe.convert-string` option, e.g.

```python
import dask

dask.config.set({"dataframe.convert-string": False})
```

disables this.
+1 I think specializing on pyarrow will be the biggest benefit here, since it allows us to cut complexity. Additionally, I would like us to review closely how we want to deal...
> "and if the current system isn't really solid/maintained/maintainable, then maybe the advice/best practice of using Parquet..."

Thanks for raising this concern. I believe the API as is is...
In terms of prioritization, I consider getting dask-expr into main dask the most important thing right now. There is still a lot of uncertainty, so nobody is working on parquet...
@rjzamora can you please review this?
I'd like to point out that the test we're dealing with here is not very sensible. It uses test data with only 9 data points, so the result of the quantile...
I propose to ditch `dask.dataframe.tests.test_dataframe.py::test_quantile` and replace it with a test that iterates over a couple of different random distributions and compares its results to numpy. If there are no...