Florian Jetter
FWIW, I agree that this is not nice; I'm just not sure I can offer a fix. I'm looking into one thing right now but can't promise much...
> Some other workflow languages have a dedicated operator for this kind of thing, but with only the @delayed decorator in Dask, you are right --- one must call dask.compute(*results)...
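To illustrate the pattern the quoted comment describes, here is a minimal sketch (the `inc` function is just for illustration): with the `@dask.delayed` decorator, each call returns a lazy object, and a single `dask.compute(*results)` call materializes all of them in one graph rather than one at a time.

```python
import dask


# Decorated function: calling it builds a lazy task instead of running it.
@dask.delayed
def inc(x):
    return x + 1


# A list of Delayed objects -- nothing has been computed yet.
results = [inc(i) for i in range(3)]

# One compute() call evaluates the whole collection as a single graph,
# instead of calling .compute() on each Delayed individually.
computed = dask.compute(*results)
print(computed)  # → (1, 2, 3)
```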
I also encountered this in https://github.com/dask/dask/pull/10722
This behavior can be controlled/disabled with the `dataframe.convert-string` option, e.g.

```python
import dask

dask.config.set({"dataframe.convert-string": False})
```

disables this.
+1 I think specializing on pyarrow will be the biggest benefit here, since it allows us to cut complexity. Additionally, I would like us to review closely how we want to deal...
> "and if the current system isn't really solid/maintained/maintainable, then maybe the advice/best practice of using Parquet..."

Thanks for raising this concern. I believe the API as is is...
In terms of prioritization, I consider getting dask-expr into main dask the most important thing right now. There is still a lot of uncertainty, so nobody is working on parquet...
@rjzamora can you please review this?
I'd like to point out that the test we're dealing with here is not very sensible. It uses test data with only 9 data points, so the result of the quantile...
I propose to ditch `dask.dataframe.tests.test_dataframe.py::test_quantile` and replace it with a test that iterates over a couple of different random distributions and compares its results to numpy. If there are no...