Florian Jetter

Results 380 comments of Florian Jetter

I _think_ the test_merge failures are actually unrelated. The exception is

```python-traceback
    def validate_data(self, data: pd.DataFrame) -> None:
>       if set(data.columns) != set(self.meta.columns):
E       AttributeError: 'tuple' object has no attribute...
```

Yeah, so the exception is pretty much what I expected:

```python-traceback
    if not isinstance(data, pd.DataFrame):
>       raise TypeError(f"Expected {data=} to be a DataFrame, got {type(data)}.")
E       TypeError: Expected data=('assign-3d7cfa7cea412465799bea6cfac1b512', 1)...
```
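For context, here is a minimal sketch of the kind of guard that produces both tracebacks above. The names mirror the snippets, but the standalone function signature and the `ValueError` branch are assumptions, not the actual implementation:

```python
import pandas as pd


def validate_data(data, meta: pd.DataFrame) -> None:
    # Guard against non-DataFrame inputs, e.g. a raw task key tuple
    # like ('assign-...', 1) leaking through instead of its result.
    if not isinstance(data, pd.DataFrame):
        raise TypeError(f"Expected {data=} to be a DataFrame, got {type(data)}.")
    # Without the isinstance check above, a tuple reaches this line and
    # fails with "AttributeError: 'tuple' object has no attribute 'columns'",
    # which is exactly the first traceback.
    if set(data.columns) != set(meta.columns):
        raise ValueError("Columns of data do not match meta.")


# A DataFrame with matching columns passes silently.
validate_data(pd.DataFrame({"a": [1]}), pd.DataFrame({"a": []}))
```

Passing the task key tuple instead of the materialized DataFrame then raises the `TypeError` shown in the second traceback.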

Ah, this is the build with dask-expr *disabled*. Now I can reproduce!

The `distributed/shuffle/tests/test_graph.py::test_multiple_linear` failure is also related; it is likewise a legacy-only problem.

https://github.com/dask/dask/pull/11445/ is hopefully the last one

Same goes for `distributed.cli.tests.test_dask_spec.test_errors`. Since the asyncio timeout hits, we don't even get the subprocess output.

This still has a problem...

@hendrikmakait should be good to go now

The deadlock of the test happens with the following logs:

```
2024-08-30 18:20:53,413 - distributed.shuffle._scheduler_plugin - WARNING - Shuffle 4b151ea2e7118ca361cbf67e2e3cbf08 initialized by task ('shuffle-transfer-4b151ea2e7118ca361cbf67e2e3cbf08', 2) executed on worker tcp://127.0.0.1:64362
2024-08-30...
```

I think the shuffle actually finishes just way too quickly...

```
Transition(key=('p2pshuffle-3c0e70f475016b3312ed2e08b06af4fd', 0), start='processing', finish='memory', recommendations={('getitem-f4ac494203a9f98493521efbbad2b8ea', 0): 'processing', 'shuffle-barrier-3c0e70f475016b3312ed2e08b06af4fd': 'released'}, stimulus_id='task-finished-1725035052.8782039', timestamp=1725035052.881619),
Transition(key='shuffle-barrier-3c0e70f475016b3312ed2e08b06af4fd', start='memory', finish='released', recommendations={}, stimulus_id='task-finished-1725035052.8782039', timestamp=1725035052.88164)])
```