ElenaKhaustova comments

Results 23 comments of ElenaKhaustova

Improve docstring for `add_feed_dict`

Solved in https://github.com/kedro-org/kedro/pull/4009

Investigate why Spaceflights project failing with `ParallelRunner`

Tested with:
- `scikit-learn==1.4.1.post1`
- `numpy==1.26.4`

Things explored so far:
1. The error happens when `scikit-learn` validates input data: https://github.com/scikit-learn/scikit-learn/blob/941acc419b8e7bec86fdc6b27ab3c4703022f140/sklearn/utils/validation.py#L1099
2. The validation includes converting input data...

Investigate why Spaceflights project failing with `ParallelRunner`

After further investigation, we found that the problem appears after the object is retrieved from `SharedMemoryDataset`. In the example below we convert a `pandas.core.series.Series` to a `numpy` array and then...

Investigate why Spaceflights project failing with `ParallelRunner`

A further plan is to investigate what's happening in the `SharedMemoryDataset`, whether it's expected, and why it only affects `pandas.core.series.Series`.

Investigate why Spaceflights project failing with `ParallelRunner`

In the earlier `scikit-learn` versions

Investigate why Spaceflights project failing with `ParallelRunner`

The test below confirmed that the problem is in `SharedMemoryDataset`: exactly the same example as above works well with `MemoryDataset`. ``` input_path = Path.cwd() /...

Investigate why Spaceflights project failing with `ParallelRunner`

Further tests excluded the kedro code base. The actual problem happens when using `multiprocessing.managers.BaseManager` inside the `ParallelRunner`. We register `MemoryDataset` to be used with `multiprocessing.managers.BaseManager` as follows: ``` class ParallelRunnerManager(SyncManager):...

Investigate why Spaceflights project failing with `ParallelRunner`

The reason for the above is that `numpy` doesn't allow arrays based on a read-only buffer to be set as writeable. A possible reason why the behaviour differs for `pd.DataFrame` and...
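The `numpy` restriction mentioned above can be reproduced in isolation. The snippet below is a minimal sketch, not the code from the issue: it assumes an immutable `bytes` object as a stand-in for the read-only buffer, and the variable names are illustrative only.

```python
import numpy as np

# An immutable bytes object stands in for a read-only buffer (assumption:
# the real buffer in the issue comes from elsewhere).
raw = np.arange(4, dtype=np.int64).tobytes()

# frombuffer creates a view onto that buffer, so the array is read-only.
view = np.frombuffer(raw, dtype=np.int64)
assert not view.flags.writeable

# Trying to flip the flag on a read-only-buffer-backed array is rejected.
try:
    view.flags.writeable = True
    flag_error = None
except ValueError as exc:
    flag_error = exc

# A copy owns its own memory, so it is writeable again.
owned = view.copy()
owned[0] = 42
```

The copy is the only step that changes anything here: the original view stays read-only, which is why a fix would need to hand a fresh copy to whatever code expects writeable data.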

Investigate why Spaceflights project failing with `ParallelRunner`

So the solution that might work for us is to modify the part where we retrieve data from the catalog before calling the node function [here](https://github.com/kedro-org/kedro/blob/cb1e84496ef154b607aa07b529caa06006df5ca8/kedro/runner/runner.py#L484): ``` def _run_node_sequential( node:...
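The proposed change itself is cut off in the preview above, so the following is only a hypothetical illustration of the general idea of making inputs writeable before a node function sees them. `ensure_writeable` and `prepare_inputs` are invented names, not part of kedro, and the sketch handles plain `numpy` arrays only; `pandas` objects would need analogous handling that is not shown.

```python
import numpy as np


def ensure_writeable(value):
    """Return a writeable copy of a read-only NumPy array.

    Hypothetical helper: any other object is passed through unchanged.
    """
    if isinstance(value, np.ndarray) and not value.flags.writeable:
        return value.copy()
    return value


def prepare_inputs(inputs):
    """Apply ensure_writeable to every value in a name -> object mapping."""
    return {name: ensure_writeable(value) for name, value in inputs.items()}
```

Copying only the arrays that are actually read-only keeps the extra memory cost limited to the inputs affected by the problem.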

Investigate why Spaceflights project failing with `ParallelRunner`

**Summary:**
- the problem relates to shared memory usage
- the problem is not on our side; at least it's not a bug made by us
- if not addressing...