unidist issues

`ValueError: Unknown DataID!` or `KeyError: <weakref at 0x000002432FB250E0; to 'MasterDataID' at 0x000002432F873F40>`

I was unable to reproduce exactly the same problem without Modin. The KeyError is reproduced when calling `query_compiler.to_pandas` from a worker process (during pickling) with the changes from https://github.com/modin-project/modin/pull/6673:

```python
import modin.pandas as pd
...
```

anmyachev

bug 🦗

MPI

`total.03-fe.04-attach_features.11-user-item volume features` stage of HM is 3x slower on MPI than on Ray

We should figure out why MPI is slow here and fix it. https://github.com/intel-ai/timedf_benchmarks/blob/b092ea0d490eb630224fc4ffdbc2f62630f57e49/timedf_benchmarks/hm_fashion_recs/fe.py#L159

YarShev

performance 🚀

MPI

Enabling the Modin NumPy API in HM slows down the MPI backend

We should figure out why the slowdown occurs and fix it. https://github.com/intel-ai/timedf_benchmarks/blob/b092ea0d490eb630224fc4ffdbc2f62630f57e49/timedf_benchmarks/hm_fashion_recs/fe.py#L228

YarShev

performance 🚀

MPI

Enable the MPI shared object store on msmpi

MSMPI implements a subset of the features of the MPI 3.1 standard, one of which is MPI shared memory (https://github.com/Microsoft/Microsoft-MPI#version-of-mpi-standard). We should enable the MPI shared object store for msmpi.

YarShev

new feature/request 💬

MPI

Aligned shared memory access

When putting data into the MPI shared object store, we should try to use aligned memory access, which should improve performance.
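
A minimal sketch of what aligned placement could look like, assuming a 64-byte alignment (a common cache-line size, chosen here as an assumption) and assuming buffers are packed one after another in a single shared region. `align_up` and `plan_layout` are hypothetical helpers, not unidist functions; they only compute offsets and do not touch MPI.

```python
# Hypothetical helpers, not part of unidist.
# ALIGNMENT = 64 is an assumed value (a common cache-line size).
ALIGNMENT = 64


def align_up(offset: int, alignment: int = ALIGNMENT) -> int:
    """Return the smallest multiple of `alignment` that is >= `offset`."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


def plan_layout(sizes, alignment: int = ALIGNMENT):
    """Assign each buffer an aligned start offset inside one shared region.

    Returns (offsets, total_size); every offset is a multiple of `alignment`,
    and total_size is rounded up so a following region also starts aligned.
    """
    offsets = []
    cursor = 0
    for size in sizes:
        start = align_up(cursor, alignment)  # skip padding bytes if needed
        offsets.append(start)
        cursor = start + size
    return offsets, align_up(cursor, alignment)


if __name__ == "__main__":
    offsets, total = plan_layout([10, 100, 64])
    print(offsets, total)  # [0, 64, 192] 256
```

The trade-off is a few padding bytes per buffer in exchange for every buffer starting on an aligned boundary.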

YarShev

performance 🚀

MPI

Actor tests for Ray hang in CI


Actor tests for Ray hang in CI for some reason but pass locally.

YarShev

CI

Ray

[MPI] Unidist hangs when finishing work

Unidist returns a result, but one of the workers cannot finish the job.

```python
import unidist
import time


@unidist.remote
def g(number):
    time.sleep(1)
    return number**2


@unidist.remote
def f():
    results = []
    ...
```

Retribution98

bug 🦗

MPI

Add parameters support in `unidist.init`

Parameters support should be added to the `unidist.init` method to give more flexibility in setting up the framework. The parameters are as follows:

- num_cpus
- backend
- address
- ...
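
One possible shape for this, as a sketch only: the listed parameters are collected in a hypothetical `InitParams` dataclass, and keyword arguments passed to `init` override a base configuration. `InitParams` and `resolve_params` are illustrative names, not part of unidist, the defaults are placeholders rather than unidist's real defaults, and the parameters hidden behind "..." are deliberately not guessed.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class InitParams:
    """Hypothetical container for settings `unidist.init` could accept."""
    num_cpus: Optional[int] = None  # None: let the backend decide
    backend: Optional[str] = None   # e.g. "mpi" or "ray"
    address: Optional[str] = None   # address of an existing cluster, if any


def resolve_params(base: InitParams, **overrides) -> InitParams:
    """Apply keyword overrides given to init() on top of `base`.

    Overrides whose value is None are ignored, so unset arguments keep the
    base value; unknown keyword names raise TypeError.
    """
    given = {k: v for k, v in overrides.items() if v is not None}
    if "num_cpus" in given and given["num_cpus"] < 1:
        raise ValueError("num_cpus must be >= 1")
    return replace(base, **given)


if __name__ == "__main__":
    base = InitParams(backend="ray")
    print(resolve_params(base, num_cpus=4, backend=None))
```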

prutskov

Generic

Use a background thread to push and receive data to/from workers

Spawn a background thread that sends data to workers, so the main thread is not blocked during data serialisation and sending.
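
A minimal sketch of the sending direction using only the standard library: the main thread enqueues `(destination, payload)` pairs and returns immediately, while a daemon thread pickles each payload and hands the bytes to `send_fn`. `BackgroundSender` and `send_fn` are hypothetical; `send_fn` stands in for whatever MPI send call the backend would actually use, and receiving is not covered.

```python
import pickle
import queue
import threading


class BackgroundSender:
    """Hypothetical sketch: a daemon thread serializes and sends queued data."""

    _STOP = object()  # sentinel telling the worker thread to exit

    def __init__(self, send_fn):
        self._send_fn = send_fn  # assumed callable: send_fn(dest, data_bytes)
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                if item is self._STOP:
                    return
                dest, payload = item
                # Serialization happens here, off the main thread.
                self._send_fn(dest, pickle.dumps(payload))
            finally:
                self._queue.task_done()

    def push(self, dest, payload):
        """Enqueue data for `dest`; does not wait for serialization or sending."""
        self._queue.put((dest, payload))

    def close(self):
        """Let the thread send everything queued so far, then stop it."""
        self._queue.put(self._STOP)
        self._thread.join()


if __name__ == "__main__":
    sent = []
    sender = BackgroundSender(lambda dest, data: sent.append((dest, data)))
    sender.push(1, {"x": 1})
    sender.push(2, [1, 2, 3])
    sender.close()
    print(len(sent))
```

Because pickling is deferred to the background thread, a payload mutated after `push` may be sent in its mutated state; a real implementation would need to copy the data or serialize before enqueuing. Errors raised by `send_fn` are not handled here: the thread stops and later items are never sent.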

arunjose696

performance 🚀

MPI

Track the number of tasks executing on each worker, and submit tasks to workers with minimum number of tasks

As an alternative to the current round-robin approach, a possible way of scheduling is to track the number of tasks running on each worker in a scheduler class. This could be...
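
A sketch of such a scheduler class, assuming workers are identified by integer ids (in the MPI backend these would presumably be ranks) and that the scheduler is told when a task finishes. `LeastLoadedScheduler` is a hypothetical name, not an existing unidist class.

```python
class LeastLoadedScheduler:
    """Hypothetical sketch: submit each task to the worker with the fewest
    in-flight tasks instead of cycling through workers round robin."""

    def __init__(self, worker_ids):
        if not worker_ids:
            raise ValueError("need at least one worker")
        self._load = {w: 0 for w in worker_ids}  # worker id -> running tasks

    def next_worker(self):
        """Pick the least-loaded worker (ties go to the smallest id) and count the task."""
        worker = min(self._load, key=lambda w: (self._load[w], w))
        self._load[worker] += 1
        return worker

    def task_done(self, worker):
        """Record that `worker` finished one task."""
        if self._load[worker] == 0:
            raise ValueError(f"worker {worker} has no running tasks")
        self._load[worker] -= 1

    def load(self, worker):
        return self._load[worker]


if __name__ == "__main__":
    sched = LeastLoadedScheduler([1, 2, 3])
    first = sched.next_worker()   # 1
    second = sched.next_worker()  # 2
    sched.task_done(first)        # worker 1 finishes its task
    print(sched.next_worker())    # 1 (round robin would have chosen 3)
```

The class does no locking; if task completions are recorded on a different thread than submissions, `next_worker` and `task_done` would need to share a lock.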

arunjose696

performance 🚀

MPI
