unidist
Open MPI hangs on Modin tests
Running the following Modin tests with unidist on Open MPI hangs, both in CI and locally.
MODIN_ENGINE=unidist mpiexec --oversubscribe -x UNIDIST_MPI_SHARED_OBJECT_STORE=True -n 1 python -m pytest modin/pandas/test/internals/test_benchmark_mode.py
However, the same tests work on Intel MPI.
The hang is reproducible even on versions that predate the shared memory feature.
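Because the failure is a hang rather than an error, it can help to wrap the reproduction command in a timeout so that a stuck run is reported instead of blocking forever. The following is only a minimal sketch: the helper names `run_with_timeout` and `check_repro` and the 600-second limit are assumptions made for illustration and are not part of unidist or Modin.

```python
import os
import subprocess

# The reproduction command from this report, in argv form.
REPRO_CMD = [
    "mpiexec", "--oversubscribe",
    "-x", "UNIDIST_MPI_SHARED_OBJECT_STORE=True",
    "-n", "1",
    "python", "-m", "pytest",
    "modin/pandas/test/internals/test_benchmark_mode.py",
]


def run_with_timeout(cmd, timeout_s, extra_env=None):
    """Run cmd and return ("finished", returncode), or ("hang", None) on timeout."""
    # Start from the current environment and add any overrides.
    env = dict(os.environ, **(extra_env or {}))
    try:
        proc = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising.
        # Processes spawned by that child may need separate cleanup.
        return ("hang", None)
    return ("finished", proc.returncode)


def check_repro(timeout_s=600):
    """Hypothetical helper: run the report's command with MODIN_ENGINE=unidist."""
    # 600 seconds is an arbitrary limit, not a value taken from this report.
    return run_with_timeout(REPRO_CMD, timeout_s, {"MODIN_ENGINE": "unidist"})
```

Calling `check_repro()` from the Modin repository root would be expected to return `("hang", None)` on an affected Open MPI setup; a `("finished", ...)` result would mean the run completed within the limit.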
I see another test that hangs with unidist: https://github.com/modin-project/modin/actions/runs/6756756901/job/18366579579?pr=6707 or https://github.com/modin-project/modin/actions/runs/6737323284/job/18314544412?pr=6697
mpiexec aborting job...
modin/pandas/test/test_io.py::TestCsv::test_dataframe_to_csv
job aborted:
[ranks] message
[0] job terminated by the user
[1-4] terminated
---- error analysis -----
[0] on fv-az836-953
ctrl-c was hit. job aborted by the user.
---- error analysis -----