Open MPI hangs on Modin tests

Open YarShev opened this issue 2 years ago • 2 comments

Running the following Modin test with Open MPI hangs, both in CI and locally:

MODIN_ENGINE=unidist mpiexec --oversubscribe -x UNIDIST_MPI_SHARED_OBJECT_STORE=True -n 1 python -m pytest modin/pandas/test/internals/test_benchmark_mode.py

However, it works on Intel MPI.
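
To tell a genuine hang from a slow run, the reproduction command can be wrapped in a timeout. The following is only a minimal sketch: `run_with_timeout` and `build_repro_command` are hypothetical helper names that are not part of unidist or Modin, the 600-second limit is an arbitrary assumption, and `sys.executable` stands in for `python` from the command above.

```python
import os
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, env=None):
    """Run cmd and return (hung, returncode).

    hung is True when the command did not finish within timeout_s seconds.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s, env=env)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child (here: mpiexec) on timeout.
        # Processes started by that child may need separate cleanup.
        return True, None
    return False, proc.returncode


def build_repro_command():
    """Return the mpiexec command from this issue as an argument list."""
    return [
        "mpiexec", "--oversubscribe",
        "-x", "UNIDIST_MPI_SHARED_OBJECT_STORE=True",
        "-n", "1",
        sys.executable, "-m", "pytest",
        "modin/pandas/test/internals/test_benchmark_mode.py",
    ]


def reproduce(timeout_s=600):
    """Run the reproduction with MODIN_ENGINE=unidist and report a hang."""
    env = dict(os.environ, MODIN_ENGINE="unidist")
    hung, returncode = run_with_timeout(build_repro_command(), timeout_s, env=env)
    if hung:
        print(f"No result after {timeout_s} s: treating the run as hung")
    else:
        print(f"Finished with exit code {returncode}")
    return hung
```

Calling `reproduce()` from the Modin repository root would run the test once under the timeout. The same wrapper could be pointed at an Intel MPI `mpiexec` to compare the two, though the `-x` and `--oversubscribe` options are Open MPI specific and would have to be adjusted.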

YarShev avatar Oct 02 '23 08:10 YarShev

The issue also reproduces on versions from before the shared memory feature was introduced.

YarShev avatar Oct 05 '23 22:10 YarShev

I see another test that hangs with unidist, in either of these CI runs: https://github.com/modin-project/modin/actions/runs/6756756901/job/18366579579?pr=6707 or https://github.com/modin-project/modin/actions/runs/6737323284/job/18314544412?pr=6697

mpiexec aborting job...
modin/pandas/test/test_io.py::TestCsv::test_dataframe_to_csv 
job aborted:
[ranks] message

[0] job terminated by the user

[1-4] terminated

---- error analysis -----

[0] on fv-az836-953
ctrl-c was hit. job aborted by the user.

---- error analysis -----

anmyachev avatar Nov 05 '23 12:11 anmyachev