halo.update_async(); Deadlock in Multi Node Case on feat-halo
In my stencil benchmark I use the halo wrapper as in `ex.02.matrix.halo.heat_equation`. When building with the `feat-halo` branch, my code deadlocks at `halo.update_async();` when running on more than one node.
If I use the `development` branch, or run on a single node with more than one DASH unit, this doesn't happen.
This also seems to be the reason for `dash-test-mpi` not being able to finish in 8 hours on multiple nodes in issue #682. If one looks at the end of the test output posted in that issue, one can see that it was at `mHaloTest.HaloMatrixWrapperNonCyclic2D` when the test was cancelled due to the time limit.
I will look into it next week.
@Spielix, can you please name the flags you used? Did you enable DYNAMIC_WINDOWS or SHARED_WINDOWS? I can't reproduce the error. The only thing that happened to me was an MPI_Win_detach error. That error is located in OpenMPI, and we have a workaround provided by @devreal.