halo.update_async(); Deadlock in Multi Node Case on feat-halo

Open pauleonix opened this issue 5 years ago • 2 comments

In my stencil benchmark I use the halo wrapper as in ex.02.matrix.halo.heat_equation. When I build against the feat-halo branch, my code deadlocks at halo.update_async(); as soon as it runs on more than one node. This doesn't happen if I use the development branch, or if I run on a single node with more than one dash unit.

This also seems to be the reason for dash-test-mpi not being able to finish in 8 hours on multiple nodes in issue #682. If one looks at the end of the test output posted in that issue, one can observe that it was at mHaloTest.HaloMatrixWrapperNonCyclic2D when the test was cancelled due to a time limit.

pauleonix avatar Feb 01 '20 20:02 pauleonix

I will look into it next week.

dhinf avatar Feb 07 '20 11:02 dhinf

@Spielix can you please name the flags you used? Did you enable DYNAMIC_WINDOWS or SHARED_WINDOWS? I can't reproduce the error. The only thing that happened to me was an MPI_Win_detach error. That error is located in OpenMPI, and we have a workaround provided by @devreal.

dhinf avatar Feb 18 '20 13:02 dhinf