
Possible bug when running on one MPI process

Open tkoskela opened this issue 1 year ago • 6 comments

There is possibly a bug in the MPI communication that appears when running on one process. I'm collecting hints in this issue.

In test_004 of f-exx-opt we see a difference on the order of 1e-5 in the Harris-Foulkes energy when running on one MPI process, compared to running on multiple processes. In conversation with @lionelalexandre it came up that he has been aware of this for some time. Other tests in the testsuite have a tolerance of 1e-4, so they might be missing this.
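To illustrate why a looser tolerance could hide this, here is a minimal Python sketch. The energy value and the helper name `within_tolerance` are made up for illustration and are not taken from the CONQUEST testsuite; only the 1e-5 difference and the 1e-4 tolerance come from the description above.

```python
import math


def within_tolerance(value, reference, abs_tol):
    """Return True if value matches reference to within an absolute tolerance."""
    return math.isclose(value, reference, rel_tol=0.0, abs_tol=abs_tol)


# Hypothetical reference energy; NOT a real CONQUEST result.
reference_energy = -33.0
# Single-process result that is off by roughly 1e-5, as seen in test_004.
single_process_energy = reference_energy + 1.0e-5

# A 1e-4 tolerance accepts the discrepancy, a 1e-6 tolerance flags it.
passes_loose = within_tolerance(single_process_energy, reference_energy, 1.0e-4)
passes_tight = within_tolerance(single_process_energy, reference_energy, 1.0e-6)
print(passes_loose, passes_tight)
```

Under these assumptions, a test with a 1e-4 tolerance would pass even though the single-process result differs, which is consistent with the other tests not catching it.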

When running the code in the DDT debugger on myriad with one MPI process, we get a segfault at https://github.com/OrderN/CONQUEST-release/blob/6bf8f4a8c20fd4fa8f1c7baeb8a6b1f23a6d2408/src/generic_comms.f90#L1780-L1782. I haven't yet found an obvious reason why. MPI_alltoallv is complicated, but on one process it should obviously be doing nothing.
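One thing that might be worth checking: even on a single rank, MPI_Alltoallv still copies the rank's own block from the send buffer to the receive buffer, so the counts and displacements for rank 0 have to stay inside both buffers. The sketch below is a pure-Python model of that one-rank case, not the CONQUEST code and not a real MPI call. The function names and buffer contents are hypothetical, and I am not claiming this is the cause of the segfault.

```python
def check_alltoallv_args(counts, displs, buf_len, name):
    """Raise ValueError if any (count, displacement) block leaves the buffer."""
    for rank, (count, displ) in enumerate(zip(counts, displs)):
        if count < 0 or displ < 0 or displ + count > buf_len:
            raise ValueError(
                f"{name}: block for rank {rank} (displ={displ}, count={count}) "
                f"is outside a buffer of length {buf_len}"
            )


def self_alltoallv(sendbuf, sendcounts, sdispls, recvbuf, recvcounts, rdispls):
    """Model of an alltoallv on ONE rank: a copy of rank 0's own block."""
    check_alltoallv_args(sendcounts, sdispls, len(sendbuf), "send")
    check_alltoallv_args(recvcounts, rdispls, len(recvbuf), "recv")
    if sendcounts[0] != recvcounts[0]:
        raise ValueError("send and receive counts for rank 0 differ")
    n = sendcounts[0]
    # Displacements are element offsets from the start of each buffer.
    recvbuf[rdispls[0]:rdispls[0] + n] = sendbuf[sdispls[0]:sdispls[0] + n]
    return recvbuf


# Valid arguments: the exchange is a plain local copy.
send = [1.0, 2.0, 3.0]
recv = [0.0, 0.0, 0.0]
self_alltoallv(send, [3], [0], recv, [3], [0])

# A displacement that is off by one pushes the block past the end of recv.
try:
    self_alltoallv(send, [3], [0], [0.0, 0.0, 0.0], [3], [1])
    out_of_range_detected = False
except ValueError:
    out_of_range_detected = True
```

In the real code an out-of-range block like the second case would not raise a clean error, it would read or write outside the array. Printing the counts and displacements just before the call at the lines linked above would show whether they are sane on one process.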

tkoskela, Feb 07 '24 10:02