easybuild-easyconfigs

intel-2023b: fi_info not working as expected

Open sassy-crick opened this issue 4 months ago • 1 comments

While running the test jobs for MOLCAS-84, I came across this issue:

Abort(606203407) on node 10 (rank 10 in comm 0): Fatal error in PMPI_Put: Other MPI error, error stack:
PMPI_Put(160)........: MPI_Put(origin_addr=0x7ffde9f037a0, origin_count=5, MPI_LONG, target_rank=6, target_disp=50, target_count=5, MPI_LONG, win=0xe0000002) failed
MPID_Put(896)........: 
MPIDI_put_safe(565)..: 
MPIDI_put_unsafe(71).: 
MPIDI_OFI_do_put(436): OFI rdma write immediate failed (ofi_rma.h:436:MPIDI_OFI_do_put:Invalid argument)

Given that more than one job failed, I did a bit of digging and noticed that this command is not working as expected:

$ fi_info | grep provider
fi_getinfo: -61

Further digging revealed that it works up to intel-2023a and also with intel-2024a, so for me intel-2023b clearly has a problem. My hunch is that the problem lies outside of what EasyBuild does. I will try to do some more digging.
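
For anyone wanting to repeat the comparison across toolchains, a minimal sketch follows. It assumes an environment-modules or Lmod setup in which the toolchains can be loaded as `intel/<version>`; that module naming and both function names are hypothetical, so adjust them to your site. The `-61` is `-FI_ENODATA` (`ENODATA` is 61 on Linux), which libfabric returns when no provider matches the request.

```shell
#!/bin/sh
# Classify captured fi_info output.
# "fi_getinfo: -61" means -FI_ENODATA: libfabric found no usable provider.
classify_fi_info() {
    case "$1" in
        *"fi_getinfo: -61"*) echo "no-provider" ;;
        *"provider:"*)       echo "ok" ;;
        *)                   echo "unknown" ;;
    esac
}

# Hypothetical helper: load one toolchain in a subshell and report the result.
# Assumes the `module` command is available and the module is named intel/<version>.
check_toolchain() {
    (
        module purge
        module load "intel/$1"
        out=$(fi_info 2>&1)
        printf '%s: %s\n' "$1" "$(classify_fi_info "$out")"
    )
}
```

Running `for v in 2023a 2023b 2024a; do check_toolchain "$v"; done` would then print one line per toolchain. Setting `FI_LOG_LEVEL=debug` before calling `fi_info` may also show why the providers are being rejected.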

sassy-crick, Oct 22 '24 12:10