Ken Raffenetti

Results: 134 comments by Ken Raffenetti

test:mpich/ch4/ucx test:mpich/ch4/ucx/gpu

@aruhela just to confirm, is the message sent between processes on the same node? In a standard configuration, MPICH can only generate events for unexpected messages arriving via the shared...

What is your execution environment? Single node or multinode? If multinode, what kind of interconnect? Do you know what the MPICH configuration is for the library you are using?

I suspect this is the recursive mutex check. The mutex owner is set/reset under lock, but it is read using a regular load, without holding the lock, in `MPIDUI_THREAD_CS_ENTER`. https://github.com/pmodels/mpich/blob/959b11dc24f9853b88725a43dcbc9ce8c716622d/src/mpid/common/thread/mpidu_thread_fallback.h#L139-L161
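To make the suspected pattern concrete, here is a minimal sketch of a recursive critical section whose owner field is checked before the lock is taken. It is not the MPICH code at the link above. The names `rec_cs_t`, `rec_cs_enter`, `rec_cs_exit`, and `self_id` are hypothetical, and the thread id is assumed to be the address of a thread-local variable. With a plain `uintptr_t owner`, the unlocked read in `rec_cs_enter` would race with the stores made under the lock, which is the kind of access a race detector reports. The sketch uses relaxed C11 atomics for the owner instead, so the same check is well defined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical recursive critical section, for illustration only. */
typedef struct {
    pthread_mutex_t lock;
    _Atomic uintptr_t owner;  /* 0 means "no owner" */
    int count;                /* recursion depth; only touched while holding lock */
} rec_cs_t;

#define REC_CS_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/* Assumption: the address of a thread-local object serves as a thread id. */
static _Thread_local char tls_anchor;

static uintptr_t self_id(void)
{
    return (uintptr_t) &tls_anchor;
}

static void rec_cs_enter(rec_cs_t *cs)
{
    uintptr_t self = self_id();

    /* The ownership check runs WITHOUT the lock. If owner were a plain
     * field, this load would race with the stores below and in
     * rec_cs_exit. An atomic load keeps the check well defined. */
    if (atomic_load_explicit(&cs->owner, memory_order_relaxed) != self) {
        pthread_mutex_lock(&cs->lock);
        atomic_store_explicit(&cs->owner, self, memory_order_relaxed);
    }
    cs->count++;
}

static void rec_cs_exit(rec_cs_t *cs)
{
    /* Only the current owner may leave the critical section. */
    assert(atomic_load_explicit(&cs->owner, memory_order_relaxed) == self_id());
    assert(cs->count > 0);

    if (--cs->count == 0) {
        /* Clear the owner before unlocking so no stale id is left behind. */
        atomic_store_explicit(&cs->owner, 0, memory_order_relaxed);
        pthread_mutex_unlock(&cs->lock);
    }
}
```

Relaxed ordering is enough for the check itself: a thread can only read its own id from `owner` if it stored that id and still holds the lock, and any other value sends it down the locking path.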

Well, this from the log probably invalidates my theory 🤷

> [0] Location is global 'MPIDI_global' at 0x0001113e1840 (libpmpi.0.dylib+0x529ac00)

Wait, that is where the mutexes (and owners) are stored. So it could still be the case 😅.

@thomasgillis since we removed the recursive locking, could you try and see if this warning is fixed?

test:mpich/custom netmod:ch4:ofi testlist:part

Fixed in 8e58a6874438e86862350004117e3047b48962c5.