SMSC/knem throws error messages referencing BTL/knem
I built Open MPI v5 with knem, but it fails at runtime, throwing errors about btl_sm_single_copy_mechanism.
In reality, the error comes from the smsc component, and setting OMPI_MCA_coll_smsc=^knem disables the warning.
I'm fairly sure it's just a documentation issue in help-smsc-knem.txt.
--------------------------------------------------------------------------
WARNING: Open MPI failed to open the /dev/knem device due to a local
error. Please check with your system administrator to get the problem
fixed, or set the btl_sm_single_copy_mechanism MCA variable to none
to silence this warning and run without knem support.
The sm shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: beluga4
Errno: 2 (No such file or directory)
--------------------------------------------------------------------------
@BKitor I think you're reporting two things:
- You ran into an error trying to use Open MPI's knem support
- The error message Open MPI emitted was incorrect
Is that right?
- Does /dev/knem exist? If so, can you look in the system logs to see what error occurred when MPI processes tried to open it?
- I've filed #9805 to fix the message. Thanks for reporting it!
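For anyone who wants to check the first point quickly, here is a minimal sketch that tries to open the device and reports the errno, much like the "Errno: 2" line in the warning above. It is not how Open MPI itself probes knem; the function name `probe_device` and the read-write open mode are assumptions made for illustration.

```python
import errno
import os


def probe_device(path="/dev/knem"):
    """Try to open `path` read-write; return (ok, errno_value, message).

    Hypothetical helper for illustration only; Open MPI's own check
    may open the device differently.
    """
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        # exc.errno is the same numeric code the warning prints as "Errno".
        return False, exc.errno, os.strerror(exc.errno)
    os.close(fd)
    return True, 0, "opened successfully"


if __name__ == "__main__":
    ok, err, msg = probe_device()
    print(f"/dev/knem: ok={ok} errno={err} ({msg})")
    if err == errno.ENOENT:
        # Assumption: a missing device node usually means the knem
        # kernel module is not loaded on this host.
        print("Device node is missing; the knem module may not be loaded.")
    elif err == errno.EACCES:
        print("Device node exists but this user cannot open it.")
```

Running it on each compute node should show whether the device node is missing (errno 2) or merely not accessible to the user.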
I'm not really worried about getting knem working at the moment, but I figured you guys would want to know about the weird error message.
Thanks for reporting! Let's leave this open until #9805 and a corresponding PR for the upcoming v5.0.x branch are merged. It helps us track things and make sure we don't forget anything.