Dariomnz
Can I get more feedback, such as how to debug the app to find out why it's stuck?
As I said before, the program gets stuck in the third Send-Recv. You can see it in the gdb backtrace you asked me to capture. gdb server: ```gdb gdb...
After much trial and error, the problem turned out to be that I needed to build Open MPI with Slurm support (`--with-slurm=/opt/slurm`). Without it, the program would get stuck on the third send.
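For anyone hitting the same hang, here is a rough sketch of the rebuild. Only the `--with-slurm=/opt/slurm` flag comes from my setup. The install prefix, the `RUN_BUILD` switch, and the `make -j4` job count are placeholders I made up for illustration, so adjust them to your environment. By default the script only prints the configure command, so you can check it before building.

```shell
#!/bin/sh
# Sketch: rebuild Open MPI with Slurm support.
# SLURM_DIR is the Slurm install path from the comment above.
# PREFIX is a hypothetical install location, not taken from the thread.
SLURM_DIR="${SLURM_DIR:-/opt/slurm}"
PREFIX="${PREFIX:-$HOME/opt/openmpi}"

# Assemble the configure command and print it as a dry run.
configure_cmd="./configure --prefix=$PREFIX --with-slurm=$SLURM_DIR"
echo "$configure_cmd"

# Build only when explicitly requested (RUN_BUILD is a made-up switch).
# Run this from the top of the Open MPI source tree.
if [ "${RUN_BUILD:-0}" = "1" ]; then
    $configure_cmd && make -j4 && make install
fi
```

Run it once as-is to inspect the command, then again with `RUN_BUILD=1` from the Open MPI source directory to actually configure, build, and install.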
> Could you set this env variable in the shell where the parent process is started?
>
> export PMIX_MCA_gds=hash
>
> and rerun and see if the problem persists?...
Maybe it is the same problem as in this issue: #12599
I run the test in Docker, simulating the nodes with containers. I installed the version you suggested and I get the same error trace as above. As I have...