sst-elements
Cannot run Opal example
1 - When I run the Opal example in the SST Elements directory (sst-elements/src/sst/elements/Opal/tests/basic_1node_1smp.py), I get the following error:
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
2 - Command to reproduce: sst -v sst-elements/src/sst/elements/Opal/tests/basic_1node_1smp.py
3 - Operating system: Ubuntu 22.04
4 - MPI version: 4.0.5
Could you help me look into this issue? Thank you.
Has merlin.Bridge been removed? This is the beginning of my output:
WARNING: Component name 'Bridge:node0_internal_network-node0_Bridge' is not valid
WARNING: Link name 'B0-node0_internal_network-node0_Bridge' is not valid
WARNING: Link name 'B0-node0_internal_network-node0_Bridge' is not valid
WARNING: Link name 'B1-node0_internal_network-node0_Bridge' is not valid
WARNING: Link name 'B1-node0_internal_network-node0_Bridge' is not valid
WARNING: Component name 'Bridge:Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Link name 'B0-Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Link name 'B0-Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Link name 'B1-Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Link name 'B1-Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Component name 'Bridge:node1_internal_network-node1_Bridge' is not valid
WARNING: Link name 'B0-node1_internal_network-node1_Bridge' is not valid
WARNING: Link name 'B0-node1_internal_network-node1_Bridge' is not valid
WARNING: Number of invalid link names exceeds limit of 10, no more messages will be printed
WARNING: Component name 'Bridge:Ext_Mem_Net-node1_Bridge' is not valid
merlin.Bridge is still available. These are warnings that the names being used are invalid. We instituted a naming convention that essentially requires names to be valid Python names (with a few tweaks), and that input file must not have been updated to use compliant names. None of those warnings would cause functional issues in the simulation.
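If you want to silence these warnings anyway, the names in the input file can be rewritten into a compliant form. The sketch below is a hypothetical helper, not part of SST: it assumes the rule is simply "must be a Python identifier" (it ignores the "few tweaks" mentioned above) and replaces every offending character such as `:` or `-` with an underscore. The functions `is_valid_name` and `sanitize_name` are illustrative names only.

```python
import re


def is_valid_name(name: str) -> bool:
    """Approximate check (assumption): treat a name as valid if it is a
    Python identifier. SST's real rule has extra tweaks not modeled here."""
    return name.isidentifier()


def sanitize_name(name: str) -> str:
    """Hypothetical helper: replace every non-word character (e.g. ':' or '-')
    with '_' and prefix a leading digit with '_' so the result is an identifier."""
    cleaned = re.sub(r"\W", "_", name)
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    # Names taken from the warnings above.
    for old in ["Bridge:node0_internal_network-node0_Bridge",
                "B0-node0_internal_network-node0_Bridge"]:
        new = sanitize_name(old)
        print(f"{old!r} -> {new!r} valid={is_valid_name(new)}")
```

Because the warnings are non-fatal, renaming is cosmetic; it will not by itself fix the MPI_ABORT.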
OK. Were you able to run the above example file successfully? I still don't understand why it keeps giving me the error above.
I'm not able to run that file on my development machine because I don't have PIN installed. This is not my area of expertise, but if I had to guess, I would say there is an issue with your PIN installation or version. I know there have been some issues with PIN support recently, but I don't know the details.
I don't think it's a PIN problem, because I can run the stream example in the ariel directory.
496 LLSC: N Lock: (0,0)
497 LLSC: N Lock: (0,0)
498 LLSC: N Lock: (0,0)
499 LLSC: N Lock: (0,0)
500 LLSC: N Lock: (0,0)
501 LLSC: N Lock: (0,0)
502 LLSC: N Lock: (0,0)
503 LLSC: N Lock: (0,0)
504 LLSC: N Lock: (0,0)
505 LLSC: N Lock: (0,0)
506 LLSC: N Lock: (0,0)
507 LLSC: N Lock: (0,0)
508 LLSC: N Lock: (0,0)
509 LLSC: N Lock: (0,0)
510 LLSC: N Lock: (0,0)
511 LLSC: N Lock: (0,0)
End MemHierarchy::Cache
Checking for unreceived events on up link:
Checking for unreceived events on down link:
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
I have identified the line that causes the problem:
Line 240: PTWMemLink.connect((mmu, "ptw_to_mem%d"%(next_core-cores//2), "300ps"), (l1_cpulink, "port", "300ps"))
Does the page table walker in Samba need to use the ptw_to_mem port or not?
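One way to narrow this down is to evaluate the port-name expression from line 240 on its own and compare the result with the ports Samba actually declares (for example via `sst-info`). The sketch below is hypothetical: the surrounding loop is not shown in the issue, so the values of `cores` and the starting value of `next_core` are assumptions, and `ptw_port_names` is an illustrative helper rather than anything in the Opal test script.

```python
from typing import List


def ptw_port_names(cores: int, first_core: int) -> List[str]:
    """Hypothetical: return the MMU port names the connect() call on line 240
    would request, assuming next_core runs from first_core up to cores - 1
    (the real loop bounds are not shown in the issue)."""
    names = []
    for next_core in range(first_core, cores):
        # Same expression as the test script: "ptw_to_mem%d" % (next_core - cores//2)
        names.append("ptw_to_mem%d" % (next_core - cores // 2))
    return names


if __name__ == "__main__":
    # Assumed example: 4 cores, with the second half (cores 2 and 3) wired here.
    print(ptw_port_names(cores=4, first_core=2))
```

Note that if `next_core` can be smaller than `cores//2`, the expression yields names such as `ptw_to_mem-2`, which cannot match any real port; checking the loop bounds against that arithmetic is a cheap first step.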