
Cannot run Opal example

Open nhatdangncsu opened this issue 2 years ago • 6 comments

New Issue for sst-elements

1 - When I run the Opal example in the sst-elements directory, sst-elements/src/sst/elements/Opal/tests/basic_1node_1smp.py, it gives me the following error:

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

2 - Command to reproduce: sst -v sst-elements/src/sst/elements/Opal/tests/basic_1node_1smp.py

3 - Operating system: Ubuntu 22.04

4 - MPI version: 4.0.5

Can you help me check this issue? Thank you.

nhatdangncsu avatar Jan 17 '23 16:01 nhatdangncsu

Is merlin.Bridge removed? This is the beginning of my output.

WARNING: Component name 'Bridge:node0_internal_network-node0_Bridge' is not valid
WARNING: Link name 'B0-node0_internal_network-node0_Bridge' is not valid
WARNING: Link name 'B0-node0_internal_network-node0_Bridge' is not valid
WARNING: Link name 'B1-node0_internal_network-node0_Bridge' is not valid
WARNING: Link name 'B1-node0_internal_network-node0_Bridge' is not valid
WARNING: Component name 'Bridge:Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Link name 'B0-Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Link name 'B0-Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Link name 'B1-Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Link name 'B1-Ext_Mem_Net-node0_Bridge' is not valid
WARNING: Component name 'Bridge:node1_internal_network-node1_Bridge' is not valid
WARNING: Link name 'B0-node1_internal_network-node1_Bridge' is not valid
WARNING: Link name 'B0-node1_internal_network-node1_Bridge' is not valid
WARNING: Number of invalid link names exceeds limit of 10, no more messages will be printed
WARNING: Component name 'Bridge:Ext_Mem_Net-node1_Bridge' is not valid

nhatdangncsu avatar Jan 18 '23 20:01 nhatdangncsu

merlin.Bridge is still available. These are warnings that the names being used are invalid. We instituted a naming convention, which essentially requires names to be valid Python names (with a few tweaks), and that input file must not have been updated with compliant names. None of those warnings would cause functional issues in the simulation.
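
As a rough illustration of what "valid Python names" means for the names in those warnings, here is a minimal sketch in plain Python. It is an approximation only: the helper `to_valid_name` is hypothetical, `str.isidentifier()` stands in for the actual SST check, and the "few tweaks" mentioned above are not modeled.

```python
import re


def to_valid_name(name: str) -> str:
    """Hypothetical helper: replace characters that are not ASCII letters,
    digits, or underscores with '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name)
    # A Python identifier cannot start with a digit.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


# Name taken from the warning output in this thread.
old = "Bridge:node0_internal_network-node0_Bridge"
new = to_valid_name(old)

print(old.isidentifier())  # the ':' and '-' make this name invalid
print(new, new.isidentifier())
```

Renaming along these lines should only silence the warnings; it would not be expected to change the MPI_ABORT behavior.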

feldergast avatar Jan 18 '23 20:01 feldergast

Ok. Did you run the above example file successfully? I still don't understand why it keeps giving me the above error.

nhatdangncsu avatar Jan 18 '23 21:01 nhatdangncsu

I'm not able to run that file on my development machine because I don't have PIN installed. This is not my area of expertise, but if I had to guess, I would guess that there is an issue with your PIN installation or version. I know that there have been some issues with PIN support recently, but I don't know the details.

feldergast avatar Jan 18 '23 22:01 feldergast

I don't think it's a PIN problem because I can run the stream example in the ariel directory.

  496 LLSC: N Lock: (0,0)
   497 LLSC: N Lock: (0,0)
   498 LLSC: N Lock: (0,0)
   499 LLSC: N Lock: (0,0)
   500 LLSC: N Lock: (0,0)
   501 LLSC: N Lock: (0,0)
   502 LLSC: N Lock: (0,0)
   503 LLSC: N Lock: (0,0)
   504 LLSC: N Lock: (0,0)
   505 LLSC: N Lock: (0,0)
   506 LLSC: N Lock: (0,0)
   507 LLSC: N Lock: (0,0)
   508 LLSC: N Lock: (0,0)
   509 LLSC: N Lock: (0,0)
   510 LLSC: N Lock: (0,0)
   511 LLSC: N Lock: (0,0)
End MemHierarchy::Cache

  Checking for unreceived events on up link: 
  Checking for unreceived events on down link: 
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

nhatdangncsu avatar Jan 18 '23 23:01 nhatdangncsu

I have been able to identify the line that causes the problem, line 240:

PTWMemLink.connect((mmu, "ptw_to_mem%d"%(next_core-cores//2), "300ps"), (l1_cpulink, "port", "300ps"))

I want to ask whether the page table walker in Samba needs to use the ptw_to_mem port or not.
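
To see which port name the format expression in that connect call produces, a small standalone sketch (no SST needed) can evaluate it. The values of `cores` and `next_core` below are hypothetical and not taken from the test script; `ptw_port_name` is just an illustrative wrapper around the same expression.

```python
def ptw_port_name(next_core: int, cores: int) -> str:
    # Same format expression as in the quoted line 240.
    return "ptw_to_mem%d" % (next_core - cores // 2)


cores = 4  # hypothetical core count, for illustration only
for next_core in range(cores):
    print(next_core, ptw_port_name(next_core, cores))
```

With these assumed values, any `next_core` below `cores // 2` yields a negative index in the port name, so it may be worth checking which indices the script actually passes.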

nhatdangncsu avatar Jan 23 '23 13:01 nhatdangncsu