
Help regarding dumping event trace during Merlin Network simulation

Open saichenna opened this issue 3 years ago • 5 comments

New Issue for sst-elements

1 - Detailed description of problem or enhancement I've recently started using SST for network simulations. I'm currently using the examples under the tests within Ember to simulate various network topologies for the HPC communication patterns that Ember provides. I would like to know if there is any way to dump the event trace to get more information about the communication phase (e.g. the walltime/timestamp when a message is initiated at the sender, and the walltime/timestamp when its final packet is delivered to the final destination). I have reinstalled SST-core with the --enable-debug and --enable-event-tracking flags. Can you help me dump more information (such as the timestamps of communication events) that would be useful for generating an event timeline?
2 - Describe how to reproduce N/A. I have used dragon_128_allreduce.py under the tests folder within Ember
3 - What Operating system(s) and versions GNU/Linux Ubuntu 20.04.3 LTS
4 - What version of external libraries (Boost, MPI) OpenMPI 4.0.6
5 - Provide sha1 of all relevant sst repositories (sst-core, sst-elements, etc) SST 11.0.0
6 - Fill out Labels, Milestones, and Assignee fields as best possible

saichenna, Nov 29 '21 20:11

I suspect the --enable-debug and --enable-event-tracking flags are not what you want. There is a way to trace the packets through the NICs and routers, but I don't think Ember fills out the fields to do that. However, it looks like you are mostly interested in the host-side timestamps anyway. I'll need to dig into the Ember code a bit more to see what options are available there.

feldergast, Nov 29 '21 21:11

Thank you @feldergast for the quick response! To provide more details: I'm trying to build a notional system with a hierarchical topology composed of the existing topologies available in SST (dragonfly, fat tree, etc.). This involves adding a new topology class to Merlin and implementing multiple routing algorithms. To validate our implementation, we would like something like what you mentioned above (tracing the packets through NICs and routers for debugging), and we would also like the host-side timestamps, which would help us validate the communication pattern.

saichenna, Nov 29 '21 22:11

If you want to enable packet tracing through the merlin components, you can turn it on using parameters to the EmberMPIJob. The variables you need to set are (assuming ember_job is the name of the variable holding the EmberMPIJob instance):

ember_job.nic.tracedNode = node_to_trace
ember_job.nic.tracedPkt = packet_number_to_trace  # set to -2 to trace all packets

You can only trace packets from one node at a time, but that's probably all you want to do, or you'd have a hard time sorting them all. There's a counter that counts packets, so you can turn tracing on for a single packet, or you can set the value to -2 to trace all packets from that node. I believe the node number refers to the logical number of the node in the job, not to the actual allocated node number. The traces will give you the physical source and destination, so you won't have to work them out from the allocation.
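As a rough illustration of the two settings described above, here is a minimal sketch that wraps them in a small helper. The helper name `enable_packet_trace`, the constant `TRACE_ALL_PACKETS`, and the `fake_job` stand-in are all hypothetical and not part of SST; only the attribute names `nic.tracedNode` and `nic.tracedPkt` and the value -2 come from this thread. In a real configuration script you would pass your EmberMPIJob instance instead of the stand-in.

```python
from types import SimpleNamespace

# Value quoted in this thread for "trace every packet from the node".
TRACE_ALL_PACKETS = -2


def enable_packet_trace(ember_job, node, packet=TRACE_ALL_PACKETS):
    """Hypothetical helper: set the two NIC tracing parameters named above.

    ember_job -- any object with a `nic` attribute (an EmberMPIJob in a real script)
    node      -- logical node number within the job whose packets should be traced
    packet    -- packet counter value to trace, or -2 to trace all packets
    """
    ember_job.nic.tracedNode = node
    ember_job.nic.tracedPkt = packet
    return ember_job


# Stand-in object so the sketch runs without SST installed (an assumption for
# illustration only; it mimics nothing beyond attribute assignment).
fake_job = SimpleNamespace(nic=SimpleNamespace())
enable_packet_trace(fake_job, node=0)
print(fake_job.nic.tracedNode, fake_job.nic.tracedPkt)  # prints "0 -2"
```

Because only one node can be traced at a time, calling the helper again with a different node simply overwrites the previous choice.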

This is all based on a quick look through the code; I did not test it myself, so please let me know if it doesn't work.

feldergast, Nov 30 '21 00:11

Hello @feldergast! Thank you for the fix; I was able to track the packets from one node. Can you also tell me whether there is any way to dump, across all the endpoint nodes, the timestamp when a message is initiated at the sender and the walltime/timestamp when its final packet is delivered to the final destination?

saichenna, Nov 30 '21 18:11

I didn't see any obvious way to print the timestamps for send and receive in the Ember stack. I'll have to dig a little deeper, but I won't have time to do it in the next couple of days. I'll let you know what I find.

feldergast, Nov 30 '21 20:11