Questions on extracting input SST config from real HW

Open kvoronin-intel opened this issue 1 year ago • 2 comments

Hello all!

My research project focuses on network performance analysis for a code resembling the 3D FFT motif. I'd like to evaluate the difference between an SST simulation and a run on real HW (say, a conventional CPU cluster) for a case where performance is completely dominated by communication (e.g., an FFT of size ~150^3 on ~1000 MPI ranks and ~100 nodes).

Could you recommend a way to learn the network topology and characteristics of a real cluster so that I can put them into an SST config?

While I know most details of the network topology, I don't have numbers for the router characteristics (buffer sizes, latency, xbar_bw, link_bw, etc.) or for the networkif buffer sizes and link speed.

I hope this is something well known. I can work with the cluster admins to get any necessary software installed; I just don't know what exactly should be used to get the required information.

Any comments/advice would be very much appreciated.

Thanks, Kirill

kvoronin-intel avatar Jun 05 '23 19:06 kvoronin-intel

@feldergast may have some suggestions, but I think that you would need to work with your system admin and vendors to obtain that information.

hughes-c avatar Jun 21 '23 15:06 hughes-c

My focus has mostly been on forward-looking research, so I've been looking at possible future architectures and have just used values that were reasonable given the timeframe we were looking at. If you're trying to compare to an existing machine, then getting the documentation for the hardware and asking the system admins for help is the right approach. Link speeds are pretty easy to get; the rest of the router parameters can be a bit tricky. Here are some rules of thumb I use for forward-looking research:

  • xbar_bw: use 1.5-2x link bandwidth
  • flit_size: choose such that xbar_bw / flit_size gives a reasonable clock frequency
  • input_buffer_size: use 2-3x the round-trip bandwidth-delay product, 2 * link_bw * (link_lat + input_lat + output_lat), where the factor of 2 accounts for the round trip
  • output_buffer_size: I often just use the same as input_buffer_size, but it can be smaller, as it only needs to cover the bandwidth-delay product of the crossbar.
  • input_latency and output_latency: I just use half of the unloaded router latency on each of these. Usually the product literature will give you this latency.
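
As a rough sketch, the rules of thumb above could be turned into a small Python helper. The function name, the units (GB/s and ns), the default multipliers, and the example numbers are all assumptions for illustration and do not describe any real machine. The dictionary keys are illustrative too, so check them against the parameter names your router model actually expects before copying values into an SST config.

```python
def router_params(link_bw_GBps, link_lat_ns, router_lat_ns,
                  flit_bytes, xbar_factor=1.5, buf_factor=2.0):
    """Hypothetical helper applying the rules of thumb above.

    1 GB/s equals 1 byte/ns, so bandwidth (GB/s) times latency (ns)
    gives a size in bytes directly.
    """
    # xbar_bw: 1.5-2x the link bandwidth
    xbar_bw_GBps = xbar_factor * link_bw_GBps

    # input/output latency: half of the unloaded router latency each
    input_lat_ns = router_lat_ns / 2.0
    output_lat_ns = router_lat_ns / 2.0

    # Round-trip bandwidth-delay product in bytes (the 2 is the round trip)
    rt_bdp_bytes = 2.0 * link_bw_GBps * (
        link_lat_ns + input_lat_ns + output_lat_ns)

    # input buffer: 2-3x the round-trip bandwidth-delay product
    input_buf_bytes = buf_factor * rt_bdp_bytes

    return {
        "link_bw_GBps": link_bw_GBps,
        "xbar_bw_GBps": xbar_bw_GBps,
        "flit_bytes": flit_bytes,
        # xbar_bw / flit_size: GB/s divided by bytes gives GHz
        "implied_xbar_clock_GHz": xbar_bw_GBps / flit_bytes,
        "input_latency_ns": input_lat_ns,
        "output_latency_ns": output_lat_ns,
        "input_buffer_bytes": input_buf_bytes,
        # Simple choice from above: same size as the input buffer
        "output_buffer_bytes": input_buf_bytes,
    }


if __name__ == "__main__":
    # Made-up example values: 16 GB/s links, 20 ns link latency,
    # 100 ns unloaded router latency, 16-byte flits.
    for name, value in router_params(16.0, 20.0, 100.0, flit_bytes=16).items():
        print(f"{name}: {value:g}")
```

If the implied crossbar clock looks unreasonable for the hardware generation in question, adjusting flit_bytes is the knob the flit_size rule suggests.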

feldergast avatar Aug 03 '23 20:08 feldergast