
Multi-Node Communication (PHOLD) Benchmarks

Open nightlark opened this issue 6 years ago • 0 comments

This is an issue for tracking some of the tasks needed to support various multi-machine benchmark scenarios.

As a reference, the current single-machine PHOLD benchmark code lives in these two files:

  • https://github.com/GMLC-TDC/HELICS/blob/master/benchmarks/helics/PholdFederate.hpp
  • https://github.com/GMLC-TDC/HELICS/blob/master/benchmarks/helics/pholdBenchmarks.cpp

For the PHOLD benchmark, we want to run tests on multiple nodes and plot the results as a graph with several curves of EvRate (event rate) versus number of PEs (processing elements). Some of the tests are:

  • 1 node with PEs increasing from 1 to 72
  • 2 nodes with the number of PEs per node increasing from 1 to 36
  • 4 nodes with the number of PEs per node increasing from 1 to 18
  • 8 nodes with the number of PEs per node increasing from 1 to 9
  • Extend one or two of the curves out to more PEs, possibly the 2- or 4-node test out to 50-64 PEs per node
  • A large scale run with 256 nodes, with runs using 2 and 18 PEs per node
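The node and PE counts in the four main curves all top out at 72 total PEs, so the sweep can be generated rather than written by hand. The following is a minimal sketch only; `phold_test_matrix` and `LARGE_SCALE_RUNS` are hypothetical names for illustration and are not part of the HELICS benchmark code.

```python
def phold_test_matrix(total_pes=72, node_counts=(1, 2, 4, 8)):
    """Map each node count to the list of PEs-per-node values to sweep.

    Each curve sweeps PEs per node from 1 up to total_pes // nodes,
    which reproduces the 72 / 36 / 18 / 9 limits listed above.
    """
    matrix = {}
    for nodes in node_counts:
        max_per_node = total_pes // nodes
        matrix[nodes] = list(range(1, max_per_node + 1))
    return matrix


# The large scale run is listed separately as (nodes, PEs per node) pairs.
LARGE_SCALE_RUNS = [(256, 2), (256, 18)]

matrix = phold_test_matrix()
for nodes, per_node in matrix.items():
    print(f"{nodes} node(s): 1..{per_node[-1]} PEs per node")
```

The extended curves (50-64 PEs per node on 2 or 4 nodes) would need a larger `total_pes` for just those node counts.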

For the baseline test, the brokers will be arranged as a balanced binary tree with a broker per node serving the PEs on that node. At least one of the test curves should be re-run with different core types to compare how the zmq, tcp, udp, and mpi comm types perform.
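One way to lay out a balanced binary tree of brokers is to number the nodes from 0 and use heap-style indexing, so each broker can compute its parent from its own index. This is a sketch under that assumption; the function names are hypothetical and the issue does not specify how the tree is actually assigned.

```python
def broker_parent(index):
    """Return the parent broker index in a heap-ordered binary tree.

    Broker 0 is the root and has no parent (returns None).
    """
    if index < 0:
        raise ValueError("broker index must be non-negative")
    if index == 0:
        return None
    return (index - 1) // 2


def broker_tree(num_nodes):
    """Map every broker index (one broker per node) to its parent index."""
    return {i: broker_parent(i) for i in range(num_nodes)}


def tree_depth(num_nodes):
    """Depth of the deepest broker, counting the root as depth 0."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    return num_nodes.bit_length() - 1


tree = broker_tree(8)
print(tree)
print("deepest broker level for 256 nodes:", tree_depth(256))
```

With this layout the number of broker hops between two PEs on different nodes grows with the logarithm of the node count, which is one reason to compare it against the shared-broker idea below.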

Other ideas:

  • Have the federates (PEs) on a group of nodes share a broker instead of a single broker per node
  • When brokers that can bridge multiple comm types are added, try the different core type test with ipc for local comms
  • Change the communication pattern to favor scheduling events at nodes in closer proximity, e.g. using a Watts-Strogatz network graph
  • Enable federates to send messages directly to each other, bypassing the broker

Things needed for the basic set of tests:

  • [x] Modify the single machine Google benchmark PHOLD federate to be able to run as a standalone federate with the option to output evrate/time stats.
  • [ ] Combine run stats output from multiple PHOLD federates
  • [x] Create a launching script for setting up multi-node PHOLD runs
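Combining the run stats could be as simple as summing event counts and dividing by the slowest federate's elapsed time. This is only a sketch: it assumes each standalone federate reports an event count and an elapsed wall-clock time in seconds, which the issue does not specify, and `combine_stats` is a hypothetical helper.

```python
def combine_stats(per_federate):
    """Combine (event_count, elapsed_seconds) pairs from several federates.

    Assumes the federates ran concurrently, so the run's wall time is
    the longest individual elapsed time rather than the sum.
    """
    stats = list(per_federate)
    if not stats:
        raise ValueError("no federate stats supplied")
    total_events = sum(events for events, _ in stats)
    wall_time = max(elapsed for _, elapsed in stats)
    if wall_time <= 0:
        raise ValueError("elapsed time must be positive")
    return {
        "federates": len(stats),
        "events": total_events,
        "elapsed_s": wall_time,
        "evrate": total_events / wall_time,
    }


combined = combine_stats([(1000, 2.0), (3000, 4.0)])
print(combined)
```

Using the maximum elapsed time gives a conservative aggregate EvRate; averaging per-federate rates instead would overstate throughput when federates finish at different times.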

For the other ideas, the additional features needed are:

  • [x] A broker that can bridge multiple communication types
  • [ ] Direct comm routes between federates for sending messages
  • [x] A federate that can be used to set up communication patterns between federates using a Watts-Strogatz graph

nightlark, Dec 11 '19 00:12