Multi-Node Communication (PHOLD) Benchmarks
This is an issue for tracking some of the tasks needed to support various multi-machine benchmark scenarios.
For reference, the current single-machine PHOLD benchmark code lives in these two files:
- https://github.com/GMLC-TDC/HELICS/blob/master/benchmarks/helics/PholdFederate.hpp
- https://github.com/GMLC-TDC/HELICS/blob/master/benchmarks/helics/pholdBenchmarks.cpp

For the PHOLD benchmark, we want to run tests on multiple nodes and plot the results as several curves of EvRate (event rate) versus PEs (processing elements). Some of the tests are:
- 1 node, with the number of PEs increasing from 1 to 72
- 2 nodes, with the number of PEs per node increasing from 1 to 36
- 4 nodes, with the number of PEs per node increasing from 1 to 18
- 8 nodes, with the number of PEs per node increasing from 1 to 9
- Extend one or two of the curves out to more PEs, possibly the 2- or 4-node test out to 50-64 PEs per node
- A large-scale run on 256 nodes, using 2 and 18 PEs per node
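
The first four sweeps above all end at the same total of 72 PEs, which is what makes the curves directly comparable. A minimal sketch that enumerates the run points is shown here; the step size of 1 and the names `SWEEPS`, `sweep_points`, and `all_points` are assumptions for illustration, not part of the existing benchmark code.

```python
# Sweep definitions taken from the list above: (nodes, max PEs per node).
SWEEPS = [(1, 72), (2, 36), (4, 18), (8, 9)]


def sweep_points(nodes, max_pes_per_node):
    """Return (nodes, pes_per_node, total_pes) for every point on one curve.

    Assumes the PE count is stepped by 1; a real run might use a coarser step.
    """
    return [(nodes, p, nodes * p) for p in range(1, max_pes_per_node + 1)]


def all_points(sweeps=SWEEPS):
    """Map each node count to the list of run points on its curve."""
    return {nodes: sweep_points(nodes, max_p) for nodes, max_p in sweeps}


if __name__ == "__main__":
    for nodes, points in all_points().items():
        last = points[-1]
        print(f"{nodes} node(s): {len(points)} runs, up to {last[2]} total PEs")
```
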
For the baseline test, the brokers will be arranged as a balanced binary tree, with one broker per node serving the PEs on that node. At least one of the test curves should be re-run with different core types to compare how the zmq, tcp, udp, and mpi core types perform.
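
One way to lay out the balanced binary tree of brokers is sketched below. It assumes nodes are numbered from 0 and uses heap-style indexing, where the broker on node `i` connects to the broker on node `(i - 1) // 2`; the issue does not specify the actual numbering, so this layout and the helper names are illustrative only.

```python
def parent(i):
    """Index of the parent broker for node i, or None for the root (node 0)."""
    return None if i == 0 else (i - 1) // 2


def depth(i):
    """Number of broker-to-broker links between node i's broker and the root."""
    return (i + 1).bit_length() - 1


def broker_tree(num_nodes):
    """Map each node index to the index of its parent broker."""
    return {i: parent(i) for i in range(num_nodes)}


def hops(a, b):
    """Broker-to-broker links on the tree path between brokers a and b.

    In heap order the larger index is never shallower than the smaller one,
    so repeatedly moving the larger index up reaches the common ancestor.
    """
    count = 0
    while a != b:
        if a > b:
            a = (a - 1) // 2
        else:
            b = (b - 1) // 2
        count += 1
    return count


if __name__ == "__main__":
    print(broker_tree(8))
    print("deepest level for 256 nodes:", depth(255))
```

Under this layout, a message between PEs on different nodes crosses `hops(a, b)` broker links, so the 256-node run has paths considerably longer than the 8-node run.
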
Other ideas:
- Have the federates (PEs) on a group of nodes share a broker instead of a single broker per node
- When brokers that can bridge multiple comm types are added, try the different core type test with ipc for local comms
- Change the communication pattern to favor scheduling events on nodes in closer proximity, e.g. by using a Watts-Strogatz network graph
- Enable federates to send messages directly to each other, bypassing the broker
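
For the Watts-Strogatz idea above, a minimal pure-Python sketch of the standard construction follows: start from a ring lattice where each federate is linked to its `k` nearest neighbors, then rewire each lattice edge with probability `beta`. The function name and the neighbor-set representation are assumptions for illustration; this is not the HELICS implementation.

```python
import random


def watts_strogatz(n, k, beta, seed=None):
    """Return {federate index: set of neighbor indices} for a Watts-Strogatz graph.

    n: number of federates, k: even number of ring neighbors, beta: rewiring
    probability in [0, 1]. The graph is undirected.
    """
    if k % 2 != 0 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)

    # Ring lattice: link each federate to its k/2 nearest neighbors on each side.
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            neighbors[i].add((i + j) % n)
            neighbors[(i + j) % n].add(i)

    # Rewire each lattice edge (i, i + j) with probability beta, avoiding
    # self-loops and duplicate edges.
    for j in range(1, k // 2 + 1):
        for i in range(n):
            if rng.random() >= beta:
                continue
            old = (i + j) % n
            candidates = [t for t in range(n) if t != i and t not in neighbors[i]]
            if not candidates:
                continue
            new = rng.choice(candidates)
            neighbors[i].discard(old)
            neighbors[old].discard(i)
            neighbors[i].add(new)
            neighbors[new].add(i)
    return neighbors


if __name__ == "__main__":
    graph = watts_strogatz(n=20, k=4, beta=0.1, seed=1)
    # A PHOLD federate would pick its next destination from its neighbor set.
    print(sorted(graph[0]))
```

With `beta = 0` every event stays within the ring neighborhood; raising `beta` adds long-range links, which shifts traffic toward more distant federates.
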
Things needed for the basic set of tests:
- [x] Modify the single-machine Google Benchmark PHOLD federate so that it can run as a standalone federate, with an option to output EvRate/time stats.
- [ ] Combine run stats output from multiple PHOLD federates
- [x] Create a launching script for setting up multi-node PHOLD runs
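
For the open item on combining run stats, one possible approach is sketched below. The per-federate record format (a dict with `events` and `elapsed_s` keys) is hypothetical, since the actual output format of the standalone federate is not described here. The sketch assumes the federates run concurrently, so the combined event rate is the total event count divided by the longest elapsed time.

```python
def combine_stats(records):
    """Combine per-federate stats into one summary for a run.

    Each record is a hypothetical dict: {"events": int, "elapsed_s": float}.
    """
    if not records:
        raise ValueError("no federate stats to combine")
    total_events = sum(r["events"] for r in records)
    # Federates run in parallel, so wall time is that of the slowest federate.
    wall_time = max(r["elapsed_s"] for r in records)
    if wall_time <= 0:
        raise ValueError("elapsed time must be positive")
    return {
        "federates": len(records),
        "events": total_events,
        "elapsed_s": wall_time,
        "evrate": total_events / wall_time,
    }


if __name__ == "__main__":
    sample = [
        {"events": 100, "elapsed_s": 2.0},
        {"events": 300, "elapsed_s": 4.0},
    ]
    print(combine_stats(sample))
```
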
For the other ideas, the additional features needed are:
- [x] A broker that can bridge multiple communication types
- [ ] Direct comm routes between federates for sending messages
- [x] A federate that can be used to set up communication patterns between federates using a Watts-Strogatz graph