Anytopo
Hi Devs,
This pull request was previously mentioned in issue #2584.
This work is documented in
"sst-elements-src/src/sst/elements/merlin/interfaces/endpointNIC/documentations/README.md", and
"sst-elements-src/src/sst/elements/merlin/topology/anytopo_ultility/README.md"
In short, this branch extends the Merlin module in SST-elements so that any network topology can be built from an input NetworkX graph. Another key feature is support for source routing (the endpoint NIC determines the routed paths). A few popular HPC network topologies (dragonfly, slimfly, polarfly, jellyfish) are already defined in the
"sst-elements-src/src/sst/elements/merlin/topology/anytopo_ultility/" dir, together with some tests in "sst-elements-src/src/sst/elements/merlin/topology/anytopo_ultility/tests".
If you would like to accept this pull request, maybe we can move these tests to the main merlin test dir?
It is worth mentioning that reorderedlinkcontrol has been modified to fit the new 'ExtendedRequest' framework. This is covered by the tests as well; see EndpointNIC(use_reorderLinkControl=True ... in the tests.
I believe this pull request makes the following contributions to Merlin:
- supporting more network topologies (any topology from an input graph)
- supporting source routing, which is increasingly interesting in HPC network traffic engineering
- adding an EndpointNIC framework that allows different NIC/smartNIC functionalities to be plugged into the packet-processing pipeline. For now, source routing is implemented through this framework.
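
To make the last two points concrete, here is a minimal Python sketch of the idea, not the actual Merlin code. The names `SourceRoutingPlugin`, `EndpointPipeline`, `next_hop`, and `RING` are hypothetical and exist only for illustration, and a plain adjacency dict stands in for the input graph. The endpoint pipeline computes the whole path (here a BFS shortest path, which is an assumption about the routing policy), and each router only reads the next hop from the packet.

```python
from collections import deque


def shortest_path(adjacency, src, dst):
    """Breadth-first search; returns the router sequence from src to dst, or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


class SourceRoutingPlugin:
    """Hypothetical plugin: attaches the full path to the packet at the endpoint."""

    def __init__(self, adjacency):
        self.adjacency = adjacency

    def process(self, packet):
        packet["path"] = shortest_path(self.adjacency, packet["src"], packet["dst"])
        packet["hop"] = 0  # index of the router currently holding the packet
        return packet


class EndpointPipeline:
    """Hypothetical endpoint NIC: runs each plugin in order on outgoing packets."""

    def __init__(self, plugins):
        self.plugins = plugins

    def send(self, packet):
        for plugin in self.plugins:
            packet = plugin.process(packet)
        return packet


def next_hop(packet):
    """Router side: no route computation, just advance along the carried path."""
    packet["hop"] += 1
    return packet["path"][packet["hop"]]


# A 4-router ring used as the example topology (illustrative only).
RING = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

Other functionality (for example, a trace-dumping step) would be another object with a `process` method appended to the plugin list.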
Please let me know if you have any feedback, comments, or questions about the code.
Best regards, Z.
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.
In order to merge the PR, the tests will need to be part of the main merlin testing. Go ahead and move them there and we can review the code and release the PR for testing.
Before I add new tests, I want to make sure the current tests all pass. I don't really understand why the 'macos-14 / macos-15' jobs are failing here.
When I run 'sst-test-elements' locally (ubuntu 22, python3.8), all tests pass (see screenshot below).
For example, the "test_EmberSweep_*" tests all passed locally but failed in the GitHub Action. Does anyone have an idea what is wrong there?
There are some known issues with the GitHub macOS runners and how they interact with SST. You can safely ignore the results of the GitHub macOS builders for now. They are experimental and the official testing is done on other machines.
Looks like those issues were actually fixed a week or so ago. I dug a little deeper, and it looks like you introduced a dependency on a third-party library that isn't installed on those systems (NetworkX). You'll need to check for it and not load it if it's not there (i.e., anytopo won't be available if NetworkX isn't installed). The polarfly and polarstar topos have a similar issue with a different library. Give me a bit to look through the code and I can send some pointers on the best way to do this.
Looks like polarstar and polarfly depend on the same third-party Python libraries as anytopo, so we already have builders set up with those libraries. Anytopo should be tested on those builders and skipped on all the ones without those libraries.
OK, that's good news. Thanks for checking. Will the GitHub Action automatically skip or continue with the tests, depending on whether or not networkx is installed? Or should I add some indication in the test files?
Another question: some of my tests require extra Python libraries. For example, I import the "galois" library to generate the 'Slimfly' topology. This library is not needed to build SST; it is only required when running the test.
Is there any solution for that?
You'll need to add an @unittest.skipIf similar to what the polar* topology tests do so that they don't try to execute if the libraries aren't found. Note that the testsuite_Default_merlin.py file also tries to load the required libraries so that they can check if they are in sys.modules.
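
As a rough sketch of that pattern (not the actual contents of testsuite_Default_merlin.py): the helper `try_import`, the flags, and the test names below are hypothetical, and the test bodies are placeholders for the real SST runs. The idea is to attempt the import once, check `sys.modules`, and let `@unittest.skipIf` skip the test on builders that lack the library.

```python
import importlib
import sys
import unittest


def try_import(name):
    """Attempt the import once; report whether the module ended up in sys.modules."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return name in sys.modules


# Libraries mentioned in this thread; the flag names are illustrative.
HAVE_NETWORKX = try_import("networkx")
HAVE_GALOIS = try_import("galois")


class AnytopoTests(unittest.TestCase):
    @unittest.skipIf(not HAVE_NETWORKX, "networkx not found, skipping anytopo test")
    def test_anytopo_basic(self):
        import networkx as nx
        # Placeholder body: a real test would launch the SST configuration.
        graph = nx.cycle_graph(4)
        self.assertEqual(graph.number_of_nodes(), 4)

    @unittest.skipIf(not (HAVE_NETWORKX and HAVE_GALOIS),
                     "networkx or galois not found, skipping slimfly test")
    def test_anytopo_slimfly(self):
        import galois
        # Placeholder body: only confirms the library is usable.
        self.assertTrue(hasattr(galois, "GF"))
```

On a builder without the libraries, both tests are reported as skipped rather than failed, so the overall run still succeeds.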
I have added new tests for Anytopo, and all tests passed. So I think this PR is ready for review.
Sorry, with the SC conference last week, Thanksgiving this week and vacation next week, it will be a bit before I can dive deep into this PR. I took a quick glance through it and didn't see anything that immediately stood out as an issue. I'll revisit this the week of December 8th and do a complete review.
Sorry for the delay, things got a bit hectic during December. I will be looking at this PR this week, though it may take me a bit to get through since I'll have to remind myself of the details in the merlin code.
Hi, no worries, we are not in any hurry. And thanks for giving feedback on the code; I will try to fix the points you raised.
Recently I have been thinking about the design of this branch. Importing a network topology from a graph and having the routers do source routing probably do need to be implemented in Merlin. But the whole concept of 'EndpointNIC with plugins' feels like it could be an individual sst-element, similar to the 'rdmaNic' element. Implementing it in Merlin may benefit Merlin's extensibility. For example, I recently implemented another small plugin (not in this branch) that dumps traffic traces from Merlin, and it was really clean and efficient. I wonder whether it is even feasible to implement circuit switching within this framework. On the other hand, with a lot of plugins, this framework might also make Merlin complicated or confusing to read for people who have just started using the simulator. I don't know whether that is a concern or not.