Steiner Tree Feature
The functionality that is being developed is calculating the Steiner tree in a graph. Steiner Tree is created on basis of subset of terminals in the graphs with the condition that length of the tree should be minimum.
We are following the approach suggested in this paper - https://arxiv.org/abs/1710.00668
Also, we are taking reference of an FPT algorithm known as Dreyfus- Wagner algorithm - https://onlinelibrary.wiley.com/doi/10.1002/net.3230010302
Edit by @szhorvat:
TODO list to be completed only after the first review:
- [ ] interruptability
- [ ] functions.yaml
- [ ] documentation
- [ ] checks on input parameters
- [ ] overflow checks
This is a good starting point. I closed the other PR (the comments won't disappear).
The next step now is to fix all the errors that prevent the code from compiling. You will not be able to check if it works correctly until it compiles.
Are you comfortable with compiling igraph on your own computer without these changes? Do you know how to run the tests? You should be using the very same steps to compile and test the function implemented in this PR.
This is a good starting point. I closed the other PR (the comments won't disappear).
The next step now is to fix all the errors that prevent the code from compiling. You will not be able to check if it works correctly until it compiles.
Are you comfortable with compiling igraph on your own computer without these changes? Do you know how to run the tests? You should be using the very same steps to compile and test the function implemented in this PR.
Thanks. I was able to compile the code using the instructions which specifies the usage for cmake and later the command cmake --build
I am however, confused about running test case. I did create create a test case and updated the CmakeLists.txt but running it and understanding why the CI is failing is proving little difficult for me.
Earlier there were compile time errors but at present the code ( including the Steiner tree change) is getting build without any errors.
Actually, the code does not build without errors. This is the first reason why the CI is failing, well before the tests. Here's the first error due to a type mismatch: you are sometimes using int and sometimes igraph_integer_t, which are not the same.
https://dev.azure.com/igraph-team/igraph/_build/results?buildId=2196&view=logs&j=9365e2c0-1fe2-55d7-9193-5883fccc1323&t=06be1443-af2e-53a2-9221-964df5414162&l=931
Can you fix all the compiler errors, which you can see in the output of the CI runs?
Generally, all integers used as indices (indexing into vectors, vertex and edges IDs) must be consistently igraph_integer_t.
Can you please format the source files for readability? Use consistent indentation. Your editor should have an auto-format command.
Eventually, the formatting will need to match that of the rest of igraph, but for now, let's just have something that is consistent and readable, and we can worry matching to the rest of the codebase late.
Thanks @szhorvat
I made the changes that you had suggested. Could you review the issue that is caused at IGRAPH_ASSERT macro?
For next steps, I have something that I would need to discuss with you -
The igraph_steiner_dreyfus_wagner function calculates the minimum length of the Steiner tree for smaller graph as it's FPT algorithm for NP-complete problem. Should the code be designed in such a way that it outputs tree as result or would output have tree and minimum length?
Should the code be designed in such a way that it outputs tree as result or would output have tree and minimum length?
It would be nice if the function could (optionally) provide both.
Let's first make sure that the code compiles, then that it works. We can do tweaks to it later.
@szhorvat - Thanks for the suggestions. I made some silly mistakes in the test case file. sorry about that. For building the code I'm using the following commands -
$ cmake --build . $ cmake --build . --target check
Which commands should I run to check the errors and resolve them locally? I will keep a note of the linux_static_vendored when the tests are run here on the platform.
Yes, these commands are correct. If you use them you should see the test failing, just like on the CI right now. Does the test not fail for you locally?
@Nachiket18 Let us know if you need more help with this. The immediate goal is to get the test suite to pass, and to make sure that you can run the tests locally on your computer.
I don't want to put too much pressure on you, so if you feel that you have no time right now, just let us know.
@szhorvat - I made the changes that you had suggested. I was able to build the code using the instructions from the official documentation. However, during the target check there was error generated.
177/424 Test #177: test::igraph_steiner_tree_fpt Test exited abnormally with error: Segmentation fault
Normally, I would add print statements to debug the code but for this project what should be the way to resolve it?
Either add printf statements to the tests if that's convenient for you and run it manually from the command line (the test should be built in test/test_igraph_steiner_tree_... in the build folder -- just make sure you rebuild if after adding the printf statements), or hook it up to a debugger and run through it step by step. I don't know what IDE / OS you are using, I usually use gdb or lldb from the command line on macOS, but I'm oldschool :)
Resolved the conflicts with the upstream develop branch, FYI.
I haven't checked the PR thoroughly yet so this is not a full review, but one thing that surely needs to be changed is that we have switched to using igraph_vector_int_t for variables that represents vertex indices so the function needs to be changed to use igraph_vector_int_t for vectors containing vertex indices. As for the distance matrix, that can probably stay as igraph_matrix_t because it can contain floating-point numbers in the weighted case, but infinite distances should be represented with IGRAPH_INFINITY and not INT_MAX.
FYI I resolved conflict with develop again and made a minor fix to allow it to build again. You should pull the changes.
If you haven't yet, take a look at this wiki page and the links within: https://github.com/igraph/igraph/wiki/Quickstart-for-new-contributors
As for debugging, building with AddressSanitizer (ASan) is useful. See here: https://github.com/igraph/igraph/wiki/Using-sanitizers-to-find-bugs
The linux_static_vendored CI run has ASan enabled and will give more informative error messages.
https://dev.azure.com/igraph-team/igraph/_build/results?buildId=3578&view=logs&j=4c3a8381-b9de-5463-904f-918810769be8&t=e71559a0-54d3-55f3-64cc-74b67ad70398&l=400
Let us know if you have more questions.
@szhorvat - Thanks. I made the changes that were causing segmentation fault, changed the data type to 'igraph_vector_int_t' for steiner_terminals and also updated the test case. The test case is failing even though it passes through the function in the test case and fails in the end. We are looking at the results from linux_static_vendored pipeline and would try to fix it.
Codecov Report
Merging #1939 (040d243) into master (a229509) will increase coverage by
0.05%. The diff coverage is93.63%.
@@ Coverage Diff @@
## master #1939 +/- ##
==========================================
+ Coverage 83.35% 83.40% +0.05%
==========================================
Files 373 374 +1
Lines 61251 61486 +235
==========================================
+ Hits 51054 51281 +227
- Misses 10197 10205 +8
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/random/rng_pcg32.c | 100.00% <ø> (ø) |
|
| src/internal/glpk_support.c | 56.25% <50.00%> (ø) |
|
| src/paths/johnson.c | 90.47% <80.76%> (+0.19%) |
:arrow_up: |
| src/paths/steiner.cpp | 93.85% <93.85%> (ø) |
|
| src/community/fluid.c | 89.58% <100.00%> (ø) |
|
| src/community/label_propagation.c | 95.18% <100.00%> (ø) |
|
| src/community/leading_eigenvector.c | 84.68% <100.00%> (ø) |
|
| src/community/leiden.c | 97.46% <100.00%> (ø) |
|
| src/connectivity/components.c | 94.91% <100.00%> (+0.01%) |
:arrow_up: |
| src/core/heap.pmt | 92.52% <100.00%> (-0.21%) |
:arrow_down: |
| ... and 13 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update a229509...040d243. Read the comment docs.
infinite distances should be represented with
IGRAPH_INFINITYand notINT_MAX.
Just highlighting this comment from @ntamas, as this needs to be addressed.
There are some parts of the algorithm that are not covered by the tests. Can you make sure that everything is covered, except some checks on input parameters or other error checks?
Let us know when this is ready for the first round of review.
infinite distances should be represented with
IGRAPH_INFINITYand notINT_MAX.Just highlighting this comment from @ntamas, as this needs to be addressed.
There are some parts of the algorithm that are not covered by the tests. Can you make sure that everything is covered, except some checks on input parameters or other error checks?
Let us know when this is ready for the first round of review.
Yes. We would write test case for checking the logic of the algorithm by printing and comparing the value for minimum steiner tree distance with one that's expected.
One more comment is that it will be useful to merge develop into this branch again, to prevent the two branches from diverging too much. This should also fix the coverage report, which incorrectly claims that this PR reduces coverage in many files.
It's great to see this PR moving forward.
To reiterate, the immediate goals are:
- [ ] Merge the
developbranch into this branch. - [ ] Add expected test output, and manually verify that it is correct.
- [ ] Fix test failures on all platforms. Let us know if it's unclear how to deal with a failure.
- [ ] Ensure full test coverage of all parts of the code that implement the algorithm. You can see coverage here It is not necessary to cover error checks or debug print functions, just the core algorithm.
- [ ] Add documentation. It does not need to be polished or well-written, but it should be complete (i.e. discuss all relevant details of the function) and correct (match what the function does).
Before the review, it would be great to reformat the code to fit with the rest of the igraph codebase. We use an editorconfig file to help with this. If you're not sure how to do this efficiently, let me know and I will run it through an auto-formatter.
Once these are done, we can start an earnest review.
@szhorvat I tried to fix as many issues as possible and am at my last 2 as per what I see the last few ones deal with
- macOS -> Johnsons being deprecated which causes Mac to fail at this moment.
- Windows(both) - > it says the
minfunction doesn't belong to the namespacestdbut when I click the definition it says it is and I don't know how to change it. And this is the only error left in windows at this moment.
Do you know how to deal with this cross-compatibility before I move on with the functionality and its test case? Thanks
When a function is deprecated, look it up in the sources and see what it was replaced with. In this case it is igraph_distances_johnson().
As for min(), let us look up which header declares it: https://en.cppreference.com/w/cpp/algorithm/min It is <algorithm>. With some of the standard C++ library implementations, some other headers may declare min as well, which is why you see no failure on Windows. But only <algorithm> is guaranteed to declare it, thus the only standards-compliant way to proceed is to include it.
When a function is deprecated, look it up in the sources and see what it was replaced with. In this case it is
igraph_distances_johnson().As for
min(), let us look up which header declares it: https://en.cppreference.com/w/cpp/algorithm/min It is<algorithm>. With some of the standard C++ library implementations, some other headers may declareminas well, which is why you see no failure on Windows. But only<algorithm>is guaranteed to declare it, thus the only standards-compliant way to proceed is to include it.
Ok, thanks . i thought it was completely removed from the igraph library because i had a similar situation in my C class where they said it was completely removed. And for min, it took me to the functional definition so I thought it was already added to the headers. I'll update the changes and push it soon.
I updated the TODO checklist above. When all these are done, just click the ready for review button.
Regarding the recently added igraph_i_directed_t mode: anything that has _i_ in it should not appear in public API. There are two typical ways to control the handling of directedness. One is a igraph_bool_t directed argument: when false, edge directions should be ignored. The other is an igraph_neimode_t mode argument. This is used when the result would change is the edge directions are reversed, and can be used to treat a directed graph either as undirected (mode = IGRAPH_ALL) or as reversed (mode = IGRAPH_IN).
igraph_i_directed_t is just a dummy name to allow for the IGRAPH_DIRECTED=1 and IGRAPH_UNDIRECTED=0 enum values. These are used for readability (i.e. not for functional reasons) when passing values to igraph_bool_t directed arguments. fun(g, 1) is quite cryptic and one might need to look up the meaning of the second argument. fun(g, IGRAPH_DIRECTED) is immediately clear.
We were able to build and test the functionality. Our test case includes a graph given in the research paper and output that code is generating is same as what it's mentioned in the paper. In order to ensure correctness, how many such test cases need to be implemented ? Meanwhile, I'll start working on documentation.
There's a wiki page that has hints on what to pay attention to when testing, as well as a few other things that we'll need to get done eventually.
https://github.com/igraph/igraph/wiki/Checklist-for-new-%28and-old%29-functions
- Consider the case of 0 and 1 vertex graphs. What is a reasonable result and does the function return that result?
- What if the input is disconnected?
- Does your implementation handle self-loops and multi-edges? Self-loops are easy as they can be ignored, but multi-edges are not. If the current implementation does not handle them, can be changed to support them?
- Test both with and without providing weights
- Are there any other special cases that need to be considered?
- etc.
Don't forget to merge the develop branch into this one.
Also, please go through the review comments and click "Resolve conversation" on those which have been addressed. This way we can keep track of what has been done and what still needs to be completed.
It looks like the function currently only returns the total weights of the Steiner tree. Can you return the tree itself as well as a vector of edge indices, similar to how igraph_minimum_spanning_tree() does it?
It looks like the function currently only returns the total weights of the Steiner tree. Can you return the tree itself as well as a vector of edge indices, similar to how
igraph_minimum_spanning_tree()does it?
Yes. We are figuring out how we are going to do.