Use of finally stack in recursive calls
We have now encountered two instances of the following type of bug:
A recursive function uses the finally stack, causing it to overflow when the recursion depth reaches a certain level:
- #2261
- #2386
The symptom is the fatal error "Finally stack too large: it contains 100 elements.", which can be confusing, as it seems to suggest a mismatch between FINALLY and FINALLY_CLEAN calls, but the issue is in fact an overuse of the stack.
These types of bugs are typically not caught by the test suite, as triggering them requires a large graph, sometimes a large graph with a specific structure. Otherwise the recursion won't be deep enough.
Can we try to pre-emptively identify such issues? Yes, clang-tidy can identify recursion:
CT='clang-tidy;--use-color;--checks=-*,misc-no-recursion' cmake .. -DCMAKE_C_CLANG_TIDY=$CT -DCMAKE_CXX_CLANG_TIDY=$CT
You may want to not analyse C++ files, as they typically come from borrowed code that doesn't use the finally stack.
With this method, I identified the following potential problem cases. We need to check each of these for whether they are susceptible to 'finally' stack overflow, and fix them, or add a comment about why there's no issue.
- [x]
local_scan_k_ecountandlocal_scan_1_ecountcall each other, but this turns out not to be a true recursion. - [x]
igraph_maximum_bipartite_matchingandigraph_i_maximum_bipartite_matching_weightedcall each other, but this turns out not to be a true recursion (see comment below). - [x]
igraph_maxflowandigraph_i_maxflow_undirectedcall each other, but this turns out not to be a true recursion (see comment below). - [x]
igraph_i_provan_shier_list_recursive, fixed in #2261. - [x]
igraph_i_maximal_independent_vertex_sets_backtrack, fixed in #2386 . - [ ]
igraph_i_graphletscalls itself.
Notes:
- The LAD code may be susceptible to this, but this is borrowed and rewritten code that implements a complex algorithm, and it difficult and risky to change. This should only be fixed with #540.
- There are recursive functions that traverse trees and do not use the finally stack and cannot cause problems, such as heap functions, GML tree traversal, tree graph layouts.
Case of igraph_maximum_bipartite_matching and igraph_i_maximum_bipartite_matching_weighted: this is not a true recursion. igraph_i_maximum_bipartite_matching_weighted() calculates an initial maximum matching on a set of tight edges; without going into details, this is always a subset of the real edges and the graph is treated as unweighted for the duration of the calculation of the initial matching. So, with a weighted graph, igraph_maximum_bipartite_matching() calls igraph_i_maximum_bipartite_matching_weighted(), which calculates the set of tight edges, calls back into igraph_maximum_bipartite_matching(), which then calls igraph_i_maximum_bipartite_matching_unweighted().
Case of igraph_maxflow() and igraph_i_maxflow_undirected(): this is not a true recursion either. igraph_maxflow() calls igraph_i_maxflow_undirected() for an undirected graph, which converts the graph into directed, and calls back to igraph_maxflow(). igraph_maxflow() will then not call igraph_i_maxflow_undirected() again because the graph is now directed.
I tested maxflow and graphlets empirically earlier today, and wasn't able to get either to crash. Graphlets are probably fine too. Thanks for looking into this!
igraph_i_graphlets() has a real recursion and theoretically it can cause an overflow. But the algorithm does not seem to be prone to deep recursion, so it hasn't come up in practice yet.
I now reduced the finally stack use of this function, but the problem can still occur in principle.
I won't spend more time on this, as the fix would take a lot of effort, and this concept of "graphlets" hasn't really caught on. The term "graphlets" is now more commonly used for an entirely unrelated idea.