chapel icon indicating copy to clipboard operation
chapel copied to clipboard

Fix an infinite recursion bug in query system

Open DanilaFe opened this issue 4 months ago • 0 comments

It's another query system bug episode!

The Bug

This time, the issue is so tricky that I have not been able to construct a query-system-only reproducer; it seems that the precise combination of elements from resolution-queries.cpp and scope-queries.cpp is necessary to trigger the bug.

In general, though, the trouble is with the isQueryRunning function. Queries in Dyno use isQueryRunning to avoid creating infinite recursion. The culprit here in particular is resolveVisibilityStmts and resolveVisibilityStmtsQuery. The non-query version of the function runs a check with isQueryRunning, and returns early if recursion would be encountered.

This is perfectly sound -- when you're not re-running queries. However, when the query system is checking if old results can be re-used, it will perform a bottom-up traversal of the dependency graph (see this query system documentation to understand what I mean in more detail). At this time, the "parent" query is not technically running.

What happens, then, is that during recomputation, resolveVisibilityStmts is called bottom up, meaning that the parent instance of resolveVisibilityStmtsQuery is not running (even though it is above the second invocation in the graph). resolveVisibilityStmtsQuery is then invoked when it wasn't invoked the previous time (since isQueryRunning returns false). This adds the query to the dependency graph, creating an (undetected) cycle. The cycle causes infinite recursion in the Dyno logic where the computational graph is assumed to be acyclic.

The Fix

The solution is to ensure that we know whether or not a query is notionally running at any given point; this could mean that it's actually running (and we're going top-down), or that we're traversing this query's dependency subgraph (in which case it should look like the query is running to maintain the veil over the query system). To do this, I added a new flag to query result entries, beingTestedForReuse.

I suspect I could've gotten away with some clever uses of lastChecked instead of adding a new flag. However, lastChecked has a lot of uses in the query system, and adding another use was nontrivial. Since I found it hard to think through the consequence of changes to lastChecked -- which, I hope, would mean anyone else reading this code for the first time would find it hard too -- I opted for the simpler solution and a documentation comment.

David's original reproducer is part of this PR to lock down this behavior (though it depends on the implementation of scope resolution, as opposed to being a "pure" query system test). I could not come up with a simple reproducer, and think it's not worth the time to keep trying.

Testing

  • [x] dyno tests

DanilaFe avatar Oct 08 '24 23:10 DanilaFe