codeql
codeql copied to clipboard
Python: Improve performance of FileNotClosed query by using basic block reachability
cc @nickrolfe , @aschackmull , @alexet
Taking a step back and considering semantics: This seems like a clear-cut use-case for post-dominance. I.e. a file is always closed if the open is post-dominated by a close. So we're looking for opens that can reach a close, but aren't post-dominated. Is there any reason not to simplify the query implementation to that?
Possibly would make sense to use dominance, just wasn't sure exactly how it worked. Would it work for detecting that an exception edge can cause the close to not be reached?
I suppose it would also behave differently with conditionals; i.e.
if use_stdin:
file = sys.stdin
else:
file = open(path)
try:
line = file.readline()
finally:
if not use_stdin:
file.close()
currently we wouldnt alert there, filtering some potential FPs in which the conditional close is in fact correct, at the cost of maybe some FNs in more complex cases. Would a dominance based version alert there by default?
First off, yes, dominance would handle exception edges. Second, you're right that we need to handle conditionals to avoid an FP in the example you give.
Consider the set of basic blocks that may lie on a path between open and close. What we're looking for is one such block, A, with a successor, B, that steps out of this set. If this was all we needed then post-dominance would give us the answer, but we need to allow certain A->B steps as ok, e.g. the false-successor in if not use_stdin above.
The ambitious approach would involve actually analysing the branch edge A->B to see if we could prove it irrelevant given the context of the file-open, but that's probably best delegated to a proper shared control-flow reachability library. So let's just use the heuristic that you already outlined, i.e. conditional branching is probably fine, whereas we're looking for exception edges as those that might indicate bugs.
To put this together we can define a willLikelyAlwaysReachClose predicate on basic blocks (presumably parameterised by the FileOpen we're reasoning about) and step backwards through the CFG with both immediate post-dominance steps, and conditional edges.
predicate willLikelyAlwaysReachClose(BasicBlock bb, FileOpen fo) {
exists(FileClose fc | fileLocalFlow(fo, fc) and bb = fc.getBasicBlock())
or
exists(BasicBlock mid | willLikelyAlwaysReachClose(mid, fo) |
// if `mid` will always reach a close, then the same holds for anything that `mid` post-dominates
mid.immediatelyPostDominates(bb)
or
// if `mid` will always reach a close, and `bb` can conditionally branch to `mid` then we assume
// that this branch will be taken when the close is necessary
bb.getAConditionalSuccessor() = mid
)
}
Finally we can check whether the open will reach a close: willLikelyAlwaysReachClose(fo.getBasicBlock(), fo).
Makes sense Is it worth rewriting the query using these techniques now, or is the current approach good enough for the moment (it should resolve an OOM issue in DCA for FreeCAD)
Is it worth rewriting the query using these techniques now, or is the current approach good enough for the moment (it should resolve an OOM issue in DCA for FreeCAD)
Up to you.
In that case; I think I will stick with the current approach; prioritising resolving the OOM issue ASAP.