codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Python: Improve performance of FileNotClosed query by using basic block reachability

Open joefarebrother opened this issue 5 months ago • 6 comments
trafficstars

cc @nickrolfe , @aschackmull , @alexet

joefarebrother avatar Jun 02 '25 14:06 joefarebrother

Taking a step back and considering semantics: This seems like a clear-cut use-case for post-dominance. I.e. a file is always closed if the open is post-dominated by a close. So we're looking for opens that can reach a close, but aren't post-dominated. Is there any reason not to simplify the query implementation to that?

aschackmull avatar Jun 10 '25 13:06 aschackmull

Possibly would make sense to use dominance, just wasn't sure exactly how it worked. Would it work for detecting that an exception edge can cause the close to not be reached?

I suppose it would also behave differently with conditionals; i.e.

if use_stdin:
  file = sys.stdin
else:
  file = open(path)
try:
  line = file.readline()
finally:
  if not use_stdin:
    file.close()

currently we wouldnt alert there, filtering some potential FPs in which the conditional close is in fact correct, at the cost of maybe some FNs in more complex cases. Would a dominance based version alert there by default?

joefarebrother avatar Jun 13 '25 08:06 joefarebrother

First off, yes, dominance would handle exception edges. Second, you're right that we need to handle conditionals to avoid an FP in the example you give.

Consider the set of basic blocks that may lie on a path between open and close. What we're looking for is one such block, A, with a successor, B, that steps out of this set. If this was all we needed then post-dominance would give us the answer, but we need to allow certain A->B steps as ok, e.g. the false-successor in if not use_stdin above.

The ambitious approach would involve actually analysing the branch edge A->B to see if we could prove it irrelevant given the context of the file-open, but that's probably best delegated to a proper shared control-flow reachability library. So let's just use the heuristic that you already outlined, i.e. conditional branching is probably fine, whereas we're looking for exception edges as those that might indicate bugs.

To put this together we can define a willLikelyAlwaysReachClose predicate on basic blocks (presumably parameterised by the FileOpen we're reasoning about) and step backwards through the CFG with both immediate post-dominance steps, and conditional edges.

predicate willLikelyAlwaysReachClose(BasicBlock bb, FileOpen fo) {
  exists(FileClose fc | fileLocalFlow(fo, fc) and bb = fc.getBasicBlock())
  or
  exists(BasicBlock mid | willLikelyAlwaysReachClose(mid, fo) |
    // if `mid` will always reach a close, then the same holds for anything that `mid` post-dominates
    mid.immediatelyPostDominates(bb)
    or
    // if `mid` will always reach a close, and `bb` can conditionally branch to `mid` then we assume
    // that this branch will be taken when the close is necessary
    bb.getAConditionalSuccessor() = mid
  )
}

Finally we can check whether the open will reach a close: willLikelyAlwaysReachClose(fo.getBasicBlock(), fo).

aschackmull avatar Jun 13 '25 12:06 aschackmull

Makes sense Is it worth rewriting the query using these techniques now, or is the current approach good enough for the moment (it should resolve an OOM issue in DCA for FreeCAD)

joefarebrother avatar Jun 17 '25 12:06 joefarebrother

Is it worth rewriting the query using these techniques now, or is the current approach good enough for the moment (it should resolve an OOM issue in DCA for FreeCAD)

Up to you.

aschackmull avatar Jun 18 '25 09:06 aschackmull

In that case; I think I will stick with the current approach; prioritising resolving the OOM issue ASAP.

joefarebrother avatar Jun 20 '25 13:06 joefarebrother