codeql Treat operands of `Phi` instructions as escaped

Note: contains a few IR diffs that I haven't fully investigated.

During escape analysis, we currently allow an address to be used as an operand of a Phi instruction without escaping, as long as the result of that Phi instruction does not escape. This is probably correct for the technical definition of "escape", since we would still be seeing all uses of that address even after it propagates through the Phi. However, when consuming the results of escape analysis (in SSA construction), we really treat "escapes" as meaning "there might be a memory access that accesses this allocation without us being able to prove that the memory access only accesses this allocation." For example:

int x;
int y;
int* p;
if (a < b) {
  p = &x;
} else {
  p = &y;
}
*p = 5;

Neither &x nor &y technically "escapes", because they are both consumed only at the store to *p. However, *p might access either of the two locations, which means that we need to treat both x and y as aliased locations, assigned to the same virtual variable.

The expedient thing to do is to treat all operands of a Phi instruction as escaped, since anything that might access memory via the result of the Phi could be accessing any of multiple allocations, which is what I've done in this PR.

If we wanted to be more precise, we could stick with our original definition of "escapes" meaning "we can't see all potential consumers of this address". We could then define a "consumes escaped" property for each memory access, which holds if we can't see all possible producers of the accessed memory address; this is essentially the dual of "escapes". We would then group all allocations into equivalence classes as follows:

Two allocations A and B are in the same class if any of the following hold:

A and B are both escaped (all escaped memory is in the same class)
There exists a memory access for which both A and B are traceable sources of the base operand of the memory access.
There exists a memory access where A is a traceable source of the base operand of the memory access, that same memory access also "consumes escaped", and B is escaped.

Each equivalence class would get a unique virtual variable. This is almost what we have today. We already have a virtual variable for each never-aliased, never-escaped allocation. We also already have a single virtual variable containing all escaped allocations. The difference is that, with my fix from this PR, any allocation that could be one of mulitple targets of a memory access is lumped into the "all escaped" virtual variable. With the more precise approach, we could have virtual variables that contained multiple non-escaped allocations, distinct from the "all escaped" virtual variable.

I suspect that the more precise approach is not worth the effort, though.

May 09 '24 15:05 dbartol

I suspect that the more precise approach is not worth the effort, though.

After reducing some of the failing range analysis tests I'm not so sure about this. Consider this:

void maybe_modify_escaped();
void range(int);

void test(int n, bool b)
{
    maybe_modify_escaped();

    if (n >= 5)
    {
        range(n);
    }

    b ? n : n; // generates a phi instruction
}

since n now escapes we can't even infer that n >= 5 since the loads on n >= 5 and range(n) read from {AllAliasedMemory} 😭 So I think we do need the more precise approach you've suggested (or simply stop using the IR-based SSA for the semantic range analysis library)

Jun 24 '24 10:06 MathiasVP

Superseded by https://github.com/github/codeql/pull/16907. Closing.

Nov 19 '24 08:11 andersfugmann