Issue with soundness of HLIL control flow structuring
HLIL can produce unsound control flow structuring in some conditions.
Consider this MLIL code:
In MLIL everything looks correct:
Consider the case when 'i = 0the path meansiis assigned to5and then ultimately goes to instruction21`
Now in HLIL:
The control flow is a series of
if statements rather than if-else statements. In the case of i == 0 it meets the first condition and sets i = 5 and then can also satisfy the second condition too incorrectly setting var_20 = 1
Special Thanks to: Zao Yang and Stefan Nagy for their research in Decompiler Fuzzing for reporting this issue.
This is exposing a fundamental issue in control flow restructuring. We have desire to revisit the control flow restructuring algorithm to significantly reduce the number of conditions that get out of hand and become unreadable, which is also an issue with the current algorithm. This bug will probably have to be fixed alongside a refactor of the control flow restructuring itself.
A user reported a similar case:
- BN version: 4.0.4958 Windows
- Compiler: MSVC 2023 x64
Compile the following C code using cl /c /O2 test.c and disassemble the resulting binary in Binary Ninja:
bool test(int *input)
{
bool result;
if(input[0] == 1) {
result = true;
input[0] = 2;
if(input[1] == 1)
goto other;
} else {
other:
result = false;
}
return result;
}
This results in the following High Level IL output:
int64_t test(int32_t* arg1) {
1 int64_t rax;
2 if (*arg1 == 1)
3 rax = 1;
4 *arg1 = 2;
5 if (*arg1 != 1 || (*arg1 == 1 && arg1[1] == 1))
6 rax.b = 0;
7 return rax;
}
However, this output is not actually correct: if the function's input is { 1, 0 }, at line 4 in the HLIL, arg1[0] will be set to 2. Then, at line 5, the condition in the if statement evaluates to true (since arg1[0] now evaluates to 2), which in turn leads to line 6 being executed, and the whole function returning false. This differs from the actual behavior of the program (which should be returning true).
I see the fundamental issue is the body of the if statement is modifying the variables used in the condition, so it alters the result
Fixed in 4.1.5547