dewolf
dewolf copied to clipboard
Common Subexpression Elimination has side effects on Restructuring
What happened?
The restructuring phase expects all expressions to be fully propagated.
However, CommonSubexpressionElimination
runs first (due to it being implemented on the ssa cfg).
This results in undesired side effects.
We could either try to implement CSE on the AST after the resturcturing process, or make the restructuring process follow definitions of values.
How to reproduce?
Bug can be observed in the sample test_goto.zip in the function test1_b
Before CSE is run, the CFG looks like this:
However, during CSE we add the definition c#0 = var_10#1 + 0x1
between the phi-function and the if-condition.
This results in problems during restructuring:
After out of SSA, this line has been converted to var_1 = var_0 + 1
As a result, we currently can not restructure the loop as. The resulting code looks like this:
int test1_b() {
int var_0;
int var_1;
var_0 = 10;
while (true) {
var_1 = var_0 + 1;
if (var_0 <= 15) {
printf("value is: %d\n", var_0);
var_0 = var_1;
continue;
}
printf("value of a: %d\n", var_0);
if (var_1 > 19) {
break;
}
var_0 = var_1;
}
printf("value of a final: %d\n", var_1);
return 0;
}
Affected Binary Ninja Version(s)
2.4
When we do not run the CommonSubexpressionElimination
stage, the CFG looks like this when out-of-SSA has run:
As a result, we can restructure the for loop
int test1_b() {
int var_0;
int var_1;
var_0 = 10;
while (true) {
for (var_0; var_0 <= 15; var_0++) {
printf("value is: %d\n", var_0);
}
printf("value of a: %d\n", var_0);
var_1 = var_0 + 1;
if (var_0 + 1 > 19) {
break;
}
var_0 = var_1;
}
printf("value of a final: %d\n", var_0 + 1);
return 0;
}
If we reduce the CSE stage to existing subexpression elimination only, we can also resturcture the loop more gracefully:
int test1_b() {
int var_0;
var_0 = 10;
do {
for (var_0; var_0 <= 15; var_0++) {
printf("value is: %d\n", var_0);
}
printf("value of a: %d\n", var_0);
var_0++;
}
while (var_0 <= 19);
printf("value of a final: %d\n", var_0);
return 0;
}
Existing Subexpression Elimination on the cfg actually helps the restructuring process to identify for-loops