dewolf icon indicating copy to clipboard operation
dewolf copied to clipboard

Common Subexpression Elimination has side effects on Restructuring

Open 0x6e62 opened this issue 3 years ago • 2 comments

What happened?

The restructuring phase expects all expressions to be fully propagated. However, CommonSubexpressionElimination runs first (due to it being implemented on the ssa cfg). This results in undesired side effects.

We could either try to implement CSE on the AST after the resturcturing process, or make the restructuring process follow definitions of values.

How to reproduce?

Bug can be observed in the sample test_goto.zip in the function test1_b

Before CSE is run, the CFG looks like this: 12771daa-366f-4c9b-a38f-c1ffb9d0a16e

However, during CSE we add the definition c#0 = var_10#1 + 0x1between the phi-function and the if-condition. This results in problems during restructuring: 67e415bd-3e1d-4e94-b210-d4a20142007b

After out of SSA, this line has been converted to var_1 = var_0 + 1 As a result, we currently can not restructure the loop as. The resulting code looks like this:

int test1_b() {
    int var_0;
    int var_1;
    var_0 = 10;
    while (true) {
        var_1 = var_0 + 1;
        if (var_0 <= 15) {
            printf("value is: %d\n", var_0);
            var_0 = var_1;
            continue;
        }
        printf("value of a: %d\n", var_0);
        if (var_1 > 19) {
            break;
        }
        var_0 = var_1;
    }
    printf("value of a final: %d\n", var_1);
    return 0;
}

Affected Binary Ninja Version(s)

2.4

0x6e62 avatar Jan 21 '22 10:01 0x6e62

When we do not run the CommonSubexpressionElimination stage, the CFG looks like this when out-of-SSA has run: 04438797-a2b2-4ccf-b76e-1c1a16bd404a As a result, we can restructure the for loop

int test1_b() {
    int var_0;
    int var_1;
    var_0 = 10;
    while (true) {
        for (var_0; var_0 <= 15; var_0++) {
            printf("value is: %d\n", var_0);
        }
        printf("value of a: %d\n", var_0);
        var_1 = var_0 + 1;
        if (var_0 + 1 > 19) {
            break;
        }
        var_0 = var_1;
    }
    printf("value of a final: %d\n", var_0 + 1);
    return 0;
}

0x6e62 avatar Jan 21 '22 10:01 0x6e62

If we reduce the CSE stage to existing subexpression elimination only, we can also resturcture the loop more gracefully: 218db924-422d-49ef-8436-daa4e9e4afed

int test1_b() {
    int var_0;
    var_0 = 10;
    do {
        for (var_0; var_0 <= 15; var_0++) {
            printf("value is: %d\n", var_0);
        }
        printf("value of a: %d\n", var_0);
        var_0++;
    }
    while (var_0 <= 19);
    printf("value of a final: %d\n", var_0);
    return 0;
}

Existing Subexpression Elimination on the cfg actually helps the restructuring process to identify for-loops

0x6e62 avatar Jan 21 '22 10:01 0x6e62