dewolf
dewolf copied to clipboard
[Restructuring] Copy return statements to reduce region size
Proposal
Sometimes, the decompiler does not terminate because the regions we restructure are too large. Furthermore, large regions often have more complicated structures. One way to reduce the region size and to improve the readability is to copy return statements. (This is also done by IDA and Ghidra.)
Consider function lgetline
in exercise2.zip
Before the restructuring the cfg looks at follows:
First, the restructuring restructures the loop region consisting of the nodes 1, 2, 4, 8
.
During the loop-successor refinement, we add the nodes 3, 5, 6, 9, 7, 11, 10
to the loop region.
Thus, the loop region consists of all nodes except node 0
.
Currently, this leads to the following output:
int lgetline(void * arg1, int arg2) {
int var_1;
int var_2;
int var_3;
void * var_0;
var_2 = 0;
while (true) {
var_0 = arg1 + var_2;
var_3 = var_2 + 1;
if (var_2 < arg2 - 1) {
var_1 = getchar();
if (var_1 == 10) {
*var_0 = (void *)var_1;
*(arg1 + var_3)/*arg1[var_3]*/ = 0x0;
arg2 = var_3;
}
else {
if (var_1 != -1) {
*var_0 = (void *)var_1;
var_2 = var_3;
continue;
}
*var_0 = 0x0;
arg2 = var_2;
}
}
else {
if (var_1 == 10) {
*var_0 = (char *)var_1;
var_2 = var_3;
}
*(arg1 + var_2)/*arg1[var_2]*/ = 0x0;
arg2 = var_2;
}
return arg2;
}
}
Other decompilers like Ghidra produce the following output by coping the return statement:
int lgetline(char *s,int lim)
{
int state;
int i;
int c;
i = 0;
while( true ) {
if (lim + -1 <= i) {
if (c == 10) {
s[i] = '\n';
i = i + 1;
}
s[i] = '\0';
return i;
}
c = getchar();
if (c == 10) break;
if (c == -1) {
s[i] = '\0';
return i;
}
s[i] = (char)c;
i = i + 1;
}
s[i] = (char)c;
s[i + 1] = '\0';
return i + 1;
}
Approach
Try to implement a better version of finding loop successors and copy return statements if this helps to add fewer nodes to the loop region.
In the above example, even if we add the return statement to the basic blocks 5 and 9, we have to find a way to figure out where we start with the loop refinement. If we first consider the successor 3, then we still have the whole region.
The optimal outcome for the above example would be to move the return statement in basic block 10 to the basic blocks 9 and 5, remove the edges to basic block 10, and have basic block 3 as a unique successor. This would lead to a much nicer result.