ghidra decompile "INT_SUB" as a addition because of added variable sign?

When I decompile a Macos AARCH64 binary, I found latest ghidra decompiles a "$U40880:4 = INT_SUB w8, 0x58:4" as "s[1] + 0xa8U" Screenshot from 2023-06-16 17-12-06 However, the corresponding source code checks whether s[1] is equal to0x58or 0x88, whose optimization should be s[1] - 0x58 & 0xdf. Moreover, I found that in Ghidra 10.3, though s[1] is decompiled in a mass, the number - 0x58 is decompiled correctly, when s[1] is considered as a uint: Screenshot from 2023-06-16 17-12-33 To make sure the potential overflow may cause an issue, I test the following code:

#include <stdio.h>
int test() {
	char temp3; 
	temp3 = getchar();
	if((temp3 + 0xa8U & 0xdf) != 0)
		printf("11111\n");
	return 0 ;
}
int test2() {
	char temp3; 
	temp3 = getchar();
	if((temp3 - 0x58 & 0xdf) != 0) 
		printf("22222\n");	
	return 0 ;
}
int main(int argc, char *argv[]){
	test2();
	test();
}

And have the following result:

friends@friends-OptiPlex-5070:~/muqi/trash$ gcc -O0 test_overflow.c 
friends@friends-OptiPlex-5070:~/muqi/trash$ ./a.out 
x
11111
friends@friends-OptiPlex-5070:~/muqi/trash$ ./a.out 
a
22222
11111
friends@friends-OptiPlex-5070:~/muqi/trash$ ./a.out 
X
11111

binary for testing is attached arm_coreutils_od_o.zip

Jun 16 '23 21:06 Muqi-Zou

I did some research and found it is because ghidra wrongly assigns 0xa8 as a byte (unsigned char) type. The process can be briefly described as followed:

Ghidra uses sub2add to convert s[1]-0x58 to s[1]+ 0x58 * -1.
Ghidra uses collapseConstants to convert 0x58*-1 to ffffffa8.
Ghidra uses propagatecopy to propagate the int_zext(ffffffa8) to its related varnode.
Ghidra suspects the signs of ffffffa8 as unsigned. (This is something I have no idea why) However, if I ban rule collapseConstants and leave 0x58*-1 as a expression there, Ghidra can correctly assign constant 0x58 as a signed char.

Jun 19 '23 19:06 Muqi-Zou

The decompiler cannot always recover the exact data-types from the original source code. In this case, the desired expression s[1] - 0x58 & 0xdf and the recovered expression s[1] + 0xa8 & 0xdf are equivalent. The decompiler chooses an unsigned data-type for the constant because the logical-and in the expression is usually associated with bit manipulation of unsigned values.

The functions test() and test2() in the sample code are testing separate characters because they each call get_char() independently. So the boolean tests can be different.

Sep 12 '23 20:09 caheckman

I am sorry, I provide a terrible testing example here, which may confuse you. The following example should be more obvious. If I use the following code:

int main(int argc, char *argv[]){
	char temp3; 
	temp3 = getchar();
	if(temp3 + 0xa8U != 0)
		printf("11111\n");	
	if(temp3 - 0x58 != 0) 
		printf("22222\n");
	return 0 ;
}

I will have result like followed:

friends@friends-OptiPlex-5070:~/muqi/trash$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
friends@friends-OptiPlex-5070:~/muqi/trash$ gcc -O0 test_overflow.c 
friends@friends-OptiPlex-5070:~/muqi/trash$ ./a.out 
X
11111
friends@friends-OptiPlex-5070:~/muqi/trash$ ./a.out 
x
11111
22222
friends@friends-OptiPlex-5070:~/muqi/trash$ ./a.out 
a
11111
22222

The semantic error behind these expressions is called integer promotion (We need to ctrl+f to find the definition of it, which is, the third occurrence/in the middle of this page.) Since both 0xa8U and 0x58 are int variables, their ranks are higher than the s[1]'s type, which is signed char. Hence, the result of operators, i.e., s[1] - 0x58 and s[1] + 0xa8U will both be promoted to an int variable. However, because 0x58 is a signed int and 0xa8U is an unsigned int, - in s[1] - 0x58 is promoted as a signed int minus, and + in s[1] + 0xa8U is promoted as an unsigned int addition.

Sep 13 '23 02:09 Muqi-Zou

https://github.com/NationalSecurityAgency/ghidra/issues/5758 is a similar example but the addition is incorrectly performed on the address of the object!

Sep 13 '23 11:09 Wall-AF

The decompiler is aware of the integer promotion. The bits that are different between s[1] - 0x58 and s[1] + 0xa8U viewed as integers are immediately discarded because of the & 0xdf. The full expressions are equivalent.

Sep 14 '23 16:09 caheckman

I think you are right, thanks for your time, and sorry for the confusion I caused. To check whether Ghidra is aware of the integer promotion, I tried patch &0xdf with a larger value, e.g., 0xdff. Ghidra gives me ((byte)s[1] - 0x58 & 0xdff) instead. I think this is a valid check for this issue.

Sep 15 '23 03:09 Muqi-Zou

ghidra ghidra copied to clipboard

decompile "INT_SUB" as a addition because of added variable sign?

ghidra
ghidra copied to clipboard