Weird 8051 arithmetic, architecture specific ActionDatabase::universalAction?
Hi! Lately I have been playing around with analysing embedded Intel 8051 firmware. This is an 8-bit architecture, and as such it is very common (even for firmware code) to operate on 16-bit numbers using multiple arithmetic instructions.
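To make the pattern concrete, here is a minimal sketch of how a 16-bit addition decomposes into two 8-bit additions linked by a carry, roughly what an ADD followed by an ADDC does on the 8051. The helper names `add8` and `add16_bytewise` are made up for illustration and are not part of Ghidra.

```python
def add8(a, b, carry_in=0):
    """Add two 8-bit values plus a carry-in; return (8-bit sum, carry-out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def add16_bytewise(x, y):
    """Add two 16-bit values one byte at a time, low byte first."""
    lo, carry = add8(x, y)                # low bytes, like ADD
    hi, _ = add8(x >> 8, y >> 8, carry)   # high bytes plus carry, like ADDC
    return (hi << 8) | lo


if __name__ == "__main__":
    print(hex(add16_bytewise(0x7C48, 0x00B8)))  # 0x7d00
```

The decompiler sees the two halves as separate byte operations joined only by the carry flag, which is why the high byte ends up as a carry expression rather than as part of a single 16-bit addition.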
However, Ghidra often produces barely legible decompilation for these operations, such as:
pbVar1 = (byte *)CONCAT11('|' - (((0xb7U < (byte)(param_3 * '\x02')) << 7) >> 7), param_3 * '\x02' + 0x48);
*pbVar1 = param_1;
pbVar1[1] = param_2;
(Some research shows that others have also reported this issue here. This issue is more about asking for guidance on fixing it.)
I have been able to trace this behavior to RuleSignShift using DecompVis, an external tool for visualizing the results of decompiler actions. This rule rewrites the right-shift part of the bit-access pattern for the processor's status register to CPUI_INT_SRIGHT before other rules have a chance to cancel the otherwise redundant shifting.
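As a rough model of why the shift pair stops cancelling, the sketch below compares the two forms of the high byte on 8-bit values. It is my reading of the first listing, not code taken from the decompiler: `srshift8` and the two `high_byte_*` functions are hypothetical names, and `flag` stands for the 0/1 result of the comparison.

```python
def srshift8(value, amount):
    """Arithmetic (sign-preserving) right shift of an 8-bit value."""
    value &= 0xFF
    signed = value - 0x100 if value & 0x80 else value
    return (signed >> amount) & 0xFF


def high_byte_logical(flag):
    """0x7c + ((flag << 7) >> 7) with a logical shift: the shifts cancel."""
    bit = ((flag << 7) & 0xFF) >> 7
    return (0x7C + bit) & 0xFF


def high_byte_arithmetic(flag):
    """0x7c - ((flag << 7) s>> 7): the form seen in the first listing."""
    mask = srshift8((flag << 7) & 0xFF, 7)  # 0x00, or 0xFF which is -1
    return (0x7C - mask) & 0xFF


if __name__ == "__main__":
    for flag in (0, 1):
        print(flag, hex(high_byte_logical(flag)), hex(high_byte_arithmetic(flag)))
```

Both forms yield the same byte, but the arithmetic variant hides the fact that the shifts simply extract the flag again.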
Disabling this rule produces a slightly more legible result:
pbVar1 = (byte *)CONCAT11((0xb7U < (byte)(param_3 * '\x02')) + '|',param_3 * '\x02' + 0x48);
*pbVar1 = param_1;
pbVar1[1] = param_2;
However, the true breakthrough comes with disabling RuleCarryElim:
pbVar1 = (byte *)CONCAT11(CARRY1(param_3 * '\x02',0x48) + '|',param_3 * '\x02' + 0x48);
*pbVar1 = param_1;
pbVar1[1] = param_2;
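Comparing the last two listings, `CARRY1(x, 0x48)` and `0xb7U < x` appear to be two spellings of the same condition. The sketch below checks that equivalence exhaustively for 8-bit operands; `carry1` is my own model of the CARRY1 notation and `carry_as_comparison` is a hypothetical name for the comparison form.

```python
def carry1(a, b):
    """Model of CARRY1: 1 if the 8-bit addition a + b overflows, else 0."""
    return int((a & 0xFF) + (b & 0xFF) > 0xFF)


def carry_as_comparison(a, const):
    """Comparison form: a + const overflows exactly when a > 0xFF - const."""
    return int((0xFF - (const & 0xFF)) < (a & 0xFF))


if __name__ == "__main__":
    same = all(carry1(a, 0x48) == carry_as_comparison(a, 0x48) for a in range(256))
    print("equivalent for every 8-bit a:", same)
```

For the constant 0x48 the threshold is 0xFF - 0x48 = 0xb7, which matches the constant in the comparison listing.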
At this point I was able to author my own rule, which detects this specific arrangement of CONCAT11 and CARRY1 being used together and rewrites it to a simple addition between a CONCAT11 and the pieced constants:
*(byte *)((param_3 * '\x02') + 0x7c48) = param_1;
*(byte *)((param_3 * '\x02') + 0x7c49) = param_2;
This works most of the time; however, the decompiler sometimes does not recognise the constant as a memory reference. (Any tips on how this could be achieved would be appreciated!)
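For reference, the identity behind the CONCAT11/CARRY1 rewrite can be checked exhaustively. The sketch below is a Python model of the expressions in the listings, not the rule itself: `concat11`, `carry1`, `before_rewrite` and `after_rewrite` are illustrative names, and `x` stands for the 8-bit value `param_3 * 2`.

```python
def concat11(hi, lo):
    """Model of CONCAT11: join two bytes into one 16-bit value."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def carry1(a, b):
    """Model of CARRY1: 1 if the 8-bit addition a + b overflows, else 0."""
    return int((a & 0xFF) + (b & 0xFF) > 0xFF)


def before_rewrite(x, hi_const=0x7C, lo_const=0x48):
    """CONCAT11(CARRY1(x, lo) + hi, x + lo) for an 8-bit x."""
    return concat11(carry1(x, lo_const) + hi_const, x + lo_const)


def after_rewrite(x, hi_const=0x7C, lo_const=0x48):
    """Zero-extend x to 16 bits and add the pieced constant (0x7c48 here)."""
    return ((x & 0xFF) + concat11(hi_const, lo_const)) & 0xFFFF


if __name__ == "__main__":
    ok = all(before_rewrite(x) == after_rewrite(x) for x in range(256))
    print("rewrite preserves the value for every 8-bit x:", ok)
```

The two forms agree because the carry out of the low-byte addition is exactly what a 16-bit addition would propagate into the high byte.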
I would like to contribute my changes for this rather obscure processor. Based on my so far limited understanding, I think the most future-proof way of doing so would be to parse ActionDatabase::universalAction from a separate XML specification file. This could either be attached to the program as a specification extension or be included somehow in the various processor description files.
Having control over the universal action in this way would allow the currently hardcoded rules to be disabled or restructured into two separate instruction-rewriting groups, ensuring that multibyte arithmetic operations are fully recovered before the disabled rules are applied.
Please provide some guidance/tips on an approach that would increase my chances of such a PR being accepted.