ghidra
ghidra copied to clipboard
ARM: LDRHT instruction has incomplete/incorrect constructor
Describe the bug
The ldrht instruction is specified in the sleigh as follows:

However, addrmode3 does not support the combination of P24=0 and either of c2122=1 or c2122 = 3

To Reproduce
The (little endian) bytes b2 50 f4 e0 should disassemble to ldrht r5, [r4] #0x2 but fail to disassemble
I extended addrmode3 as follows:
addrmode3: [rn],-rm is P24=0 & U23=0 & c2122=1 & rn & c0811=0 & c0707=1 & c0404=1 & rm
{
local tmp=rn; rn=rn-rm; export tmp;
}
addrmode3: [rn],"#"^off8 is P24=0 & U23=1 & c2122=3 & rn & immedH & c0707=1 & c0404=1 & immedL
[ off8=(immedH<<4)|immedL;]
{
local tmp=rn; rn=rn + off8; export tmp;
}
It 'fixes' the LDRHT instruction, but I don't know enough about ARM to say if parking it in addrmode3 is appropriate
I think that it can be fixed without adding new constructors for addrmode3. Instead it is enough to tweak the constraints in constructors of post-indexed variations: https://github.com/NationalSecurityAgency/ghidra/blob/6842712129b8da45077bb8c5049e607d685f4dea/Ghidra/Processors/ARM/data/languages/ARMinstructions.sinc#L1024-L1044
In the first two constructors constraint c2122=2 can be changed to c2222=1 to allow for any value of bit 21, since it won't be checked. Similarly, in the next two constructors constraint c2122=0 can be changed to c2222=0.
I think that it can be fixed without adding new constructors for
addrmode3. Instead it is enough to tweak the constraints in constructors of post-indexed variations:https://github.com/NationalSecurityAgency/ghidra/blob/6842712129b8da45077bb8c5049e607d685f4dea/Ghidra/Processors/ARM/data/languages/ARMinstructions.sinc#L1024-L1044
In the first two constructors constraint
c2122=2can be changed toc2222=1to allow for any value of bit 21, since it won't be checked. Similarly, in the next two constructors constraintc2122=0can be changed toc2222=0.
That is correct (alternatively the P24=1 restriction on the c2122=1 and c2122=3 constructors can be dropped). I intentionally split them up as I'm not sure if extending the constructors like this is sound (i.e. other previously invalid disassembly might now 'successfully' disassemble when they should not). In that case it would probably make more sense to move these to a new addrmodeX.
Good find. Definitely an issue here. It impacts ldrht, ldrsbt, ldrsht and strht. However ldrbt and strbt are handled properly in addrmode2, so they're fine.
Sleigh uses a syntax tree hierarchy to decide which constructors to use and resolves them from a top-down manner. Because these instructions have P24=0 & W21=1 set, relaxing the constraints on addrmode3 to use c2222={0,1}should be acceptable and not cause a disassembly conflict between the instructions. addrmode3 is only used for this subset of load/store instructions and the related encodings, the only instructions of concern here are ldrd and strd which define P=0, W=1 as unpredictable. Depending on if we want to avoid unpredictable disassembly, we may go either way on it.