tricore register 0 asserts
Work environment
| Questions | Answers |
|---|---|
| OS/arch/bits | Debian arm 64, MacOS AArch64, MacOS x86, Windows x86 etc. |
| Architecture | ppc, x86, cortexm, armv8 etc. |
| Source of Capstone | git clone, brew, pip, release binaries etc. |
| Version/git commit | v5.0.3, |
Instruction bytes giving faulty results
The TriCore disassembly doesn't match the metadata in the detail output. This is the disassembly:
252$ ./cstool/cstool tc162 9a000028
0 9a 00 add d15, d0, #0
0$ ./cstool/cstool -d tc162 9a000028
0 9a 00 add d15, d0, #0
ID: 31 (add)
op_count: 2
operands[0].type: REG = d0
.access: WRITE
operands[1].type: IMM = 0x0
.access: READ
Registers modified: d0
As you can see, the asm text has 3 operands: the 1st and 2nd are registers and the 3rd is an immediate. But the decoded details show only two.
Expected results
No assertion failure when I parse the 2nd operand as a register. Currently I get:
Assertion failed: (RegNo && RegNo < 61 && "Invalid register number!"), function getRegisterName, file TriCoreGenAsmWriter.inc, line 3603.
Steps to get the wrong result
With r2:
[0x00000000]> wx 9a000028
[0x00000000]> pd 2
Assertion failed: (RegNo && RegNo < 61 && "Invalid register number!"), function getRegisterName, file TriCoreGenAsmWriter.inc, line 3603.
Process 30566 stopped
This does not crash with cstool because it ignores the 2nd operand.
Bumping this issue, since it breaks part of r2's TriCore features! Thanks guys!
-- EDIT: The instruction add d15, d0, #0 is not valid for the TriCore architecture (TC1767, for example).
In the TriCore architecture, the add instruction typically involves register-to-register operations and does not directly support an immediate value (like #0).
Maybe we should try with another instruction set version?
EDIT2: It seems that changing the TC version doesn't change anything. In the tc162 instruction set, we can do an Add d0, d1, #0.
@imbillow Could you take a look at this one please?
@Rot127 do you think it's possible to hotfix it in v5 too? Thanks
Yes of course! Actually, let me change the milestone to v5.
more wrong stuff
$ ./cstool/cstool -d tc162 da00a0f5
0 da 00 mov d15, #0
ID: 233 (mov)
op_count: 1
operands[0].type: IMM = 0x0
.access: READ
Groups: (null)
2 a0 f5 mov.a a5, #0xf
ID: 230 (mov.a)
op_count: 2
operands[0].type: REG = a5
.access: WRITE
operands[1].type: IMM = 0xf
.access: READ
Registers modified: a5
0$
Looking at the tricore instruction set docs, it seems that capstone just does not consider registers that are implicit in the opcode as operands.
i.e. your first example, "9a000028" is actually just a 16-bit instruction "9a00" followed by an invalid instruction.
In "9a00", "9a" is the opcode for an operation that adds a 4-bit constant to a data register and stores the result in d15. The d15 output register is not encoded as an "f" anywhere, but is just the implicit output of the instruction with that opcode (9a).
Capstone then also does not show "d15" as an operand. The same thing is happening with the "more wrong stuff" examples.
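To make the encoding argument concrete, here is a minimal Python sketch that splits the bytes by hand. It assumes the TriCore convention that bit 0 of the first byte selects a 32-bit instruction, and the 16-bit SRC field layout (opcode in bits 7..0, register in bits 11..8, 4-bit constant in bits 15..12). Both are my reading of the instruction set manual, and the helper names `insn_length` and `decode_src` are hypothetical, not Capstone API.

```python
def insn_length(first_byte: int) -> int:
    """Return the instruction length in bytes (assumption: bit 0 set means 32-bit)."""
    return 4 if first_byte & 1 else 2


def decode_src(raw: bytes) -> dict:
    """Split a 16-bit SRC-format instruction into its fields (little-endian halfword)."""
    half = int.from_bytes(raw[:2], "little")
    return {
        "op1": half & 0xFF,            # bits 7..0: opcode
        "s1_d": (half >> 8) & 0xF,     # bits 11..8: data/address register number
        "const4": (half >> 12) & 0xF,  # bits 15..12: 4-bit constant
    }


raw = bytes.fromhex("9a000028")
length = insn_length(raw[0])       # 0x9a has bit 0 clear, so 16-bit
fields = decode_src(raw[:length])  # only d0 and the constant are encoded
```

Under these assumptions neither field of 9a00 holds the value 15, which is consistent with d15 being implied by the opcode. The same layout reproduces mov.a a5, #0xf from the bytes a0 f5 quoted above.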
I also would like to have this bug fixed. It leads to wrong information, like showing no written registers if the output registers are implicit. When searching for instructions that modify a specific register, I need that information to be available in the disassembly.
@Rot127 @trufae
This is mostly about what to do with those implicit register accesses. If we want to add those implicit register accesses to the instruction meta information then I need to do quite a bit of special handling. If it's just turning off incorrect assertions then that's pretty simple. So what should I do?
So from a user's perspective it would be best to add the meta-information to the instructions/opcodes.
Is there maybe something similar in other supported architectures, where some of the "special handling" code could be copied/adjusted from?
In x86, push cs would be an example:
$ cstool -d x32 0e
0 0e push cs
ID: 609 (push)
Prefix:0x00 0x00 0x00 0x00
Opcode:0x0e 0x00 0x00 0x00
rex: 0x0
addr_size: 4
modrm: 0x0
disp: 0x0
sib: 0x0
op_count: 1
operands[0].type: REG = cs
operands[0].size: 2
Groups: not64bitmode
Here capstone tells us that there is one operand, the cs register, which is implicit here.
@imbillow Sorry, for answering so late.
So mov.a is defined here: https://github.com/TriDis/llvm-tricore/blob/4bfc5ee073becdcf799978431a9c03045b1091a2/lib/Target/TriCore/TriCoreInstrInfo.td#L547
If all mov instructions implicitly set the d15 register, they should be added there like this (also check the parent or child classes):
let Defs = [D15], Uses = [] in {
class MOV_RR<bits<8> op1, bits<8> op2, string opstr,
RegisterClass outregClass, RegisterClass inregClass>
...
}
If you generate the tables again, you get implicitly defined regs. Also you can open a PR with those changes in the TriDis repo.
The generated tables changed a little. You may have to apply only the relevant changes, e.g. like this:
git diff -U0 | grepdiff -E '<PATTERN>' --output-matching=hunk | git apply --cached --unidiff-zero
@csarn I am not sure if I understand you correctly. So here is the general operand classification for Auto-Sync archs (TriCore is one of them):
- explicit operands: Any operand you see in the asm text should be in the details. Only if you choose to get the real operands for an alias instruction (via `cs_tool -r` or `CS_OPT_DETAIL_REAL`) do the asm text and detail operands differ.
- implicit operands: Anything effectively used by the instruction, but not shown in the alias or real asm text (like the `D15` access from above). Those are in `cs_detail->regs_read/write`.
If there is an instruction which doesn't give these results, it is considered a bug.
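The rule that every operand in the asm text must show up in the details can be checked mechanically. Below is a rough Python sketch that compares the number of comma-separated operands in the operand string with the reported op_count. The function names are hypothetical, and it assumes that commas only ever separate operands, which holds for the examples in this thread but may not hold for every syntax.

```python
def count_text_operands(op_str: str) -> int:
    """Count the comma-separated operands in an operand string such as 'd15, d0, #0'."""
    parts = [p.strip() for p in op_str.split(",")]
    return len([p for p in parts if p])


def missing_operands(op_str: str, detail_op_count: int) -> int:
    """Return how many operands appear in the asm text but not in the details."""
    return count_text_operands(op_str) - detail_op_count


# Values taken from the cstool outputs quoted in this issue.
add_gap = missing_operands("d15, d0, #0", 2)   # add: text has 3, details have 2
mov_gap = missing_operands("d15, #0", 1)       # mov: text has 2, details have 1
mova_gap = missing_operands("a5, #0xf", 2)     # mov.a: text and details agree
```

A non-zero result flags an instruction whose details disagree with its text. It does not say which operand is missing or whether the access flags are right.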
@Rot127 Ok, then I mis-used the word "implicit". All the examples (tricore, and the x86 "push cs") have the relevant operands explicit in the asm text. I was thinking on opcode level, where the encoded instruction would have no bits indicating the operand, because for that operand there is a special opcode.
So you are confirming that this issue is actually a bug. Going back to the first example:
cstool -d tc162 9a00
0 9a 00 add d15, d0, #0
ID: 31 (add)
op_count: 2
operands[0].type: REG = d0
.access: WRITE
operands[1].type: IMM = 0x0
.access: READ
Registers modified: d0
the disassembled asm string "add d15, d0, #0" is correct, and shows 3 operands. The details only show the last two of those, "d15" is missing.
Yes, it is a bug. @imbillow Sorry, I haven't looked at the code so far. I only realized with the last comment what you meant. I would need to check. But if the Printer just prints the hard-coded string of the register name (e.g. "d15"), you would need to add it in some fixup function.
I can take a look tomorrow and give you more details.
@imbillow Just checked it. Yeah, this is the annoying problem of people hard-coding operands in the mnemonic.
Check out the function AArch64_insert_detail_op_reg_at() and how it is used. You can do something like this.
@imbillow I started an attempt for this one here: https://github.com/capstone-engine/capstone/pull/2502
There are problems though. While we can figure out the registers at position >0 (by checking the bits as in the PR), we cannot easily figure out the d15 and a15 regs at position 0.
Because they are emitted with the mnemonic. So no bits indicate their presence.
Also, the attempt in the PR is flawed, because it might also add registers to instructions which just happen to have the correct bits set at these positions.
Now, I would propose to fix the td files for these instructions.
I think we can add the d15/a15 register as implicit write. Then check in the fixup function, if d15/a15 is in the implicit write list AND in the asm string.
If yes, we remove it from the list and add it as register at an index.
We can determine the index by counting the commas.
Super ugly and resource-intensive, because we check strings for every instruction. But I cannot come up with another idea currently. Unless we want to go deep into the LLVM logic, but unfortunately I don't have time for this.
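The proposed fixup could look roughly like the Python sketch below. It is only a model of the idea, not Capstone code: operands are plain tuples, `promote_implicit_reg` is a hypothetical name, and it assumes the td files already list d15/a15 as implicit writes.

```python
def promote_implicit_reg(op_str, operands, implicit_writes, candidates=("d15", "a15")):
    """Move a register hard-coded in the asm text from the implicit-write list
    into the operand list, at the position it has in the text."""
    operands = list(operands)
    implicit_writes = list(implicit_writes)
    text_ops = [t.strip() for t in op_str.split(",")]
    for reg in candidates:
        # Only act if the register is both an implicit write AND in the asm string.
        if reg in implicit_writes and reg in text_ops:
            index = text_ops.index(reg)  # same as the number of commas before it
            implicit_writes.remove(reg)
            operands.insert(index, ("REG", reg))
    return operands, implicit_writes


# Detail operands as currently reported for "add d15, d0, #0".
ops, writes = promote_implicit_reg(
    "d15, d0, #0", [("REG", "d0"), ("IMM", 0)], ["d15"]
)
```

After the call, `ops` holds d15, d0 and the immediate in text order, and `writes` is empty. The sketch matches whole operands only, so a register that appears inside a memory operand such as [a15]#0 is not handled, and it does nothing about access flags.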
Another problem is the missing register access information. Maybe we have to do a string search in the mnemonic again?
Or is there a way to check the instruction encoding? Do you know this?
I just saw you've solved the problem with the RzIL uplifting. So we could just copy the distinction from there.
But TriCore's RzIL code also pretty much re-disassembles the instruction.
I think it might be more elegant to edit tricore's TableGen, if feasible.
However, these implicit registers may also be present in a memory operand.
In this case the metadata is completely wrong.
cstool -d tc162 c800
0 c8 00 ld.a a0, [a15]#0
ID: 165 (ld.a)
op_count: 1
operands[0].type: MEM
.mem.base: REG = a0
.mem.disp: 0x0
.access: WRITE
Registers read: a0
Registers modified: a0
Groups: (null)
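For comparison, this is what the asm text itself says about the operands. The Python sketch below parses the operand string and pulls the base register and displacement out of a memory operand. The regular expression only models the `[reg]#disp` form seen in this output, and both function names are hypothetical.

```python
import re

# Matches the "[a15]#0" form shown above: base register in brackets, then #displacement.
MEM_RE = re.compile(r"^\[(\w+)\]#(-?(?:0x[0-9a-fA-F]+|\d+))$")


def parse_mem_operand(text):
    """Return {'base': ..., 'disp': ...} for a memory operand, or None otherwise."""
    m = MEM_RE.match(text.strip())
    if m is None:
        return None
    return {"base": m.group(1), "disp": int(m.group(2), 0)}


def operands_from_text(op_str):
    """Classify each comma-separated operand as memory or something else."""
    result = []
    for part in (p.strip() for p in op_str.split(",")):
        mem = parse_mem_operand(part)
        result.append(("MEM", mem) if mem is not None else ("OTHER", part))
    return result


text_ops = operands_from_text("a0, [a15]#0")
```

Read this way, the text has two operands, a0 and a memory operand with base a15, while the details above report a single memory operand with base a0.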
Yeah, I figured this out later as well. Guess we really need to fix it in the td files.