tricore register 0 asserts

Open trufae opened this issue 1 year ago • 18 comments

Work environment

Questions | Answers
OS/arch/bits | Debian arm 64, MacOS AArch64, MacOS x86, Windows x86 etc.
Architecture | ppc, x86, cortexm, armv8 etc.
Source of Capstone | git clone, brew, pip, release binaries etc.
Version/git commit | v5.0.3,

Instruction bytes giving faulty results

The TriCore disassembly doesn't match the metadata inside. This is the disassembly:

252$ ./cstool/cstool tc162  9a000028
 0  9a 00  add	d15, d0, #0
0$ ./cstool/cstool -d tc162  9a000028
 0  9a 00  add	d15, d0, #0
	ID: 31 (add)
	op_count: 2
		operands[0].type: REG = d0
			.access: WRITE
		operands[1].type: IMM = 0x0
			.access: READ
	Registers modified: d0

As you can see, there are 3 operands: the 1st and 2nd are registers, the 3rd is an immediate. But the decoded details show only two.

Expected results

Things should not assert if I parse the 2nd operand as a register:

Assertion failed: (RegNo && RegNo < 61 && "Invalid register number!"), function getRegisterName, file TriCoreGenAsmWriter.inc, line 3603.

Steps to get the wrong result

With r2:

[0x00000000]> wx 9a000028
[0x00000000]> pd 2
Assertion failed: (RegNo && RegNo < 61 && "Invalid register number!"), function getRegisterName, file TriCoreGenAsmWriter.inc, line 3603.
Process 30566 stopped

This does not crash with cstool because it's ignoring the 2nd operand.

trufae avatar Sep 08 '24 14:09 trufae

Bumping this issue, since it breaks part of r2's TriCore features. Thanks guys!

EDIT: The instruction add d15, d0, #0 is not valid for the TriCore architecture (TC1767, for example).

In the TriCore architecture, the add instruction typically involves register-to-register operations and does not directly support an immediate value (like #0).

Maybe we could try another instruction set version?

EDIT2: It seems that changing the tc version doesn't change anything. In the tc162 instruction set, we can do an add d0, d1, #0.

cqke avatar Sep 08 '24 14:09 cqke

@imbillow Could you take a look at this one please?

Rot127 avatar Sep 08 '24 19:09 Rot127

@Rot127 Do you think it's possible to hotfix it in v5 too? Thanks

cqke avatar Sep 09 '24 08:09 cqke

Yes of course! Actually, let me change the milestone to v5.

Rot127 avatar Sep 09 '24 08:09 Rot127

more wrong stuff

$ ./cstool/cstool -d tc162 da00a0f5
 0  da 00  mov	d15, #0
	ID: 233 (mov)
	op_count: 1
		operands[0].type: IMM = 0x0
			.access: READ
	Groups: (null)

 2  a0 f5  mov.a	a5, #0xf
	ID: 230 (mov.a)
	op_count: 2
		operands[0].type: REG = a5
			.access: WRITE
		operands[1].type: IMM = 0xf
			.access: READ
	Registers modified: a5

0$

trufae avatar Sep 09 '24 16:09 trufae

Looking at the tricore instruction set docs, it seems that capstone just does not consider registers that are implicit in the opcode as operands. i.e. your first example, "9a000028" is actually just a 16-bit instruction "9a00" followed by an invalid instruction. In "9a00", "9a" is the opcode for an operation that adds a 4-bit constant to a data register and stores the result in d15. The d15 output register is not encoded as an "f" anywhere, but is just the implicit output of the instruction with that opcode (9a): image

Capstone then also does not show "d15" as an operand. The same thing is happening with the "more wrong stuff" examples.
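To make the encoding argument concrete, here is a minimal Python sketch that splits a 16-bit instruction into its fields. The layout (opcode in byte 0, register index in the low nibble of byte 1, 4-bit constant in the high nibble) is an assumption inferred from the `a0 f5` → `mov.a a5, #0xf` output earlier in this thread, and the "bit 0 clear means 16-bit" rule is likewise assumed from the TriCore docs rather than from Capstone. `split_src16` is a hypothetical helper, not a Capstone function.

```python
def split_src16(raw: bytes) -> dict:
    """Split the first two bytes of a TriCore instruction into fields.

    Assumed layout (not taken from Capstone's sources):
      byte 0            -> opcode
      byte 1, low 4 bit -> register index
      byte 1, high 4 bit-> 4-bit constant
    """
    if len(raw) < 2:
        raise ValueError("need at least two bytes")
    opcode, second = raw[0], raw[1]
    return {
        "opcode": opcode,
        # Assumption: a cleared bit 0 in the opcode marks a 16-bit instruction.
        "is_16bit": (opcode & 1) == 0,
        "reg": second & 0x0F,
        "const4": second >> 4,
    }


for hexstr in ("9a00", "a0f5"):
    fields = split_src16(bytes.fromhex(hexstr))
    print(hexstr, hex(fields["opcode"]), fields["reg"], hex(fields["const4"]))
```

For `9a00` the register field is 0 (d0) and the constant is 0. Nothing in those 16 bits names d15; it follows from the opcode alone, which is why a decoder that only walks the encoded fields never sees it.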

I also would like to have this bug fixed. It leads to wrong information, like showing no written registers if the output registers are implicit. When searching for instructions that modify a specific register, I need that information to be available in the disassembly.

csarn avatar Sep 16 '24 15:09 csarn

@Rot127 @trufae

This is mostly about what to do with those implicit register accesses. If we want to add those implicit register accesses to the instruction meta information then I need to do quite a bit of special handling. If it's just turning off incorrect assertions then that's pretty simple. So what should I do?

b1llow avatar Sep 18 '24 14:09 b1llow

So from a user's perspective it would be best to add the meta-information to the instructions/opcodes. Is there maybe something similar in other supported architectures, where some of the "special handling" code could be copied/adjusted from? In x86, push cs would be an example:

$ cstool -d x32 0e
 0  0e                                               push       cs
        ID: 609 (push)
        Prefix:0x00 0x00 0x00 0x00 
        Opcode:0x0e 0x00 0x00 0x00 
        rex: 0x0
        addr_size: 4
        modrm: 0x0
        disp: 0x0
        sib: 0x0
        op_count: 1
                operands[0].type: REG = cs
                operands[0].size: 2
        Groups: not64bitmode 

Here capstone tells us that there is one operand, the cs register, which is implicit here.

csarn avatar Sep 19 '24 17:09 csarn

@imbillow Sorry for answering so late.

So mov.a is defined here: https://github.com/TriDis/llvm-tricore/blob/4bfc5ee073becdcf799978431a9c03045b1091a2/lib/Target/TriCore/TriCoreInstrInfo.td#L547

If all mov instructions implicitly set the d15 register, they should be added there like this (also check the parent or child classes):

let Defs = [D15], Uses = [] in {
  class MOV_RR<bits<8> op1, bits<8> op2, string opstr,
  RegisterClass outregClass, RegisterClass inregClass>
...
}

If you generate the tables again, you get implicitly defined regs. Also you can open a PR with those changes in the TriDis repo.

The generated tables changed a little, so you may have to apply only the relevant changes, e.g. like this:

git diff -U0 | grepdiff -E '<PATTERN>' --output-matching=hunk | git apply --cached --unidiff-zero

Rot127 avatar Sep 20 '24 04:09 Rot127

@csarn I am not sure if I understand you correctly. So here is the general operand classification for Auto-Sync archs (TriCore is one of them):

  • explicit operands: Any operand you see in the asm text should be in the details. Only if you choose to get the real operands for an alias instruction (via cstool -r or CS_OPT_DETAIL_REAL) do the asm text and detail operands differ.
  • implicit operands: Anything effectively used by the instruction, but not shown in the alias or real asm text (like the D15 access from above). These are in cs_detail->regs_read/regs_write.

If there is an instruction which doesn't give these results, it is considered a bug.
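As a rough illustration of how a consumer would rely on this classification, the sketch below collects every register an instruction writes by merging explicit operands that carry write access with the implicit write list. It uses plain Python data rather than the Capstone bindings; `Detail` and `written_registers` are hypothetical names, and the access flag values merely mirror CS_AC_READ / CS_AC_WRITE.

```python
from dataclasses import dataclass, field

# Access flags, modelled after CS_AC_READ / CS_AC_WRITE.
READ, WRITE = 1, 2


@dataclass
class Detail:
    # (register name, access flags) for operands visible in the asm text.
    explicit_regs: list = field(default_factory=list)
    # Registers used by the instruction but not shown in the asm text.
    implicit_read: list = field(default_factory=list)
    implicit_write: list = field(default_factory=list)


def written_registers(detail: Detail) -> set:
    """Return all registers the instruction writes, explicit or implicit."""
    explicit = {name for name, access in detail.explicit_regs if access & WRITE}
    return explicit | set(detail.implicit_write)


# "add d15, d0, #0" as reported in this issue: d15 is missing, d0 is marked WRITE.
reported = Detail(explicit_regs=[("d0", WRITE)])
# What the classification above would require for the same asm text.
expected = Detail(explicit_regs=[("d15", WRITE), ("d0", READ)])

print("d15" in written_registers(reported))
print("d15" in written_registers(expected))
```

With the details as reported, a search for instructions that modify d15 misses this `add`, which is the practical consequence described earlier in the thread.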

Rot127 avatar Sep 20 '24 04:09 Rot127

@Rot127 Ok, then I mis-used the word "implicit". All the examples (tricore, and the x86 "push cs") have the relevant operands explicit in the asm text. I was thinking on opcode level, where the encoded instruction would have no bits indicating the operand, because for that operand there is a special opcode.

So you are confirming that this issue is actually a bug. Going back to the first example:

cstool -d tc162 9a00
 0  9a 00  add  d15, d0, #0
        ID: 31 (add)
        op_count: 2
                operands[0].type: REG = d0
                        .access: WRITE
                operands[1].type: IMM = 0x0
                        .access: READ
        Registers modified: d0

the disassembled asm string "add d15, d0, #0" is correct, and shows 3 operands. The details only show the last two of those, "d15" is missing.
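A check along these lines could flag affected instructions automatically. This is only a sketch under simple assumptions: operands in the asm text are separated by commas, a memory operand such as `[a15]#0` counts as one operand, and the operand string excludes the mnemonic. Both helpers are hypothetical and not part of Capstone.

```python
def count_text_operands(op_str: str) -> int:
    """Count comma-separated operands in the operand part of the asm text."""
    op_str = op_str.strip()
    if not op_str:
        return 0
    return len(op_str.split(","))


def missing_operands(op_str: str, op_count: int) -> int:
    """Number of operands shown in the asm text but absent from the details."""
    return count_text_operands(op_str) - op_count


# Values taken from the cstool output in this thread.
print(missing_operands("d15, d0, #0", 2))  # add
print(missing_operands("d15, #0", 1))      # mov
print(missing_operands("a5, #0xf", 2))     # mov.a
```

A non-zero result for `add` and `mov` and zero for `mov.a` matches what the outputs above show: only the instructions with a hard-coded d15 lose an operand.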

csarn avatar Sep 20 '24 09:09 csarn

Yes, it is a bug. @imbillow Sorry, I didn't look at the code so far. I only realized with the last comment what you meant. I would need to check. But if the Printer just prints the hard-coded string of the register name (e.g. "d15"), you would need to add it in some fixup function.

I can take a look tomorrow and give you more details.

Rot127 avatar Sep 20 '24 18:09 Rot127

@imbillow Just checked it. Yeah, this is the annoying problem of people hard-coding operands in the mnemonic. Check out the function AArch64_insert_detail_op_reg_at() and how it is used. You can do something similar.

Rot127 avatar Sep 21 '24 06:09 Rot127

@imbillow I started an attempt for this one here: https://github.com/capstone-engine/capstone/pull/2502 There are problems though. While we can figure out the registers at position >0 (by checking the bits as in the PR), we cannot easily figure out the d15 and a15 regs at position 0. Because they are emitted with the mnemonic. So no bits indicate their presence.

Also, the attempt in the PR is flawed, because it might also add registers to instructions that just happen to have the matching bits set at these positions.

Now, I would propose fixing the td files for these instructions. I think we can add the d15/a15 register as an implicit write. Then check in the fixup function whether d15/a15 is in the implicit write list AND in the asm string. If yes, we remove it from the list and add it as a register operand at an index. We can determine the index by counting the commas before it.

Super ugly and resource-intensive, because we check strings for every instruction. But I cannot come up with another idea currently, unless we want to go deep into LLVM logic, and I don't have time for that unfortunately.
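The fixup proposed above could look roughly like the following Python sketch. It works on plain lists rather than Capstone's C structures, so every name here (`promote_hardcoded_reg`, `operands`, `implicit_writes`) is hypothetical. It assumes the operand string excludes the mnemonic, and it neither sets access flags nor handles registers inside memory operands.

```python
import re


def promote_hardcoded_reg(op_str, operands, implicit_writes,
                          candidates=("d15", "a15")):
    """Move a register that appears in the asm text but is only listed as an
    implicit write into the explicit operand list.

    op_str:          operand part of the asm text, e.g. "d15, d0, #0"
    operands:        operand names in detail order, e.g. ["d0", "#0"]
    implicit_writes: register names assumed to come from the td `Defs` list
    Returns new (operands, implicit_writes) lists; the inputs are not modified.
    """
    operands = list(operands)
    implicit_writes = list(implicit_writes)
    for reg in candidates:
        if reg not in implicit_writes:
            continue
        match = re.search(r"\b%s\b" % re.escape(reg), op_str)
        if match is None:
            continue
        # The operand index is the number of commas in front of the register.
        index = op_str[:match.start()].count(",")
        implicit_writes.remove(reg)
        operands.insert(index, reg)
    return operands, implicit_writes


print(promote_hardcoded_reg("d15, d0, #0", ["d0", "#0"], ["d15"]))
print(promote_hardcoded_reg("a5, #0xf", ["a5", "#0xf"], []))
```

The string search runs once per candidate register per instruction, which is the cost the comment above is worried about.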

Another problem is the missing register access information. We maybe have to do a string search in the mnemonic again?

Or is there a way to check the instruction encoding? Do you know this?

Rot127 avatar Oct 06 '24 14:10 Rot127

I just saw you've solved the problem with the RzIL uplifting. So we could just copy the distinction from there.

Rot127 avatar Oct 06 '24 14:10 Rot127

But TriCore's RzIL code also pretty much re-disassembles the instructions.

I think it might be more elegant to edit tricore's TableGen, if feasible.

b1llow avatar Oct 16 '24 14:10 b1llow

However, these implicit registers may also be present in a memory operand.

image

In this case the metadata is completely wrong.

cstool -d tc162 c800
 0  c8 00  ld.a a0, [a15]#0
        ID: 165 (ld.a)
        op_count: 1
                operands[0].type: MEM
                        .mem.base: REG = a0
                        .mem.disp: 0x0
                        .access: WRITE
        Registers read: a0
        Registers modified: a0
        Groups: (null) 

b1llow avatar Oct 16 '24 15:10 b1llow

Yeah, I figured this out later as well. Guess we really need to fix it in the td files.

Rot127 avatar Oct 17 '24 14:10 Rot127