Tricore improvements

Open mumbel opened this issue 5 years ago • 15 comments

  • Tricore ELF relocations
  • patterns

mumbel avatar Jan 19 '20 00:01 mumbel

Where should I log these issues we have with the tricore decode?

https://github.com/mumbel/ghidra/issues/21#issuecomment-586782506

DarrylC03 avatar Feb 17 '20 19:02 DarrylC03

@mumbel I'm reviewing the PR and need a more meaty example for the tricore. This is an oldie, but looks worthwhile and unfortunately slipped down in the queue, my apologies. While we were looking at other tricore issues in aggregate I found this one.

Do you know of any good binaries on the public internet?

I've been looking at the function start patterns, and I'm concerned they may be too loose. The start patterns look like they could match in the middle of a function, and they are general instructions. It could be OK to use them, but they may need some more constraints and to be separated from the base patterns. I can do that but need bigger/more samples than what I have. I can use .o lib files, the pcode unit test, or I can compile some tests, but I'd rather have real-world examples in the public space if possible.

I've updated the relocation handler to match master.

emteere avatar Feb 17 '23 16:02 emteere

@emteere Oh. Yeah I should have listed a few examples I worked off of in a comment/MR when I made this.

I can look around on github this weekend for some examples with relocations. I've changed setups since then, so maybe have lost the originals I worked from.

edit: I'll double check the patterns, and whether I had some reasoning for why I thought matching in the middle of a function wouldn't conflict.

edit2: Now you need a pcode_test that, instead of avoiding relocations, inserts as many as possible somehow

mumbel avatar Feb 17 '23 17:02 mumbel

One way to get as many relocation types as possible is to use .o files. Most of the time there are only a handful of relocations that would exist after something is linked so you don't need to support every relocation. Even in a .o where they show up, they might not be needed, unless you need to fake out the bytes so that the reference is made to the right external unknown symbol. There are many reasons to load all the .o files only as individual programs as well, so we have been trying to support the relocations that are found in .o's that might matter.

Just compiling a bunch of test .o files that have references to external things in various ways would probably do it. I've used IDE .lib/.o files for this before just to verify the relocation. It isn't an automated test.

We've thought of running the PCodeUnit tests on just the .o files for many reasons, relocations being one. Hard to get some toolchains to link is another. If we could just create a directory full of .o files that were snippets of tests, that would be good. That is done to some extent.

gcc and llvm have a good set of relocation tests that are created from assembly files. I'm not sure they would be good tests for getting the right answer during a relocation.

emteere avatar Feb 17 '23 17:02 emteere

@emteere I'll keep looking (I think a linux kernel might have been one of my samples) but https://github.com/hamma96/ERIKA-OS-on-TriCore has a few .o and .a files (looks like the elf is not relocatable)

mumbel avatar Feb 17 '23 18:02 mumbel

For what it's worth, I've used this PR since its release, mostly because of the added patterns. It still leaves a lot to be desired for my actual use case (automotive engine control units).

I can provide some Intel/Motorola hex files of my use-case binaries if it would assist. I've tried the pattern generator tool on a binary where I had already (painstakingly, I might add!) created the functions by hand, but I lack the knack for picking good patterns to make it work well.

I think the majority of your end users don't rely on object-file relocations, since we work on binaries extracted from automotive controllers and the like; that's the largest use case of Tricore processors in the real world, if that helps.

smgoldade avatar Feb 18 '23 02:02 smgoldade

Real world samples are always good. I'd prefer public URLs to them.

I've commented out a few patterns that aren't strong function starts. They could be put back in as potential function starts, or just disassemble, and several strong start patterns possibly don't need a prepattern, such as "mov.aa a14,a10; sub.a a10, const8".
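For anyone who wants to try a candidate pattern against their own dump outside of Ghidra, here is a minimal Python sketch of a matcher for this bit-pattern notation. It assumes each 8-character group describes one byte, most significant bit first, with `.` as a wildcard bit. The function names and the sample bytes are made up for illustration; this is not Ghidra's own matcher.

```python
# Sketch: match a Ghidra-style bit pattern ('0', '1', '.' wildcard) against bytes.
# Assumption: each 8-character group is one byte, most significant bit first.

def compile_pattern(pattern: str):
    """Turn '01000000 ......00' into a list of (mask, value) pairs, one per byte."""
    compiled = []
    for group in pattern.split():
        if len(group) != 8:
            raise ValueError(f"expected 8 bits per group, got {group!r}")
        # Fixed bits become 1 in the mask; wildcard bits become 0.
        mask = int("".join("0" if c == "." else "1" for c in group), 2)
        value = int(group.replace(".", "0"), 2)
        compiled.append((mask, value))
    return compiled

def find_matches(data: bytes, pattern: str):
    """Return every offset in data where the pattern matches."""
    compiled = compile_pattern(pattern)
    hits = []
    for off in range(len(data) - len(compiled) + 1):
        if all(data[off + i] & m == v for i, (m, v) in enumerate(compiled)):
            hits.append(off)
    return hits

# "mov.aa a14,a10; sub.a a10, const8" from the pattern file.
STRONG_START = "01000000 10101110 00100000 ......00"

# Made-up sample bytes: a ret, the candidate function start, then another ret.
sample = bytes([0x00, 0x90, 0x40, 0xAE, 0x20, 0xF8, 0x00, 0x90])
print(find_matches(sample, STRONG_START))  # prints [2]
```

Running this over a binary with known function starts gives a quick feel for how many false hits a pattern produces without a prepattern.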

The post patterns I commented out might be strong instructions, but they could occur anywhere in a function. The requirement to follow a return could be a strong pattern, but if a ret is embedded in the middle of a function that could be an issue.

Because of the totalbits and postbits, I don't believe most of the pre/post pattern combinations would never hit.

Here's the current file:

<patternlist>
  <patternpairs totalbits="24" postbits="10">
    <prepatterns>
      <data>00000000 10010000</data> <!-- ret -->
      <data>11011100 00001011</data> <!-- ji a11 -->
      <data>00101101 00001011 00000000 00110000</data> <!-- ji a11 -->
      <data>10011101 ........ ........ ........</data> <!-- ja disp24 -->
      <data>00011101 ........ ........ ........</data> <!-- j disp24 -->
      <data>00111100 ........</data> <!-- j disp8 -->
    </prepatterns>

    <postpatterns>
      <data>01000000 10101110</data> <!-- mov.aa  a14,a10 -->
      <data>00100000 ......00</data> <!-- sub.a a10, const8 -->
      <data>01000000 10101110 00100000 ......00</data> <!-- mov.aa  a14,a10; sub.a a10, const8 -->

      <!--<data>00000101 ....1111 ........ ....01..</data> <!- ld.bu d15, off18 -->
      <!--<data>00001100 ........</data> <!- ld.bu d15, [aN], off4 -->
      <!--<data>00111011 ....0000 ........ 1111....</data> <!- mov d15, const16 -->
      <!--<data>00111011 ....0000 ........ 0100....</data> <!- mov d4, const16 -->
      <!--<data>00111011 ....0000 ........ 1000....</data> <!- mov d8, const16 -->
      <!--<data>10000010 ....1111</data> <!- mov d15, const4 -->
      <!--<data>10000010 ....0100</data> <!- mov d4, const4 -->
      <!--<data>10000010 ....1000</data> <!- mov d8, const4 -->
      <!--<data>01111101 ....0000 ........ 1111....</data> <!- movh d15, const16 -->
      <!--<data>10010001 ....0000 ........ 1111....</data> <!- movh.a a15, const16 -->
      <!--<data>01111101 ....0000 ........ 0100....</data> <!- movh d4, const16 -->
      <!--<data>01111101 ....0000 ........ 1000....</data> <!- movh d8, const16 -->
      <!--<data>11011010 ........</data> <!- mov d15, const8 -->
      <!--<data>00011101 ........ ........ ........</data> <!- j disp24  (thunk detection) -->
      <!--<data>10000101 ....1111 ........ ....00..</data> <!- ld.w d15, off18 -->
      <funcstart/>
    </postpatterns>
  </patternpairs>
</patternlist>
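A rough way to see which pre/post combinations clear the thresholds is to count fixed bits. The Python sketch below assumes `totalbits` is the minimum number of fixed (non-`.`) bits in the combined pre+post pattern and `postbits` is the minimum number of fixed bits in the post pattern alone. That reading of the two attributes is an assumption and is worth checking against the Ghidra source; the helper names are made up.

```python
# Sketch: check pre/post pattern pairs against totalbits/postbits.
# Assumed semantics (not verified against Ghidra's implementation):
#   postbits  = minimum fixed bits required in the post pattern
#   totalbits = minimum fixed bits required in pre + post combined

def fixed_bits(pattern: str) -> int:
    """Count the '0'/'1' bits; '.' wildcards and spaces do not count."""
    return sum(ch in "01" for ch in pattern)

def pair_qualifies(pre: str, post: str, totalbits: int = 24, postbits: int = 10) -> bool:
    post_n = fixed_bits(post)
    return post_n >= postbits and fixed_bits(pre) + post_n >= totalbits

PRE = {
    "ret": "00000000 10010000",
    "ji a11 (16-bit)": "11011100 00001011",
    "ji a11 (32-bit)": "00101101 00001011 00000000 00110000",
    "ja disp24": "10011101 ........ ........ ........",
    "j disp24": "00011101 ........ ........ ........",
    "j disp8": "00111100 ........",
}
POST = {
    "mov.aa a14,a10": "01000000 10101110",
    "sub.a a10, const8": "00100000 ......00",
    "mov.aa a14,a10; sub.a a10, const8": "01000000 10101110 00100000 ......00",
}

for pre_name, pre in PRE.items():
    for post_name, post in POST.items():
        verdict = "ok" if pair_qualifies(pre, post) else "below threshold"
        print(f"{pre_name:16} + {post_name:34} "
              f"{fixed_bits(pre) + fixed_bits(post):2d} bits  {verdict}")
```

The same check can be run over the commented-out post patterns to see which of them could ever be used with `postbits="10"`.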

emteere avatar Feb 18 '23 05:02 emteere

Here's a TC-1767 in Intel Hex, taken from a Ford vehicle's engine controller: https://www.dropbox.com/s/q4ytpmbkzj9lhvf/GMBM2A2.hex?dl=0

You can divide it into sections as follows:

  • 0x80000000: CODE R/X -> This contains the jump to the entry point.
  • 0x80004000: PBL R/X -> This is an internal bootloading application with entry point at 0x800043E0.
  • 0x80014000: PROT R/X -> This is a section that contains some internal security functions. It has no entry point but may be called from other sections.
  • 0x80018000: BOOT R/X -> This is the boot section, and is jumped to by the first function at 0x80000000. Entry is 0x80018078.
  • 0x80020000: DATA R -> This is a data section. Contains calibration data referenced by the application software.
  • 0x80080000: ASW R/X -> This is the primary application software. Runs the vehicle's engine. Entry point is 0x800A86AE.
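If it helps with splitting the file before import, here is a small Python sketch that reads Intel HEX records and labels each data record with the section it falls in. The section start addresses come from the list above; the end of each section is not stated, so the sketch assumes a section runs up to the next start. Only record types 00 (data), 01 (end of file) and 04 (extended linear address) are handled, and the three demo records are made up.

```python
import bisect

# Section starts as listed above. Ends are an assumption: each section is
# taken to run up to the start of the next one.
SECTIONS = [
    (0x80000000, "CODE"),
    (0x80004000, "PBL"),
    (0x80014000, "PROT"),
    (0x80018000, "BOOT"),
    (0x80020000, "DATA"),
    (0x80080000, "ASW"),
]
STARTS = [start for start, _ in SECTIONS]

def section_of(addr: int):
    """Name of the section containing addr, or None if below the first start."""
    i = bisect.bisect_right(STARTS, addr) - 1
    return SECTIONS[i][1] if i >= 0 else None

def parse_intel_hex(lines):
    """Yield (absolute_address, data_bytes) for each data record."""
    upper = 0  # upper 16 address bits, set by type 04 records
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF:
            raise ValueError(f"bad checksum: {line!r}")
        count, rtype = raw[0], raw[3]
        offset = int.from_bytes(raw[1:3], "big")
        payload = raw[4:4 + count]
        if rtype == 0x00:    # data
            yield (upper << 16) + offset, payload
        elif rtype == 0x04:  # extended linear address
            upper = int.from_bytes(payload, "big")
        elif rtype == 0x01:  # end of file
            break
        # Other record types are ignored in this sketch.

# Made-up demo records: set the upper address to 0x8000, then 4 data bytes.
demo = [":0200000480007A", ":04000000DEADBEEFC4", ":00000001FF"]
for addr, data in parse_intel_hex(demo):
    print(hex(addr), section_of(addr), data.hex())
```

With the real file, the records can be grouped by `section_of` and each group loaded as its own memory block with the permissions listed above.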

I think this serves as a great reference considering Tricore is primarily used by automotive controllers, and this file is self-contained (contains all of the data in the flash), and represents the full "operating system" package that would be had. BOOT contains the startup low level code, which determines if the PBL or the ASW section will be executed based on given conditions.

smgoldade avatar Feb 18 '23 06:02 smgoldade

Can you explain the memory segments in the Tricore?

The upper 4 bits map to a segment. Can these segments be mapped to the same location? For example 0xa0000000 and 0x80000000 appear to be the same location.

Is segment mapping something that should be added to the addressing, especially for branching so that a set of segment register can map memory to the correct bytes? They could be defaulted to non-initialized, or to the same location 0x80000000 at startup, and if one wanted to change that for a particular example they could.

emteere avatar Feb 22 '23 01:02 emteere

So for the TC-1767, the layout is:

  • 0x8000 0000 - 0x801F FFFF -> Program Flash (PFLASH)
  • 0x8FE0 0000 - 0x8FE0 7FFF -> Data Flash (DFLASH) 0. DFLASH is the "EEPROM" on a Tricore, but not quite.
  • 0x8FE1 0000 - 0x8FE1 7FFF -> DFLASH 1
  • 0xD000 0000 - 0xD001 1FFF -> Local Data RAM (LDRAM)
  • 0xD400 0000 - 0xD400 5FFF -> Scratch-Pad RAM (SPRAM). This RAM is typically executable and also cannot be written to at a byte level; only 2-, 4-, and 8-byte writes work.
  • 0xF000 0000 - 0xFFFF FFFF -> Serial Peripheral Bus (SPB) memory map

Segment 10 (0xA) accesses segment 8 (0x8) uncached. When the CPU accesses segment 8 normally, it uses an internal cache. If it accesses it via segment 10, it is uncached.
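As a sketch of what that aliasing means for address handling, the Python below takes the upper 4 bits as the segment and folds a segment 0xA address onto its segment 0x8 equivalent. It only models the 0xA/0x8 pair described here; the function names are made up, and nothing is assumed about other segments.

```python
# Sketch: fold uncached segment 0xA addresses onto cached segment 0x8.
# Only the 0xA -> 0x8 alias described above is modeled.

def segment(addr: int) -> int:
    """Upper 4 bits of a 32-bit address."""
    return (addr >> 28) & 0xF

def to_cached(addr: int) -> int:
    """Return the segment 0x8 alias of a segment 0xA address; others unchanged."""
    if segment(addr) == 0xA:
        return (addr & 0x0FFFFFFF) | 0x80000000
    return addr

print(hex(to_cached(0xA0018078)))  # prints 0x80018078
print(hex(to_cached(0xD0000000)))  # prints 0xd0000000
```

Normalizing branch and reference targets this way before creating references would be one alternative to keeping a separate 0xA overlay.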

All the Tricore processors have this same style of access. I've never known how to properly represent this in Ghidra, so I usually just set up 0xA to be an overlay of 0x8 and sometimes have to deal with weird disasm and the like.

Even amongst the users of Tricore, sometimes vehicle OEMs deal entirely with addresses using segment 10, or entirely using segment 8. I think it's one of the lesser understood features of the processor.

smgoldade avatar Feb 22 '23 18:02 smgoldade

Appears that segment 0xD and 0xC have the same cache/non-cache tag on memory addresses. Is that true?

emteere avatar Feb 27 '23 21:02 emteere

https://community.infineon.com/gfawx74859/attachments/gfawx74859/AURIX/5399/1/Infineon-TC21x-TC22x-TC23x-UM-v01_01-EN.pdf

@emteere , section 6.7 might be what you're looking for

mumbel avatar Feb 28 '23 01:02 mumbel

I think what I was seeing with 0xC/0xD segments was address calculation because of incorrect a0,a1,a8,a9, but it has been a few weeks since I last looked at it.

I'm prototyping tracking those global addresses and noting where they are.

I'm seeing multiple areas where they are set to different values. I can easily track them.

The basic issue with applying these values automatically is knowing you have found them all, and then applying them in a second pass. Trying to come up with a good compromise, as this happens in the MIPS as well with GP, and isn't handled well there if there are multiple values, although it does attempt to track and apply the GP.
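To make the "find them all first, apply in a second pass" idea concrete, here is a hypothetical Python sketch, not the actual analyzer. It collects every observed assignment to a0, a1, a8 and a9, then reports which registers have a single value (safe to apply everywhere) and which have conflicting values (need a per-region decision). The observation tuples and all addresses and values are made up.

```python
from collections import defaultdict

GLOBAL_REGS = ("a0", "a1", "a8", "a9")

def summarize(observations):
    """observations: iterable of (instruction_address, register, value).

    Returns {register: ("unique", value)} when only one value was ever seen,
    or {register: ("conflict", {value: [addresses...]})} otherwise.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for where, reg, value in observations:
        if reg in GLOBAL_REGS:
            seen[reg][value].append(where)

    result = {}
    for reg, values in seen.items():
        if len(values) == 1:
            (value,) = values  # iterating the dict yields its single key
            result[reg] = ("unique", value)
        else:
            result[reg] = ("conflict", {v: sorted(w) for v, w in values.items()})
    return result

# Made-up first-pass observations.
observations = [
    (0x80018100, "a0", 0xD0008000),
    (0x800A8700, "a0", 0xD0008000),
    (0x80018104, "a1", 0x80030000),
    (0x80004400, "a1", 0x80024000),
    (0x80018108, "d4", 0x00000005),  # not a global address register, ignored
]
for reg, (kind, detail) in sorted(summarize(observations).items()):
    print(reg, kind, detail)
```

In this toy input a0 could be applied globally, while a1 would need to be scoped, for example by the section in which each assignment was found.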

emteere avatar Mar 13 '23 13:03 emteere

@mumbel I've tried to compile this patch on current ghidra HEAD and it fails with the following errors:

`
Task :tricore:compileJava FAILED
/Users/CUT/src/ghidra/Ghidra/Processors/tricore/src/main/java/ghidra/app/util/bin/format/elf/relocation/TRICOREElfRelocationHandler.java:28: error: TRICOREElfRelocationHandler is not abstract and does not override abstract method relocate(ElfRelocationContext,ElfRelocation,Address) in ElfRelocationHandler
public class TRICOREElfRelocationHandler extends ElfRelocationHandler {
^
/Users/CUT/src/ghidra/Ghidra/Processors/tricore/src/main/java/ghidra/app/util/bin/format/elf/relocation/TRICOREElfRelocationHandler.java:36: error: relocate(ElfRelocationContext,ElfRelocation,Address) in TRICOREElfRelocationHandler cannot override relocate(ElfRelocationContext,ElfRelocation,Address) in ElfRelocationHandler
public void relocate(ElfRelocationContext elfRelocationContext, ElfRelocation relocation, Address relocationAddress)
^
return type void is not compatible with RelocationResult
/Users/CUT/src/ghidra/Ghidra/Processors/tricore/src/main/java/ghidra/app/util/bin/format/elf/relocation/TRICOREElfRelocationHandler.java:35: error: method does not override or implement a method from a supertype
@Override
^
3 errors
`

nuschpl avatar Dec 28 '23 16:12 nuschpl

I've solved the above in the commit below; I'm not sure how to open a pull request against a pull request: https://github.com/nuschpl/ghidra/commit/15ed6c915007bbd56062392c2041988d514add7a

This changes the return type of all your relocation functions from void to int. The int holds the relocation size in bytes and is then passed to RelocationResult, the same way the other relocation handlers already in Ghidra do.

nuschpl avatar Dec 28 '23 17:12 nuschpl