ghidra-atom-microcode
ghidra-atom-microcode copied to clipboard
Ghidra Processor Module to disassemble and decompile the x86 Intel Atom microcode
Intel Atom Microcode - Ghidra Processor Module
Ghidra Processor Module to disassemble and decompile the x86 Intel Atom microcode provided in chip-red-pill/uCodeDisasm.
Install and Run
This module has been tested on Ghidra from 9.2 to 10.1.5.
-
Clone this repo in
<ghidra install dir>/Ghidra/Processors/ -
git clone https://github.com/chip-red-pill/uCodeDisasmand copylib/txt2ghidra.pyfrom this repo to theuCodeDisasmfolder. -
run
./txt2ghidra.py ../ucode/, that will produce aglm.ucodebinary file (or what specified by the-o <outfile.ucode>option).../ucode/must be the folder that contains the dumpsms_array0.txt, ms_array1.txt, ms_array0.txt -
Run Ghidra and load
glm.ucodeselectingx86ucodeas Language for the binary as shown in the screenshot:
Details
From chip-red-pill/uCodeDisasm:
The microcode of the Intel Atom CPUs consists from two large chunks of data – Microcode Triads and Sequence Words. These data are kept in the ROM area of a functional block inside CPU core that is called Microcode Sequencer (MS).
We encourage to read the uCodeDisasm readme and source code to understand ucode internals mechanisms.
Addressing
The txt2ghidra.py simply packs together Microcode Triads and Sequence words into 16-bytes ucode instructions that will be analyzed by our Ghidra Processor Module. During the process it additionally transforms and inserts metadata into the instructions to make ghidra's life easier.
Since instructions are now 16 bytes long and Ghidra does not currently support word sizes bigger than 8 bytes, the ucode address scheme in Ghidra is different from what one would expect from reading the uCodeDisasm repo, and all the code addresses must be multiplied by 0x10.
For example, cpuid_xlat is at address 0x0be0 in the originally published ucode, while at address 0x0be00 in Ghidra (see Screenshot).
Instructions
Each ghidra instruction will be composed by a microcode instruction and possibly by a sequence word, that either influences control flow after the instruction execution (eflow), or sets up some of synchronization before execution (or both).
Functions
chip-red-pill/uCodeDisasm already identified and named lot of functions inside the dumped code. All of these symbols are automatically loaded in Ghidra. The autoanalysis starts from these defined functions and performs recursive disassembly. To disassemble code not reached by the autoanalysis, press D. Then press F to define function starting from the address under the cursor, that will be analysed and decompiled.
ucode patches
The decompiler supports decrypted ucode patches. To generate an ucode blob with a patch applied pass -p <PATCH_FILE> to txt2ghidra.py.
The PATCH_FILE must be in the format:
0000: 000000000000 00003ecb3a3b 00003eb96ef7 00003e8c6217 -
[...] | match & patch
001c: 00003e9f31a5 00003efb758f 000000000000 000000000000 -
0000: c0053d03ffc8 4152f45c027f c21e0303d23d 000000000000 -
[...] | ucode RAM
01fc: 000000000000 000000000000 000000000000 000000000000 -
0000: 0000070000ce 000018201a50 000018201a50 0000384c0600 -
[...] | seqword RAM
007c: 000031805140 000011fc9192 000031f59040 000000000000 -
The patches will be shown in the decompilers as conditionals statements under the condition if (PATCH_ENABLED() != 0) { [...] } to show both the patch and the original instructions.
Open Problems
Most of the instructions' semantics is correctly defined, and decompilation should generally work.
There are a few remaining open problems to tackle. PR and issues to discuss them are welcomed.
- We identify function calls as instructions doing
saveuip + jmp(usually combining instructions and sequence words), but this may not always be true. - How do function calls take parameters and return values? Seems a mix of temporary registers, but not always the same registers.
- There is still unclear semantics on some operations (uflow uret parameters meaning, segment selectors packing, ...) marked by
TODOin the.slaspecfile
There is also some missing implementation details:
- All registers are assumed to be 64 bits, which is in general false. Disassembled instruction include the operand size, but not the decompiled view.
- No SSE/AVX instruction is currently supported.
- Temporary register aliasing is not modeled (
ROVR). - Indirect jumps are rarely resolved by ghidra.
- Functions return using jumps trough the
UIP0/1register (see (uCodeDisasm)[https://github.com/chip-red-pill/uCodeDisasm]) which decreases decompilation quality.