RISC-V: Add WCH/QingKe XW extension
This PR adds support for the WCH/QingKe "XW" extension, closing issue #6385.
This only includes support for the "XW" extension and not any of the other QingKe-specific functionality (i.e. the ARM Cortex-inspired SysTick and interrupt controller)
As part of this, I had to make a duplicate of the existing .cspec file, as some issues seem to arise with it if the core supports only single-precision floating point but not double-precision floating point. I don't know whether this .cspec is actually correct or not, especially for the QingKe-V2 core which only implements RV32ECXW.
I don't know how to make Ghidra pick the correct variant based on .riscv.attributes.
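For discussion, here is a minimal Python sketch of the decision such a selection would need to make. It is not Ghidra code and does not use any Ghidra API. It assumes the arch string stored in .riscv.attributes has the underscore-separated, versioned form that GNU toolchains commonly emit (for example rv32i2p0_m2p0_a2p0_f2p0_c2p0_xw2p2); the function names and the returned keys are hypothetical.

```python
import re


def extension_names(arch: str) -> set:
    """Return the extension names in an arch string, with versions stripped.

    Assumes underscore-separated tokens such as "rv32i2p0_m2p0_xw2p2".
    """
    tokens = arch.lower().split("_")
    head = re.fullmatch(r"rv(32|64)(.+)", tokens[0])
    if head is None:
        raise ValueError(f"not a RISC-V arch string: {arch!r}")
    tokens[0] = head.group(2)  # drop the "rv32"/"rv64" prefix

    names = set()
    for token in tokens:
        match = re.fullmatch(r"([a-z]+)(?:\d+p\d+)?", token)
        if match is None:
            raise ValueError(f"unexpected token: {token!r}")
        names.add(match.group(1))
    return names


def variant_hints(arch: str) -> dict:
    """Hypothetical hints a loader could use to choose a language variant."""
    names = extension_names(arch)
    return {
        "xw": "xw" in names,
        # F without D is the case that needed the duplicated .cspec
        "single_precision_only": "f" in names and "d" not in names,
    }
```

With these assumptions, variant_hints("rv32i2p0_m2p0_a2p0_f2p0_c2p0_xw2p2") flags both XW and single-precision-only, while an RV32ECXW string flags only XW. How to wire such a decision into Ghidra's loader is still the open question.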
The XW opcode disassembly was tested by feeding all possible opcodes (as far as I can tell) through the vendor toolchain, opening the resulting .o file in Ghidra, and then visually scanning the results.
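As an illustration of this kind of exhaustive test input, here is a minimal Python sketch. It is not the script that was actually used for this PR. It assumes the encodings of interest are 16-bit (their low two bits are not 0b11, per the RISC-V base encoding rules) and emits them as raw .2byte directives for a GNU-style assembler; whether a given tool then treats those bytes as code rather than data is also an assumption.

```python
def compressed_encodings():
    """Yield every 16-bit value whose low two bits are not 0b11.

    In RISC-V, low bits 0b11 mark a 32-bit (or longer) instruction,
    so the remaining values cover the whole 16-bit encoding space.
    """
    for value in range(1 << 16):
        if value & 0b11 != 0b11:
            yield value


def render_assembly(encodings):
    """Render the encodings as an assembly source with raw .2byte data."""
    lines = [".text"]
    lines += [f".2byte 0x{value:04x}" for value in encodings]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "all_rvc.s" is a hypothetical output file name.
    with open("all_rvc.s", "w") as handle:
        handle.write(render_assembly(compressed_encodings()))
```

The resulting file can be assembled with the toolchain under test and the object file opened in Ghidra for a visual scan.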
One thing that sticks out is that the insn names conflict with the ratified Zcb instructions. Naming them like xw.c.lbu has a higher chance of getting upstreamed to compilers (and reduces the overall mess), though such names won't compile back through their proprietary gcc.
To fix build error:
diff --git a/Ghidra/Processors/RISCV/certification.manifest b/Ghidra/Processors/RISCV/certification.manifest
index 07cddb0454..9e40a0da46 100644
--- a/Ghidra/Processors/RISCV/certification.manifest
+++ b/Ghidra/Processors/RISCV/certification.manifest
@@ -5,6 +5,7 @@ data/languages/RV32G.pspec||GHIDRA||||END|
data/languages/RV32GC.pspec||GHIDRA||||END|
data/languages/RV32I.pspec||GHIDRA||||END|
data/languages/RV32IC.pspec||GHIDRA||||END|
+data/languages/RV32IMACFX.pspec||GHIDRA||||END|
data/languages/RV32IMC.pspec||GHIDRA||||END|
data/languages/RV64G.pspec||GHIDRA||||END|
data/languages/RV64GC.pspec||GHIDRA||||END|
@@ -16,6 +17,7 @@ data/languages/riscv.ilp32d.slaspec||GHIDRA||||END|
data/languages/riscv.ilp32d_thead.slaspec||GHIDRA||||END|
data/languages/riscv.instr.sinc||GHIDRA||||END|
data/languages/riscv.ldefs||GHIDRA||||END|
+data/languages/riscv.lp32qingke.slaspec||GHIDRA||||END|
data/languages/riscv.lp64d.slaspec||GHIDRA||||END|
data/languages/riscv.lp64d_thead.slaspec||GHIDRA||||END|
data/languages/riscv.opinion||GHIDRA||||END|
@@ -30,6 +32,7 @@ data/languages/riscv.rv32k.sinc||GHIDRA||||END|
data/languages/riscv.rv32m.sinc||GHIDRA||||END|
data/languages/riscv.rv32p.sinc||GHIDRA||||END|
data/languages/riscv.rv32q.sinc||GHIDRA||||END|
+data/languages/riscv.rv32xw.sinc||GHIDRA||||END|
data/languages/riscv.rv64a.sinc||GHIDRA||||END|
data/languages/riscv.rv64b.sinc||GHIDRA||||END|
data/languages/riscv.rv64d.sinc||GHIDRA||||END|
@@ -56,6 +59,7 @@ data/languages/riscv.zvbc.sinc||GHIDRA||||END|
data/languages/riscv.zvkng.sinc||GHIDRA||||END|
data/languages/riscv.zvksg.sinc||GHIDRA||||END|
data/languages/riscv32-fp.cspec||GHIDRA||||END|
+data/languages/riscv32-fp32.cspec||GHIDRA||||END|
data/languages/riscv32.cspec||GHIDRA||||END|
data/languages/riscv32.dwarf||GHIDRA||||END|
data/languages/riscv64-fp.cspec||GHIDRA||||END|
If you can provide a binary exemplar or two including the new instructions, I'd like to add them to my RISCV collection. Metadata describing this ISA extension would be nice too, for instance:
- what version of gcc or llvm was used as the pre-patched base for compiling the source code?
- has the vendor taken steps to upstream support for this version of the extension in binutils, gcc, llvm, or the linux kernel?
- does this vendor extension have a standard multi-letter name (like _xtheadbb), or just the older single-letter name?
- has the vendor released other versions of this ISA extension, or subsequently announced similar chip families using the frozen Zcb extension?
Mostly I'm looking for hints on whether the vendor considers this extension tracked for standardization, deprecated in favor of newer extensions, or proprietary.
re https://github.com/NationalSecurityAgency/ghidra/pull/6390#issuecomment-2043412222: to be honest I have no idea what this actually does (running sleigh manually works without doing this), but I've added it as a separate commit
re https://github.com/NationalSecurityAgency/ghidra/pull/6390#issuecomment-2043952241:
i am attaching the files that I used to visually smoke-test this PR (as well as the quick hack script that generated them): wch-xw.zip
the opcodes are definitely found in the bootroms of the ch32v003/ch32v203/ch32v208 chips, but i do not want to be responsible for publicly distributing those. i've also been able to make vendor gcc emit them for ad-hoc specially-contrived test functions. the disassembly makes sense when skimming these cases.
as to your questions:
- the vendor gcc's version command reports
riscv-none-elf-gcc (xPack GNU RISC-V Embedded GCC x86_64) 12.2.0
- no idea, i am not affiliated with the vendor, and I haven't found anything from google searching the english-language internet
- all vendor documentation (including a quick skim of the chinese-language documentation) seems to just call it XW
- i am only aware of a single version of this extension (specifying rv32ecxw (for QingKe V2) vs rv32ima{f}cxw (for QingKe V4) doesn't change the opcodes). I don't know anything about future vendor cores.
Thanks for the exemplars and the contextual metadata. I'll see about getting them into my exemplars repo.
The certification.manifest file @jobermayr added is needed for building and distributing Ghidra, not for using sleigh to locally support a new processor spec or instruction set extension. It may amount to a public assertion that the contents of your new file are free of other proprietary or licensing claims and can be freely redistributed within Ghidra's Apache license.
If anyone knows how to triage intellectual property rights and licensing from reverse-engineered binaries we could have a fun discussion here. I've only been subpoenaed for a deposition on that topic once, which was enough for me.
If anyone knows how to triage intellectual property rights and licensing from reverse-engineered binaries we could have a fun discussion here.
GCC is GPL licensed
@jnk0le: gcc and binutils are both safe. The thead, ventana, and sifive vendor extensions contributed to binutils are probably equally safe. I'm unsure about the others.
As someone who has contributed to other projects that involve reverse engineering, I can assure you that the reverse-engineering process that was done in order to create this specific PR is, to the best of my knowledge (IANAL), completely safe. As you can see from the script included in the zip file posted earlier, I performed the entire process by running controlled inputs through the vendor toolchain and observing the bytes that come out. In no case did I ever load the vendor tools themselves into Ghidra or any other disassembler. The bit patterns and pcode in this PR describe only facts and information (and are my own expression of such, not the vendor's), so I do not believe the vendor can make any copyright claims to it. I am less familiar with patent law and did not read any patents while doing this work, but, as this PR doesn't involve implementing a RISC-V core itself, I find it unlikely that there is a legitimate patent claim to... decoding some opcodes for a disassembler.
In any case, I believe this discussion is far off-topic and out of scope when it comes to discussing this PR in question.
Does anyone have suggestions on how to name this extension and the processor/language variant within the Ghidra RISCV directory? The wch toolchain provided by @ArcaneNibble names it xw2p2. That implies version 2.2 of the w vendor-specific (x) extension set. The pull request calls the processor/language variant RV32_QingKe, but not all QingKe processors support this versioned extension.
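To make that reading of the name concrete, here is a small Python sketch of the RISC-V convention where an extension name is followed by a version written as major, "p", minor. The function names are hypothetical and the sketch only handles names made of lowercase letters.

```python
import re


def split_versioned_extension(token: str):
    """Split a token such as "xw2p2" into (name, major, minor)."""
    match = re.fullmatch(r"([a-z]+)(\d+)p(\d+)", token)
    if match is None:
        raise ValueError(f"not a versioned extension name: {token!r}")
    name, major, minor = match.groups()
    return name, int(major), int(minor)


def is_vendor_extension(name: str) -> bool:
    """A leading "x" marks a non-standard (vendor) extension."""
    return name.startswith("x")
```

Under this reading, split_versioned_extension("xw2p2") gives the name xw with major version 2 and minor version 2, and is_vendor_extension("xw") is true.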
The Ghidra design questions here are well over my head. They get worse when Ghidra opens a linux-next RISCV-64 kernel, which determines the supported extensions at load time then patches its own executable code to optimize things like bit manipulation and encrypted file system support.
There are build errors together with #5778 (conflicting changes):
Compiling ./data/languages/riscv.lp32qingke.slaspec:
line 1:9 no viable alternative at input '<EOF>'
3 NOP constructors found
Use -n switch to list each individually
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:232
Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:236
Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:240
Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:244
Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:252
Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
No output produced
Is there an easy way to fix it?
A quick fix is to force the lengths to match with this patch. Floating point width conversion doesn't really work this way, so the Ghidra emulator will fail to give anything useful. The decompiler window should be OK, if you pretend there is some C compiler that can implicitly convert 16 bit floating point values to 32 bit floating point values.
$ git diff riscv.table.sinc
diff --git a/Ghidra/Processors/RISCV/data/languages/riscv.table.sinc b/Ghidra/Processors/RISCV/data/languages/riscv.table.sinc
index 05def9e30b..2e93bf8ada 100644
--- a/Ghidra/Processors/RISCV/data/languages/riscv.table.sinc
+++ b/Ghidra/Processors/RISCV/data/languages/riscv.table.sinc
@@ -94,7 +94,7 @@ frs3D: fr2731 is fr2731 { local tmp = fr2731:$(DFLEN); export tmp; }
macro fassignS(dest, src) {
@if FPSIZE == "32"
- dest = src;
+ dest = zext(src);
@else
dest = zext(src);
@endif
That's not the correct way to resize floats. You should use float2float.
float2float isn't defined when converting the 16 bit floats supported in riscv.zfh.sinc. I can use it anyway, if that's more consistent with your roadmap.
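To illustrate why zero-extension is not a float width conversion, here is a minimal Python sketch using the standard struct module. It is not SLEIGH and says nothing about how Ghidra implements either operator; it only compares reinterpreting zero-extended half-precision bits as a 32-bit float against an actual value conversion. The function names are hypothetical.

```python
import struct


def half_bits(value: float) -> int:
    """Return the IEEE 754 half-precision bit pattern of value."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def zext_then_reinterpret(bits16: int) -> float:
    """Zero-extend 16 bits to 32 and read the result as a single."""
    return struct.unpack("<f", struct.pack("<I", bits16))[0]


def convert_half_to_single(bits16: int) -> float:
    """Decode the half-precision value, which is what a real conversion keeps."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]


if __name__ == "__main__":
    bits = half_bits(1.5)
    print(hex(bits))                     # bit pattern of 1.5 as a half
    print(zext_then_reinterpret(bits))   # a tiny denormal, not 1.5
    print(convert_half_to_single(bits))  # the value is preserved
```

The zero-extended pattern lands in the denormal range of the 32-bit format, which is why an emulator would see garbage even though the decompiler output can still look plausible.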
On Wed, Jun 12, 2024 at 2:25 PM GhidorahRex @.***> wrote:
That's not the correct way to resize floats. You should use float2float