RISC-V: Add WCH/QingKe XW extension

Open ArcaneNibble opened this issue 10 months ago • 13 comments

This PR adds support for the WCH/QingKe "XW" extension, closing issue #6385

This only includes support for the "XW" extension and not any of the other QingKe-specific functionality (i.e. the ARM Cortex-inspired SysTick and interrupt controller)

As part of this, I had to make a duplicate of the existing .cspec file, as some issues seem to arise with it if the core supports only single-precision floating point but not double-precision floating point. I don't know whether this .cspec is actually correct or not, especially for the QingKe-V2 core which only implements RV32ECXW.

I don't know how to make Ghidra pick the correct variant based on .riscv.attributes.

The XW opcode disassembly was tested by feeding all possible opcodes (as far as I can tell) through the vendor toolchain, opening the resulting .o into Ghidra, and then visually scanning the results.
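A minimal sketch of one way to build such an exhaustive test input, written in Python. It is not the script used for this PR (that one is attached later in the thread). It assumes the extension's opcodes live in the 16-bit compressed encoding space, and it sidesteps vendor mnemonics by emitting raw `.2byte` directives that any GNU-style assembler should accept. The function names are hypothetical.

```python
def compressed_encodings():
    """Yield every 16-bit value that RISC-V decodes as a 16-bit instruction.

    In the standard RISC-V length encoding, an instruction is 16 bits wide
    when its two lowest bits are anything other than 0b11.
    """
    for value in range(0x10000):
        if value & 0b11 != 0b11:
            yield value


def to_assembly(encodings):
    """Render encodings as an assembly source with one .2byte per line."""
    lines = [".text"]
    lines.extend(f".2byte 0x{value:04x}" for value in encodings)
    return "\n".join(lines) + "\n"
```

Writing `to_assembly(compressed_encodings())` to a `.s` file and assembling it would give an object file containing every candidate encoding, which could then be opened in Ghidra and skimmed in the listing.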

ArcaneNibble avatar Apr 07 '24 18:04 ArcaneNibble

One thing that sticks out: the insn names conflict with the ratified Zcb instructions.

Naming them like xw.c.lbu would give them a better chance of being upstreamed to compilers (and would reduce the overall mess), though such names won't compile back through their proprietary gcc.

jnk0le avatar Apr 07 '24 22:04 jnk0le

To fix build error:

diff --git a/Ghidra/Processors/RISCV/certification.manifest b/Ghidra/Processors/RISCV/certification.manifest
index 07cddb0454..9e40a0da46 100644
--- a/Ghidra/Processors/RISCV/certification.manifest
+++ b/Ghidra/Processors/RISCV/certification.manifest
@@ -5,6 +5,7 @@ data/languages/RV32G.pspec||GHIDRA||||END|
 data/languages/RV32GC.pspec||GHIDRA||||END|
 data/languages/RV32I.pspec||GHIDRA||||END|
 data/languages/RV32IC.pspec||GHIDRA||||END|
+data/languages/RV32IMACFX.pspec||GHIDRA||||END|
 data/languages/RV32IMC.pspec||GHIDRA||||END|
 data/languages/RV64G.pspec||GHIDRA||||END|
 data/languages/RV64GC.pspec||GHIDRA||||END|
@@ -16,6 +17,7 @@ data/languages/riscv.ilp32d.slaspec||GHIDRA||||END|
 data/languages/riscv.ilp32d_thead.slaspec||GHIDRA||||END|
 data/languages/riscv.instr.sinc||GHIDRA||||END|
 data/languages/riscv.ldefs||GHIDRA||||END|
+data/languages/riscv.lp32qingke.slaspec||GHIDRA||||END|
 data/languages/riscv.lp64d.slaspec||GHIDRA||||END|
 data/languages/riscv.lp64d_thead.slaspec||GHIDRA||||END|
 data/languages/riscv.opinion||GHIDRA||||END|
@@ -30,6 +32,7 @@ data/languages/riscv.rv32k.sinc||GHIDRA||||END|
 data/languages/riscv.rv32m.sinc||GHIDRA||||END|
 data/languages/riscv.rv32p.sinc||GHIDRA||||END|
 data/languages/riscv.rv32q.sinc||GHIDRA||||END|
+data/languages/riscv.rv32xw.sinc||GHIDRA||||END|
 data/languages/riscv.rv64a.sinc||GHIDRA||||END|
 data/languages/riscv.rv64b.sinc||GHIDRA||||END|
 data/languages/riscv.rv64d.sinc||GHIDRA||||END|
@@ -56,6 +59,7 @@ data/languages/riscv.zvbc.sinc||GHIDRA||||END|
 data/languages/riscv.zvkng.sinc||GHIDRA||||END|
 data/languages/riscv.zvksg.sinc||GHIDRA||||END|
 data/languages/riscv32-fp.cspec||GHIDRA||||END|
+data/languages/riscv32-fp32.cspec||GHIDRA||||END|
 data/languages/riscv32.cspec||GHIDRA||||END|
 data/languages/riscv32.dwarf||GHIDRA||||END|
 data/languages/riscv64-fp.cspec||GHIDRA||||END|

jobermayr avatar Apr 08 '24 18:04 jobermayr

If you can provide a binary exemplar or two including the new instructions, I'd like to add it to my RISCV collection. Metadata describing this ISA extension would be nice too, for instance:

  • what version of gcc or llvm was used as the pre-patched base for compiling the source code?
  • has the vendor taken steps to upstream support for this version of the extension in binutils, gcc, llvm, or the linux kernel?
  • does this vendor extension have a standard multi-letter name (like _xtheadbb), or just the older single-letter name?
  • has the vendor released other versions of this ISA extension, or subsequently announced similar chip families using the frozen Zcb extension?

Mostly I'm looking for hints on whether the vendor considers this extension tracked for standardization, deprecated in favor of newer extensions, or proprietary.

thixotropist avatar Apr 09 '24 01:04 thixotropist

re https://github.com/NationalSecurityAgency/ghidra/pull/6390#issuecomment-2043412222: to be honest I have no idea what this actually does (running sleigh manually works without doing this), but I've added it as a separate commit

re https://github.com/NationalSecurityAgency/ghidra/pull/6390#issuecomment-2043952241:

i am attaching the files that I used to visually smoke-test this PR (as well as the quick hack script that generated them): wch-xw.zip

the opcodes are definitely found in the bootroms of the ch32v003/ch32v203/ch32v208 chips, but i do not want to be responsible for publicly distributing those. i've also been able to make vendor gcc emit them for ad-hoc specially-contrived test functions. the disassembly makes sense when skimming these cases.

as to your questions:

  • the vendor gcc's version command reports riscv-none-elf-gcc (xPack GNU RISC-V Embedded GCC x86_64) 12.2.0
  • no idea, i am not affiliated with the vendor, and I haven't found anything from google searching the english-language internet
  • all vendor documentation (including a quick skim of the chinese-language documentation) seems to just call it XW
  • i am only aware of a single version of this extension (specifying rv32ecxw (for QingKe V2) vs rv32ima{f}cxw (for QingKe V4) doesn't change the opcodes). I don't know anything about future vendor cores.

ArcaneNibble avatar Apr 09 '24 03:04 ArcaneNibble

Thanks for the exemplars and the contextual metadata. I'll see about getting them into my exemplars repo.

The certification.manifest file @jobermayr added is needed for building and distributing Ghidra, not for using sleigh to locally support a new processor spec or instruction set extension. It may amount to a public assertion that the contents of your new file are free of other proprietary or licensing claims and can be freely redistributed within Ghidra's Apache license.
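For anyone adding files locally, here is a rough Python sketch of the kind of cross-check that would have caught the missing entries. It is not Ghidra's actual build logic. It only assumes the `path||LICENSE||||END|` entry shape visible in the diff above, and that lines starting with `#` can be skipped. The function names are hypothetical.

```python
def manifest_paths(manifest_text):
    """Return the set of file paths listed in a certification.manifest."""
    paths = set()
    for line in manifest_text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' header lines carry no file entry.
        if not line or line.startswith("#"):
            continue
        # Entries look like: data/languages/RV32I.pspec||GHIDRA||||END|
        paths.add(line.split("||", 1)[0])
    return paths


def missing_entries(manifest_text, files):
    """List the given relative paths that the manifest does not mention."""
    listed = manifest_paths(manifest_text)
    return sorted(f for f in files if f not in listed)
```

Passing the manifest contents and the relative paths of the new files would return the ones still needing an entry, for example `data/languages/riscv.rv32xw.sinc` before the patch above is applied.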

If anyone knows how to triage intellectual property rights and licensing from reverse-engineered binaries we could have a fun discussion here. I've only been subpoenaed for a deposition on that topic once, which was enough for me.

thixotropist avatar Apr 09 '24 15:04 thixotropist

If anyone knows how to triage intellectual property rights and licensing from reverse-engineered binaries we could have a fun discussion here.

GCC is GPL licensed

jnk0le avatar Apr 09 '24 17:04 jnk0le

@jnk0le: gcc and binutils are both safe. The thead, ventana, and sifive vendor extensions contributed to binutils are probably equally safe. I'm unsure about the others.

thixotropist avatar Apr 09 '24 17:04 thixotropist

As someone who has contributed to other projects that involve reverse engineering, I can assure you that the reverse-engineering process used to create this specific PR is, to the best of my knowledge (IANAL), completely safe. As you can see from the script included in the zip file posted earlier, I performed the entire process by running controlled inputs through the vendor toolchain and observing the bytes that came out. In no case did I ever load the vendor tools themselves into Ghidra or any other disassembler. The bit patterns and pcode in this PR describe only facts and information (and are my own expression of such, not the vendor's), so I do not believe the vendor can make any copyright claims to them. I am less familiar with patent law and did not read any patents while doing this work, but, as this PR doesn't involve implementing a RISC-V core itself, I find it unlikely that there is a legitimate patent claim to... decoding some opcodes for a disassembler.

In any case, I believe this discussion is far off-topic and out of scope when it comes to discussing this PR in question.

ArcaneNibble avatar Apr 09 '24 18:04 ArcaneNibble

Does anyone have suggestions on how to name this extension and the processor/language variant within the Ghidra RISCV directory? The wch toolchain provided by @ArcaneNibble names it xw2p2. That implies version 2.2 of the w vendor-specific (x) extension set. The pull request calls the processor/language variant RV32_QingKe - but not all QingKe processors support this versioned extension.

The Ghidra design questions here are well over my head. They get worse when Ghidra opens a linux-next RISCV-64 kernel, which determines the supported extensions at load time then patches its own executable code to optimize things like bit manipulation and encrypted file system support.

thixotropist avatar Apr 10 '24 15:04 thixotropist

There are build errors together with #5778 (conflicting changes):

Compiling ./data/languages/riscv.lp32qingke.slaspec:
line 1:9 no viable alternative at input '<EOF>'
3 NOP constructors found
Use -n switch to list each individually
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:232
  Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
  Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:236
  Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
  Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:240
  Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
  Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:244
  Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
  Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
riscv.table.sinc:149: Size restriction error in table 'instruction' in constructor at riscv.zfh.sinc:252
  Problem with 'frd' and 'frs1H' in 'Copy(=)' operator
  Input and output sizes must match; {type=handle value_real=0x0 spaceid=null} != {type=handle value_real=0x0 spaceid=null}
No output produced

Is there an easy way to fix it?

jobermayr avatar Jun 11 '24 20:06 jobermayr

A quick fix is to force the lengths to match with this patch. Floating point width conversion doesn't really work this way, so the Ghidra emulator will fail to give anything useful. The decompiler window should be OK, if you pretend there is some C compiler that can implicitly convert 16 bit floating point values to 32 bit floating point values.

$ git diff riscv.table.sinc
diff --git a/Ghidra/Processors/RISCV/data/languages/riscv.table.sinc b/Ghidra/Processors/RISCV/data/languages/riscv.table.sinc
index 05def9e30b..2e93bf8ada 100644
--- a/Ghidra/Processors/RISCV/data/languages/riscv.table.sinc
+++ b/Ghidra/Processors/RISCV/data/languages/riscv.table.sinc
@@ -94,7 +94,7 @@ frs3D: fr2731 is fr2731 { local tmp = fr2731:$(DFLEN); export tmp; }
 
 macro fassignS(dest, src) {
 @if FPSIZE == "32"
-       dest = src;
+       dest = zext(src);
 @else
        dest = zext(src);
 @endif

thixotropist avatar Jun 12 '24 17:06 thixotropist

That's not the correct way to resize floats. You should use float2float

GhidorahRex avatar Jun 12 '24 18:06 GhidorahRex

float2float isn't defined when converting the 16 bit floats supported in riscv.zfh.sinc. I can use it anyway, if that's more consistent with your roadmap.

thixotropist avatar Jun 12 '24 18:06 thixotropist