Getting the address of a varnode (aka instruction operand)

Open kkaempf opened this issue 2 years ago • 21 comments

(rephrased to better match sleigh terminology)

I'm working on a processor description for VAX and would need to get the address of an instruction operand.

VAX has one-byte opcodes followed by operands with variable (1 to 5 bytes) length.

Examples (not exact mnemonics)

  1. one-byte opcode, two one-byte operands

00000000: 90 01 50 - MOVE.B S^1, R0

  2. one-byte opcode, one two-byte operand, one four-byte operand

00000000: 90 CF 34 12 E0 78 56 34 12 - MOVE.B (PC+0x1234), (R0 + 0x12345678)

Example 2 is the problem. The first operand ("CF 34 12") is PC-relative, it computes PC+0x1234, where PC is right after the final "12" value. In the example above, that would result in 0x1238.
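
The arithmetic can be checked with a minimal Python sketch. The function name `pc_relative_target` and its parameters are hypothetical, chosen only for illustration; the sketch assumes a signed, little-endian word displacement following the 0xCF specifier byte and a load address of 0, as in the example.

```python
import struct

def pc_relative_target(code: bytes, spec_offset: int, base: int = 0) -> int:
    """Resolve a word-displacement PC-relative operand (specifier byte 0xCF).

    spec_offset is the offset of the specifier byte within `code`;
    base is the address of code[0]. Both names are illustrative.
    """
    if code[spec_offset] != 0xCF:
        raise ValueError("not a word-displacement PC-relative specifier")
    # Signed 16-bit little-endian displacement follows the specifier byte.
    (disp,) = struct.unpack_from("<h", code, spec_offset + 1)
    # PC points just past this operand: specifier byte + 2 displacement bytes.
    pc = base + spec_offset + 3
    return pc + disp

insn = bytes.fromhex("90CF3412E078563412")
target = pc_relative_target(insn, spec_offset=1)
print(hex(target))  # 0x1238
```

Note that the function needs `spec_offset`, the operand's position inside the instruction, which is exactly the value that is hard to get at in sleigh.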

Problem

To compute PC-relative offsets correctly, I need to know the operand's memory address. However, neither inst_start nor inst_next is usable here:

  • I can't use inst_start because the operand might be second and I don't know the size of the first operand.

  • I can't use inst_next because the operand might be first and I don't know the size of the second operand.

Are there any other options ?

kkaempf avatar Sep 19 '22 18:09 kkaempf

Wow! Implementing the VAX instruction set in Ghidra sounds like a very large task, certainly well beyond my level. But as a longtime VMS user and hobbyist, I'd sure love to see the end product.

(Likewise for the PDP-11, if anyone is interested in taking up that architecture.)

gtackett avatar Sep 23 '22 13:09 gtackett

Wow! Implementing the VAX instruction set in Ghidra sounds like a very large task, certainly well beyond my level. But as a longtime VMS user and hobbyist, I'd sure love to see the end product.

😊

(Likewise for the PDP-11, if anyone is interested in taking up that architecture.)

It's on my list (now that I have some basic understanding of sleigh 😆 )

kkaempf avatar Sep 23 '22 13:09 kkaempf

Since this is static within the instruction, I think context might work well here. Use something like op_addr = (0,3) noflow

For the opcode, use [ op_addr = 1; ]

Then for each operand, you know how big it is, so you can increment op_addr each time: [ rel_addr = inst_start+op_addr; op_addr = op_addr + bytes; ] with bytes being however many bytes are consumed by the current operand.
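
As a rough Python model of that bookkeeping (not SLEIGH, and not a claim about how Ghidra evaluates context): a running offset starts at 1 after the opcode and advances by each operand's size. The helper `operand_addresses` and the list of sizes are hypothetical names used only for this sketch.

```python
def operand_addresses(inst_start, operand_sizes, opcode_size=1):
    """Return a list of (rel_addr, next_addr) pairs, one per operand.

    rel_addr is where the operand starts; next_addr is the first byte after
    it, which is the PC value a PC-relative operand is computed from.
    """
    op_addr = opcode_size                    # mirrors [ op_addr = 1; ]
    result = []
    for size in operand_sizes:
        rel_addr = inst_start + op_addr      # mirrors rel_addr = inst_start+op_addr
        op_addr += size                      # mirrors op_addr = op_addr + bytes
        result.append((rel_addr, inst_start + op_addr))
    return result

# Example 2 from the issue: a 3-byte operand followed by a 5-byte operand.
addrs = operand_addresses(0, [3, 5])
print(addrs)  # [(1, 4), (4, 9)]
```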

GhidorahRex avatar Sep 23 '22 17:09 GhidorahRex

Thanks, something similar was my initial attempt.

During parsing, the op_addr was computed correctly, but it seems as if the final computation is done after the instruction is completely matched. So every operand came out with the same op_addr (the one set by the last operand).

However, I didn't specify noflow - need to check this out.

Thanks again ! Will report here :wink:

kkaempf avatar Sep 23 '22 18:09 kkaempf

You may need to add a globalset as well? There are possibly some other shenanigans that may need to be employed, but I'm pretty confident this can be done with just context.

GhidorahRex avatar Sep 23 '22 18:09 GhidorahRex

Were you able to get this to work?

GhidorahRex avatar Sep 29 '22 14:09 GhidorahRex

Sorry, I was out last week.

For the opcode, use [ op_addr = 1; ]

Not clear where to use this, as the opcode is a field (and I can't add disassembly actions to it, can I ? 🤔)

I tried setting the offset (aka op_addr) for each instruction in a branch. Resetting the op_addr for each instruction works that way, but makes disassembly like 10 times slower :-(

kkaempf avatar Oct 03 '22 06:10 kkaempf

For the opcode, use [ op_addr = 1; ]

Not clear where to use this, as the opcode is a field (and I can't add disassembly actions to it, can I ? 🤔)

Solved this with a non-visible operand

op_code: epsilon is epsilon [ op_addr = 1; ] { export epsilon; }

Works nicely, as the op_addr value gets reset when I add ..; op_code; .. to the bit pattern section.

However, when computing operands, every operand gets the final op_addr value (after all operands are parsed) instead of the value at the respective operand position.

kkaempf avatar Oct 03 '22 12:10 kkaempf

I now created a minified VAX processor description to visualize the problem better.

I use lifting-bits disassembler with this binary:

81 af 00 af 00 af 00

It should disassemble to

ADDB3 B^0x3, B^0x5, B^0x7

but doesn't.

Each operand disassembles to the same value. 😞

kkaempf avatar Oct 21 '22 16:10 kkaempf

I've now tried all kinds of combinations of context, noflow, globalset etc. All give the same result: when exporting the result, I get the final value (after all operands have been processed) and not the intermediate ones.

This doesn't come as a surprise to me since ghidra has to process all operands twice. Once for computing inst_next and then again for computing the disassembled values (which might include inst_next).
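
The symptom can be reproduced in a toy Python model using the bytes from the minified example. This only illustrates the behaviour described here, under the assumption that one shared value is updated while all operands are matched and is read only afterwards. It is not Ghidra's implementation, and every name in it is made up.

```python
CODE = bytes.fromhex("81af00af00af00")  # ADDB3 with three byte-displacement PC-relative operands

def decode(code, late_binding):
    """Return the computed targets of the three operands.

    late_binding=True models one shared offset that is read only after all
    operands were matched; False models a per-operand snapshot.
    """
    shared = {"op_addr": 1}          # running offset, starts after the 1-byte opcode
    pending = []
    for _ in range(3):
        start = shared["op_addr"]
        assert code[start] == 0xAF   # byte displacement relative to PC
        disp = int.from_bytes(code[start + 1:start + 2], "little", signed=True)
        shared["op_addr"] = start + 2            # specifier + 1 displacement byte
        snapshot = shared["op_addr"]             # PC right after this operand
        if late_binding:
            pending.append(lambda d=disp: shared["op_addr"] + d)
        else:
            pending.append(lambda d=disp, pc=snapshot: pc + d)
    # "Second pass": the deferred computations run only now.
    return [f() for f in pending]

print(decode(CODE, late_binding=False))  # [3, 5, 7]
print(decode(CODE, late_binding=True))   # [7, 7, 7]
```

The per-operand snapshot yields the expected B^0x3, B^0x5, B^0x7, while the shared value yields the same number for every operand.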

kkaempf avatar Oct 31 '22 11:10 kkaempf

I've solved it now by introducing an operand_offset variable.

(Adding printfs to Ghidra pointed me to the right places, esp. showing that ParserWalker's value retrieval functions were called twice: once reading a 4-byte value to match against the disassembler spec, and once reading correctly-sized values to compute the correct disassembly values.)

See https://github.com/NationalSecurityAgency/ghidra/commit/f9a87889c24cfb6f677493cfdbe2685e302fe2f5 for the C++ part and https://github.com/NationalSecurityAgency/ghidra/commit/ecc24c7c9e73ee4f448b277bdbe15898cfab5de4 for the Java part.

operand_offset is modeled like inst_start but with a different getValue() implementation:

inst_start has

Address addr = walker.getAddr();
return addr.getAddressableWordOffset();

operand_offset has

return walker.getOffset(-1);

This works nicely and fixes the issue at hand.

kkaempf avatar Oct 31 '22 11:10 kkaempf

Will #4812 be considered now ? 🥺

kkaempf avatar Mar 21 '23 16:03 kkaempf

Sorry, I didn't realize this was tied to those PRs.

ryanmkurtz avatar Mar 21 '23 16:03 ryanmkurtz

I found this issue because I searched for "VAX". I'm interested in this! However, I don't yet have any clue about Ghidra or Java. Is there any way to help with VAX support?

Just to add a comment: The VAX ISA does have one-byte and two-byte opcodes. So relying on them as one-byte long would be wrong. And then there's a ton of addressing modes. Plus the oddity of the CASE* instructions. Alas... I really would love to help here. It would be great to have something that helps to dissect machine ROMs or system binaries.

jbglaw avatar Jun 27 '23 18:06 jbglaw

Hey @jbglaw , Ghidra VAX support is (mostly) done - except for #4812 😞 .

If you want to build from source, check out the vintage branch at https://github.com/kkaempf/ghidra-vintage

I'm also maintaining RPM packages for openSUSE Tumbleweed

kkaempf avatar Jun 28 '23 08:06 kkaempf

I found this issue because I searched for "VAX". I'm interested in this! However, I don't yet have any clue about Ghidra or Java. Is there any way to help with VAX support?

Please check out and contribute to https://github.com/kkaempf/ghidra-vax 😉

Just to add a comment: The VAX ISA does have one-byte and two-byte opcodes. So relying on them as one-byte long would be wrong. And then there's a ton of addressing modes. Plus the oddity of the CASE* instructions.

This all should be working in ghidra.vax

Alas... I really would love to help here. It would be great to have something that helps to dissect machine ROMs

I'm already working on ROMs and I'd be happy to collaborate on http://ghidra-server.org/

kkaempf avatar Jun 28 '23 08:06 kkaempf

Well, I just requested an account on ghidra-server.org. Let's see.

OTOH, as I'm a 100% newbie to Ghidra, my first step should be to get it running. Source builds seem to be not too trivial with Debian as it's missing build tools (at least in the requested version.) And then there's that one outstanding patch. Are there chances those will be merged? At least it doesn't look as if it would break anything else.

jbglaw avatar Jun 28 '23 09:06 jbglaw

Well, I just requested an account on ghidra-server.org. Let's see.

🤞🏻

OTOH, as I'm a 100% newbie to Ghidra, my first step should be to get it running. Source builds seem to be not too trivial with Debian as it's missing build tools (at least in the requested version.)

If you're not afraid of downloading binaries (like gradle) on your machine, building should be as simple as

gradle \
  -Dfile.encoding=UTF-8 \
  --project-prop finalRelease=true \
  buildNatives_linux64

This will give you a .tar file which you can extract locally and start ghidra from there.

And then there's that one outstanding patch. Are there chances those will be merged?

Ghidra (the project) is generally slow in merging outside contributions :-/

At least it doesn't look as if it would break anything else.

Certainly not. It's just exposing a value that is already tracked internally.

kkaempf avatar Jun 28 '23 11:06 kkaempf

gradle is the issue here. But I think I'll give it a try in a Docker container. Maybe wrap a script around it to have a nice recipe for getting the final tarball.

jbglaw avatar Jun 28 '23 12:06 jbglaw

So let's hope that this other PR is merged, and thereafter maybe the VAX CPU description. I'll try to get it working locally. :)

jbglaw avatar Jun 28 '23 12:06 jbglaw

Successfully built Ghidra (plain upstream sources, though with buildGhidra instead of buildNatives_linux64). The resulting ZIP file contains a working Ghidra. Next step is to pull in your patch and the VAX CPU description.

jbglaw avatar Jun 29 '23 09:06 jbglaw