Getting the address of a varnode (aka instruction operand)
(rephrased to better match sleigh terminology)
I'm working on a processor description for VAX and need to get the address of an instruction operand.
VAX has one-byte opcodes followed by operands with variable (1 to 5 bytes) length.
Examples (not exact mnemonics):
1. One-byte opcode, two one-byte operands
   00000000: 90 01 50 - MOVE.B S^1, R0
2. One-byte opcode, one three-byte operand (two-byte displacement), one five-byte operand (four-byte displacement)
   00000000: 90 CF 34 12 E0 78 56 34 12 - MOVE.B (PC+0x1234), (R0 + 0x12345678)
Example 2 is the problem. The first operand ("CF 34 12") is PC-relative: it computes PC+0x1234, where PC points right after the final "12" byte of that operand (address 4). In the example above, that results in 0x1238.
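To make the arithmetic concrete, here is a small sketch of the computation for the first operand of example 2. The function name `pc_relative_target` and its parameters are made up for illustration; the only assumption is the rule stated above, that PC points just past the operand's last byte.

```python
def pc_relative_target(operand_addr: int, operand_size: int, displacement: int) -> int:
    """Effective address of a PC-relative operand.

    PC is taken to point just past the operand's last byte.
    """
    pc = operand_addr + operand_size
    return pc + displacement


# Example 2: the instruction starts at 0 and the opcode (0x90) occupies
# byte 0, so the first operand (CF 34 12) starts at address 1.
operand_bytes = bytes([0xCF, 0x34, 0x12])

# The two bytes after the specifier are a little-endian signed displacement.
displacement = int.from_bytes(operand_bytes[1:], "little", signed=True)

target = pc_relative_target(
    operand_addr=1,
    operand_size=len(operand_bytes),
    displacement=displacement,
)
print(hex(target))  # 0x1238
```

The point is that `operand_addr` is an input: without it, the target cannot be derived from `inst_start` or `inst_next` alone.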
Problem
To compute PC-relative offsets correctly, I need to know the operand's memory address. However, neither inst_start nor inst_next is usable here:
- I can't use inst_start because the operand might be second and I don't know the size of the first operand.
- I can't use inst_next because the operand might be first and I don't know the size of the second operand.
Are there any other options?
Wow! Implementing the VAX instruction set in Ghidra sounds like a very large task, certainly well beyond my level. But as a longtime VMS user and hobbyist, I'd sure love to see the end product.
(Likewise for the PDP-11, if anyone is interested in taking up that architecture.)
> Wow! Implementing the VAX instruction set in Ghidra sounds like a very large task, certainly well beyond my level. But as a longtime VMS user and hobbyist, I'd sure love to see the end product.
😊
> (Likewise for the PDP-11, if anyone is interested in taking up that architecture.)
It's on my list (now that I have some basic understanding of sleigh 😆 )
Since this is static within the instruction, I think context might work well here. Use something like op_addr = (0,3) noflow
For the opcode, use [ op_addr = 1; ]
Then for each operand, you know how big it is, so you can increment op_addr each time: [ rel_addr = inst_start+op_addr; op_addr = op_addr + bytes; ] with bytes being however many bytes are consumed by the current operand.
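As a rough model of that suggestion (plain Python, not SLEIGH), the running counter would behave like this. The function `operand_addresses` is hypothetical; `op_addr` and the per-operand byte counts mirror the names used above.

```python
def operand_addresses(inst_start: int, opcode_size: int, operand_sizes: list[int]) -> list[int]:
    """Address of each operand, using a running offset from inst_start."""
    op_addr = opcode_size                    # mirrors [ op_addr = 1; ] on the opcode
    addresses = []
    for size in operand_sizes:
        addresses.append(inst_start + op_addr)   # rel_addr = inst_start + op_addr
        op_addr += size                          # op_addr = op_addr + bytes
    return addresses


# Example 2 from the question: one-byte opcode, then a 3-byte and a 5-byte operand.
print(operand_addresses(inst_start=0, opcode_size=1, operand_sizes=[3, 5]))  # [1, 4]
```

This only shows the intended result; whether SLEIGH context evaluation actually produces these intermediate values is the open question in this thread.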
Thanks, something similar was my initial attempt.
During parsing, the op_addr was computed correctly, but it seems as if the final computation is done after the instruction is completely matched. So every operand came out with the same op_addr (the one set by the last operand).
However, I didn't specify noflow - need to check this out.
Thanks again! Will report back here 😉
You may need to add a globalset as well? There's possibly some other shenanigans that may need to be employed, but I'm pretty confident this can be done with just context.
Were you able to get this to work?
Sorry, I was out last week.
> For the opcode, use [ op_addr = 1; ]
Not clear where to use this, as the opcode is a field (and I can't add disassembly actions to it, can I? 🤔)
I tried setting the offset (aka op_addr) for each instruction in a branch. Resetting the op_addr for each instruction works that way, but makes disassembly like 10 times slower :-(
> > For the opcode, use [ op_addr = 1; ]
> Not clear where to use this, as the opcode is a field (and I can't add disassembly actions to it, can I? 🤔)
Solved this with a non-visible operand:
op_code: epsilon is epsilon [ op_addr = 1; ] { export epsilon; }
Works nicely, as the op_addr value gets reset when I add ..; op_code; .. to the bit pattern section.
However, when computing operands, every operand gets the final op_addr value (after all operands are parsed) instead of the value at the respective operand position.
I've now created a minimal VAX processor description to illustrate the problem better.
I use the lifting-bits disassembler with this binary:
81 af 00 af 00 af 00
It should disassemble to
ADDB3 B^0x3, B^0x5, B^0x7
but doesn't.
Each operand disassembles to the same value. 😞
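For reference, the expected output can be reproduced with a few lines of Python. This is only a toy decoder for this one test case, not a general VAX disassembler: it assumes opcode 0x81 is ADDB3 with three operands and handles only the 0xAF specifier (byte displacement relative to PC). The function name `decode_addb3` is made up.

```python
def decode_addb3(code: bytes, inst_start: int = 0) -> str:
    """Decode ADDB3 with three byte-displacement PC-relative operands."""
    if code[0] != 0x81:
        raise ValueError("toy decoder only handles opcode 0x81")
    pos = 1                                   # offset of the first operand
    operands = []
    for _ in range(3):
        if code[pos] != 0xAF:
            raise ValueError("toy decoder only handles specifier 0xAF")
        disp = int.from_bytes(code[pos + 1:pos + 2], "little", signed=True)
        pc = inst_start + pos + 2             # PC points just past this operand
        operands.append(f"B^{pc + disp:#x}")
        pos += 2                              # specifier byte + displacement byte
    return "ADDB3 " + ", ".join(operands)


print(decode_addb3(bytes.fromhex("81af00af00af00")))
# ADDB3 B^0x3, B^0x5, B^0x7
```

Each operand needs its own `pos`; a single value shared by all three operands cannot give 0x3, 0x5 and 0x7.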
I've now tried all kinds of combinations of context, noflow, globalset, etc. All give the same result: when exporting the result, I get the final value (after all operands have been processed) and not the intermediate ones.
This doesn't come as a surprise to me since ghidra has to process all operands twice: once for computing inst_next and then again for computing the disassembled values (which might include inst_next).
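A loose analogy in Python, not a description of Ghidra's internals: if every operand reads one shared counter only after all operands have been parsed, they all see the final value. The names `resolve_late` and `resolve_snapshot` are invented for this sketch.

```python
def resolve_late(operand_sizes: list[int], opcode_size: int = 1) -> list[int]:
    """Every operand reads the shared counter after parsing has finished."""
    op_addr = opcode_size
    pending = []
    for size in operand_sizes:
        pending.append(lambda: op_addr)   # closure reads op_addr later, not now
        op_addr += size
    return [read() for read in pending]


def resolve_snapshot(operand_sizes: list[int], opcode_size: int = 1) -> list[int]:
    """Every operand records the counter at the moment it is parsed."""
    op_addr = opcode_size
    snapshots = []
    for size in operand_sizes:
        snapshots.append(op_addr)         # value captured immediately
        op_addr += size
    return snapshots


# Three 2-byte operands after a 1-byte opcode, as in the ADDB3 test case.
print(resolve_late([2, 2, 2]))      # [7, 7, 7]
print(resolve_snapshot([2, 2, 2]))  # [1, 3, 5]
```

The first result matches the symptom reported here (all operands identical), the second is what the disassembly needs.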
I've solved it now by introducing an operand_offset variable.
(Adding printfs to Ghidra pointed me to the right places, especially showing that ParserWalker's value retrieval functions were called twice: once reading a 4-byte value to match against the disassembler spec, and once reading correctly sized values to compute the correct disassembly values.)
See https://github.com/NationalSecurityAgency/ghidra/commit/f9a87889c24cfb6f677493cfdbe2685e302fe2f5 for the C++ part and https://github.com/NationalSecurityAgency/ghidra/commit/ecc24c7c9e73ee4f448b277bdbe15898cfab5de4 for the Java part.
operand_offset is modeled like inst_start but with a different getValue() implementation. Instead of inst_start's
Address addr = walker.getAddr();
return addr.getAddressableWordOffset();
it uses
return walker.getOffset(-1);
This works nicely and fixes the issue at hand.
Will #4812 be considered now ? 🥺
Sorry, I didn't realize this was tied to those PRs.
I found this issue because I searched for "VAX". I'm interested in this! However, I don't yet have any clue about Ghidra or Java. Is there any way to help with VAX support?
Just to add a comment: The VAX ISA does have one-byte and two-byte opcodes. So relying on them as one-byte long would be wrong. And then there's a ton of addressing modes. Plus the oddity of the CASE* instructions. Alas... I really would love to help here. It would be great to have something that helps to dissect machine ROMs or system binaries.
Hey @jbglaw, Ghidra VAX support is (mostly) done - except for #4812 😞.
If you want to build from source, check out the vintage branch at https://github.com/kkaempf/ghidra-vintage
I'm also maintaining RPM packages for openSUSE Tumbleweed.
> I found this issue because I searched for "VAX". I'm interested in this! However, I don't yet have any clue about Ghidra or Java. Is there any way to help with VAX support?
Please check out and contribute to https://github.com/kkaempf/ghidra-vax 😉
> Just to add a comment: The VAX ISA does have one-byte and two-byte opcodes. So relying on them as one-byte long would be wrong. And then there's a ton of addressing modes. Plus the oddity of the CASE* instructions.
This all should be working in ghidra.vax
> Alas... I really would love to help here. It would be great to have something that helps to dissect machine ROMs
I'm already working on ROMs and I'd be happy to collaborate on http://ghidra-server.org/
Well, I just requested an account on ghidra-server.org. Let's see.
OTOH, as I'm a 100% newbie to Ghidra, my first step should be to get it running. Source builds seem to be not too trivial with Debian as it's missing build tools (at least in the requested version.) And then there's that one outstanding patch. Are there chances those will be merged? At least it doesn't look as if it would break anything else.
> Well, I just requested an account on ghidra-server.org. Let's see.
🤞🏻
> OTOH, as I'm a 100% newbie to Ghidra, my first step should be to get it running. Source builds seem to be not too trivial with Debian as it's missing build tools (at least in the requested version.)
If you're not afraid of downloading binaries (like gradle) on your machine, building should be as simple as
gradle \
-Dfile.encoding=UTF-8 \
--project-prop finalRelease=true \
buildNatives_linux64
This will give you a .tar file which you can extract locally and start ghidra from there.
> And then there's that one outstanding patch. Are there chances those will be merged?
Ghidra (the project) is generally slow in merging outside contributions :-/
> At least it doesn't look as if it would break anything else.
Certainly not. It's just exposing a value that is already tracked internally.
gradle is the issue here. But I think I'll give it a try in a Docker container. Maybe wrap a script around it to have a nice recipe for getting the final tarball.
So let's hope that this other PR is merged, and thereafter maybe the VAX CPU description. I'll try to get it working locally. :)
Successfully built Ghidra (plain upstream sources, though with buildGhidra instead of buildNatives_linux64). The resulting ZIP file contains a working Ghidra. Next step is to pull in your patch and the VAX CPU description.