ghidra Emulator: use injected pcode for CALLOTHER

Is your feature request related to a problem? Please describe.

It's related to trying to make Ghidra work with the Andes "EX9.IT" instruction: https://github.com/NationalSecurityAgency/ghidra/discussions/6612

Describe the solution you'd like

I noticed that the emulator essentially just interprets pcode. However, it does so one top-level instruction at a time: a single instruction's pcode is parsed and executed as a unit (e.g. it relies on relative jumps within an instruction working). This causes problems for CALLOTHER semantics, which can be expanded into injected pcode. However, conceptually it seems like the emulator should "just work" with existing PcodeInjectLibrary code. Can some adapter be made for that?

Or, perhaps the inverse should be done: use pcode defined for the userops in the emulator to fill pcode on the paths currently using pcode injection for sleigh userops. In either case, it seems like this pcode modelling could be uniform.

Jun 25 '24 05:06 shuffle2

I thought about this a bit when designing the thing, but there's a disconnect between p-code injection for static analysis and for dynamic analysis. The injects in the pspecs, cspecs, etc., are often meant to simplify the static analysis and allow cleaner decompilation, e.g., overriding alloca_probe or the stack cookie checker. There are some cases, e.g., segment, where the existing inject library may make some sense in the dynamic case as well. That being said, there's only one flag on Instruction.getPcodeOps(boolean) to determine whether injections are taken or not, so I can't really pick and choose. Instead, I opted to never take injects and use a different mechanism for handling CALLOTHERs in the emulator.

I instead specified PcodeUseropLibrary, which at its core is simply a callback into Java code when the emulator encounters a CALLOTHER. It has some sugar if you'd rather model a userop using Sleigh/p-code. Instead of injecting (inlining), the emulator effectively treats the userop's p-code as a subroutine. That said, you might be able to create a PcodeUseropLibrary that adapts an existing PcodeInjectLibrary. Or at the very least, if a desired inject library is relatively small, you could probably create the equivalent userop library relatively easily. (See module B4 in the Debugger Tutorial.)
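To make the "callback on CALLOTHER" idea concrete, here is a minimal plain-Java sketch. It is a toy model only: ToyUseropLibrary, register, and callOther are hypothetical names invented for illustration and are not Ghidra's PcodeUseropLibrary API. It only shows the dispatch shape, where the interpreter looks up a userop by name and calls into Java rather than inlining p-code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongUnaryOperator;

// Toy model only: these are NOT Ghidra classes. All names are hypothetical.
final class ToyUseropLibrary {
    // Maps a userop name to the Java callback that models it.
    private final Map<String, LongUnaryOperator> userops = new HashMap<>();

    void register(String name, LongUnaryOperator callback) {
        userops.put(name, callback);
    }

    // What a toy interpreter would call when it reaches a CALLOTHER op:
    // look up the callback by name and run it, instead of inlining p-code.
    long callOther(String name, long input) {
        LongUnaryOperator callback = userops.get(name);
        if (callback == null) {
            throw new IllegalStateException("No userop registered for: " + name);
        }
        return callback.applyAsLong(input);
    }
}
```

A caller would register a callback once, e.g. `lib.register("double_it", x -> x * 2)`, and the interpreter would invoke `lib.callOther("double_it", 21)` each time it meets that userop.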

Going back to the static vs dynamic use case, we're already coming across situations where we'd like to have different instruction semantics depending on the use case. Consider a vector op. A human might just like to see something like vectoradd in the decompiler, which might just be an opaque userop. However, the emulator would need the full precise p-code. If the slaspec favors the emulator, the decompiler is going to render a rather ugly loop. We're thinking about ways to resolve this, but until we have that figured out, we're not likely to also conflate the userop definition mechanism for the two use cases. So for the moment, for better or worse, they remain distinct.

Jun 27 '24 13:06 nsadeveloper789

So, I also just took a look at the referenced ticket, and yeah, that's a fun one. My suggestion would be to make an equivalent PcodeUseropLibrary for the emulator. You'd probably want to take a look at DefaultPcodeThread.PcodeEmulationLibrary as an example. Your case will obviously be a little more complex, and I'm not sure the fields you need are accessible where you need them, but this should give you the gist of how you might accomplish it. (You might just start by adding this code to PcodeEmulationLibrary, and then work out how to factor it independently.)

@PcodeUserop
public void ex9it(int imm9u) {
    PcodeFrame saved = thread.frame;
    // Seems like everything you need is in imm9u, but in case you need the original instruction:
    Instruction curInstr = thread.instruction;
    // I'll leave you to fill in this computation
    Address fetchAddr = ...;
    Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, thread.getContext());
    // Do your validation. Throw a Java exception if the hardware would except. I'd recommend creating your own exception class extending PcodeExecutionException.
    thread.executor.execute(PcodeProgram.fromInstruction(fetchInstr), thread.getUseropLibrary());
    thread.frame = saved;
}

Jun 27 '24 13:06 nsadeveloper789

Thanks, that's a good start for sure. First, I tried this:

@PcodeUserop
public void ex9it(T imm9u) {
    PcodeFrame saved = thread.frame;
    
    // Get current ITB value
    long itb = thread.arithmetic.toLong(
      thread.getState().getVar(thread.language.getRegister("ITB"), Reason.EXECUTE_DECODE),
      Purpose.DECODE);
    
    // Compute address to fetch from
    long memOffset = (itb & ~0b11) + thread.arithmetic.toLong(imm9u, Purpose.DECODE) * 4;
    Address fetchAddr = thread.language.getAddressFactory().getAddress(
      thread.language.getAddressFactory().getDefaultAddressSpace().getSpaceID(), memOffset);

    // TODO throw if fetch/decode fails
    Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, thread.getContext());
    // pc-relative branch instructions in Instruction_Table always branch to same target, no matter where EX9.IT or ITB is.
    // other instructions which have pc-relative references are not affected.
    FlowType flowType = fetchInstr.getFlowType();
    if (flowType.isJump() || flowType.isCall()) {
      // XXX this doesn't work as intended:
      // * the instruction is still decoded as if it exists at fetchAddr
      // * registers set based on pc-relative value (e.g. Link Pointer when executing JAL out of the table)
      //   get set to fetchAddr+4 instead of thread.instruction.getAddress()+4
      // The first point above means that non-branch insns with pc-relative reference are also decoded incorrectly
      // (they'll be relative to fetchAddr instead of thread.instruction)
      thread.executor.executeSleigh("PC = PC & 0xfe000000;");
      fetchInstr = thread.decoder.decodeInstruction(fetchAddr, thread.getContext());
    }
    
    // Do your validation. Throw a Java exception if the hardware would except.
    // I'd recommend creating your own exception class extending PcodeExecutionException.
    if (fetchInstr.getMnemonicString().equals("EX9.IT")) {
      // TODO throw Reserved Instruction Exception
    }
    // TODO currently, the language does not implement any exceptions (e.g. alignment)
    // besides explicit ones like syscall/trap.
    thread.executor.execute(PcodeProgram.fromInstruction(fetchInstr), thread.getUseropLibrary());
    thread.frame = saved;
}

(see the XXX for what's broken)
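The address arithmetic in the attempt above can be checked in isolation. The following is a small sketch that mirrors only the expression `(itb & ~0b11) + imm9u * 4` from that code, outside of Ghidra. The class and method names are hypothetical, the range check on imm9u is an added assumption based on its name, and the ITB value in main is made up.

```java
// Sketch only: mirrors the fetch-offset arithmetic from the ex9it attempt above.
// Ex9FetchAddress and fetchOffset are hypothetical names, not part of Ghidra.
final class Ex9FetchAddress {
    private Ex9FetchAddress() {}

    // ITB with its low two bits cleared, plus the unsigned 9-bit index
    // scaled by 4 (the scale factor is taken from the code in the thread).
    static long fetchOffset(long itb, int imm9u) {
        if (imm9u < 0 || imm9u > 0x1ff) {
            throw new IllegalArgumentException("imm9u out of 9-bit range: " + imm9u);
        }
        return (itb & ~0b11L) + (long) imm9u * 4;
    }

    public static void main(String[] args) {
        // Made-up ITB value with its low bits set, and index 5.
        System.out.printf("0x%x%n", fetchOffset(0x00100003L, 5));
    }
}
```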

I also gave this a try, but it fails as thread.instruction has no program assigned:

try {
  thread.executor.execute(
    PcodeProgram.fromInject(thread.instruction.getProgram(), "ex9it", InjectPayload.CALLOTHERFIXUP_TYPE),
    thread.getUseropLibrary());
} catch (Exception ex) {
  throw new PcodeExecutionException(ex.getMessage());
}

I assume I should make some hacked up version of SleighInstructionDecoder.decodeInstruction to fix the above issues?

btw, is there a faster way to iterate testing changes to core ghidra? Currently I do gradle assembleAll -x ip -x createJavadocs && %GHIDRA_INSTALL_DIR%\ghidraRun.bat but it's pretty slow/processing a lot of stuff that isn't necessary.

Jun 27 '24 17:06 shuffle2

Now I'm wondering if it may be nicer to use a contextreg to select how pc-relative addresses are computed for branches in the slaspec. In the PcodeUserop implementation, I could set the contextreg and then decode the instruction from the table. However, I tried to do something similar to that already for the pcode injection side, and couldn't get it to work as expected.

Jun 27 '24 17:06 shuffle2

yea...I'd think adding this would work, but it doesn't:

thread.overrideContext(new RegisterValue(thread.contextreg, BigInteger.valueOf(1)));
fetchInstr = thread.decoder.decodeInstruction(fetchAddr, thread.getContext());

and modifying sleigh like:

define context contextreg
  itMode=(0,0)
;
imm24s_rel: rel is s0_23 & itMode=0 [ rel = inst_start + (s0_23 << 1); ] { export *:4 rel; }
imm24s_rel: rel is s0_23 & itMode=1 [ rel = (PC & 0xfe000000) + (s0_23 << 1); ] { export *:4 rel; }
pc_next: is itMode=0 { pcrel = PC + 4; export pcrel; }
pc_next: is itMode=1 { pcrel = PC + 2; export pcrel; }
:JAL imm24s_rel is u24_24=1 & imm24s_rel & pc_next {
    set_link_gpr(lp, pc_next);
    psw_ifcon_clear();
    call imm24s_rel;
}

The result is that lp is set to PC +4 (at least it's not inst_next anymore), and the jump target is still the wrong address. So the decode isn't respecting the contextreg override.

edit: oh, I take that back, it does work - it's just that the contextreg endian is inverted from what I expected, or something. Setting it to 0xffffffff instead of 1 did trigger itMode=1 patterns to be matched.
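For reference, the two imm24s_rel variants in the sleigh above compute different targets from the same displacement field. Here is a plain-Java sketch of just that arithmetic. The class and method names are hypothetical, the truncation to 32 bits is an assumption about the address size, and s0_23 is passed in as an already sign-extended value.

```java
// Sketch only: reproduces the two target expressions from the sleigh snippet above.
// Ex9BranchTarget and its methods are hypothetical names, not part of Ghidra.
final class Ex9BranchTarget {
    private Ex9BranchTarget() {}

    // itMode=0: rel = inst_start + (s0_23 << 1)
    static long normalTarget(long instStart, int s0_23) {
        return (instStart + ((long) s0_23 << 1)) & 0xffffffffL; // assumed 32-bit addresses
    }

    // itMode=1: rel = (PC & 0xfe000000) + (s0_23 << 1)
    static long tableTarget(long pc, int s0_23) {
        return ((pc & 0xfe000000L) + ((long) s0_23 << 1)) & 0xffffffffL; // assumed 32-bit addresses
    }

    public static void main(String[] args) {
        // Made-up values: same displacement, different base.
        System.out.printf("normal: 0x%x%n", normalTarget(0x1000L, 8));
        System.out.printf("table:  0x%x%n", tableTarget(0x02001234L, 8));
    }
}
```

In the itMode=1 form the result depends only on the upper bits of PC, which matches the earlier observation that branches fetched from the table go to the same target no matter where the EX9.IT sits within that region.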

There's still some weirdness: the emulator winds up with PC 4 past the jump destination when executing a JAL via EX9.IT (the decompiler shows the correct target location, though). Probably something to do with how the emulator increments PC after executing an instruction? In any case, very close to it working now :)

edit2: from looking at the pcode stepper, it looks like the extra 4 byte PC advance is from the "fall-through" which is executed after the CALLOTHER completes. I wonder if there's a way to override that, or will I need to kludge a PC -= 4 into the emulator to compensate? I also wonder if other parts of ghidra are having the same issue with this CALLOTHER when tracing flow through injected pcode.

Jun 27 '24 21:06 shuffle2

Unfortunately, the above doesn't work in the disassembler/decompiler, because there the PC register always has the value 0.

I've tried quite a few things to work around that deficiency while maintaining emulator functionality, and it seems infeasible (in sleigh at least).

Back to the drawing board.

Jun 28 '24 14:06 shuffle2

So, you don't need to override the emulator's context. In fact, you probably want to leave it in 16-bit mode, so it can continue in that mode once it has executed the 32-bit instruction. Instead just pass the custom context directly to the decoder:

Register itMode = thread.getLanguage().getRegister("itMode");
RegisterValue defaultCtx =
    thread.defaultContext.getDefaultValue(thread.contextreg, fetchAddr);
// I'm assuming itMode=1 implies 32-bit instructions?
RegisterValue ctxMode32 = defaultCtx.assign(itMode, BigInteger.ONE);
Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, ctxMode32);

Jun 28 '24 15:06 nsadeveloper789

As for the wrapping the inject option, missing the program could be a hard stop. The only way I can think of getting one of those in there is to factor your op out into its own library, and then pass the program into its constructor. Probably not worth going down that avenue, yet.

Jun 28 '24 15:06 nsadeveloper789

As for PC being off, I haven't examined carefully, but since this involves execution of a second decoded instruction by reference, some of our usual conventions get broken (this is not something we've had to deal with before.) inst_next is effectively hardcoded into an instruction's p-code, and it's based on the address of that instruction. So, if you use inst_next in the JAL, it's going to refer to the instruction following the JAL, not the instruction after the EX9.IT that refers to it. That's unfortunate, because that's the convention we use everywhere we want PC-relative anything. There may be a way to work around this by reconstructing the fetched instruction as if it were at the EX9.IT's address:

Instruction reloced = new PseudoInstruction(thread.counter, fetchInstr.getPrototype(), fetchInstr, fetchInstr);

Then use reloced instead of fetchInstr for the executor. No guarantees that's sane, but with some tweaking, it should work, and allow you to use inst_next in the conventional way.
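To illustrate why the instruction's address matters here: inst_next is the address of the instruction plus its length, so the same instruction yields a different value depending on where it is considered to live. This is a tiny plain-Java sketch with made-up addresses and hypothetical names. It does not use Ghidra's PseudoInstruction, only the arithmetic.

```java
// Sketch only: models inst_next as "address of this instruction + its length".
// InstNextDemo and instNext are hypothetical names; the addresses are made up.
final class InstNextDemo {
    private InstNextDemo() {}

    static long instNext(long instrAddress, int instrLength) {
        return instrAddress + instrLength;
    }

    public static void main(String[] args) {
        long tableEntry = 0x2000L; // where the fetched 4-byte instruction is stored
        long ex9itAddr = 0x1000L;  // where the referring EX9.IT instruction sits

        // Decoded in place, inst_next follows the table entry.
        System.out.printf("in table:  0x%x%n", instNext(tableEntry, 4));
        // Reconstructed at the EX9.IT's address, inst_next is relative to that address.
        System.out.printf("relocated: 0x%x%n", instNext(ex9itAddr, 4));
    }
}
```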

Also, I forgot you asked earlier:

btw, is there a faster way to iterate testing changes to core ghidra?

We recommend using JUnit from Eclipse. You'd probably want to add yours to BytesTracePcodeEmulatorTest, at least to start.

Jun 28 '24 15:06 nsadeveloper789

So, you don't need to override the emulator's context. In fact, you probably want to leave it in 16-bit mode, so it can continue in that mode once it has executed the 32-bit instruction. Instead just pass the custom context directly to the decoder:
Register itMode = thread.getLanguage().getRegister("itMode");
RegisterValue defaultCtx =
    thread.defaultContext.getDefaultValue(thread.contextreg, fetchAddr);
// I'm assuming itMode=1 implies 32-bit instructions?
RegisterValue ctxMode32 = defaultCtx.assign(itMode, BigInteger.ONE);
Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, ctxMode32);

itMode means "currently decoding the instruction referenced by an EX9.IT instruction", a condition which currently can only happen when the emulator is driving execution. In the above code, getDefaultValue returned null, so I replaced with

Register itMode = thread.getLanguage().getRegister("itMode");
RegisterValue itCtxMode = thread.context.assign(itMode, BigInteger.ONE);
Instruction fetchInstr = thread.decoder.decodeInstruction(fetchAddr, itCtxMode);

I think this still has the intended effect of not permanently overriding the context. I suppose the default value would be populated if I set it in <tracked_set>? Does seem odd that contextreg isn't defaulted to zero (which seems to be the expectation for sleigh code, which cannot explicitly initialize it).

Instruction reloced = new PseudoInstruction(thread.counter, fetchInstr.getPrototype(), fetchInstr, fetchInstr);

This works great, thanks a lot!

I'll have to see if these ideas can be applied to the pcode injection side, too.

Here's how I'm working around the PC advancement for now. Kludgy but at least it's working :)

PcodeFrame frame = thread.executor.execute(PcodeProgram.fromInstruction(reloced), thread.getUseropLibrary());
// compensate for the emulator advancing pc
// for whatever reason, the fallthrough adds 2, and external branch adds 4
thread.writeCounter(thread.counter.subtract(frame.isFallThrough() ? 2 : 4));

Jun 28 '24 20:06 shuffle2

hum, I was thinking this could just be thrown into something extending EmulateInstructionStateModifier (which is found via emulateInstructionStateModifierClass in pspec), but when I went back to look - Emulate seems to be an entirely duplicated emulator unrelated to PcodeExecutor?

Jun 28 '24 22:06 shuffle2

You can change the default context values in the pspec file. Not sure that's relevant. Register value bits do have three states: 0, 1, and unspecified (when mask==0). Not sure that resolves or adds to the confusion, so feel free to ignore since it looks like you have things working.
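The three-state idea can be sketched with a value/mask pair. The following is a simplified plain-Java model and not Ghidra's RegisterValue class. ToyRegisterValue and BitState are hypothetical names, and the 64-bit width is an arbitrary choice for the example.

```java
// Simplified model only: NOT Ghidra's RegisterValue. All names are hypothetical.
final class ToyRegisterValue {
    enum BitState { ZERO, ONE, UNSPECIFIED }

    private final long value;
    private final long mask;

    // A bit is only meaningful where the mask bit is set; other value bits are dropped.
    ToyRegisterValue(long value, long mask) {
        this.value = value & mask;
        this.mask = mask;
    }

    BitState bit(int index) {
        if (index < 0 || index > 63) {
            throw new IllegalArgumentException("bit index out of range: " + index);
        }
        long selector = 1L << index;
        if ((mask & selector) == 0) {
            return BitState.UNSPECIFIED; // mask==0 for this bit: neither 0 nor 1
        }
        return (value & selector) != 0 ? BitState.ONE : BitState.ZERO;
    }
}
```

With `new ToyRegisterValue(0b01L, 0b11L)`, bit 0 reads as ONE, bit 1 as ZERO, and bit 2 as UNSPECIFIED, since its mask bit is clear.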

Regarding Emulate vs PcodeEmulator, yes. These are two different implementations of a p-code emulator. Emulate is actually the older one. It supports some processor-specific plugins, which are implemented via the EmulateInstructionStateModifier mechanism. PcodeEmulator is newer, and it is based on a more flexible and modular framework. (PcodeExecutor is just the p-code interpretation component.) However, to maintain support for processors requiring these plug-ins, it has some "glue" to adapt the state modifiers into the new framework.

Jul 01 '24 12:07 nsadeveloper789

I suppose I should read a little more deeply into your question. You're now trying to figure out how to make this into a contributable component, as opposed to something kludged into DefaultPcodeThread? If that's the case, take a look at the DebuggerTutorial, module B4, GUI Integration: https://github.com/NationalSecurityAgency/ghidra/blob/master/GhidraDocs/GhidraClass/Debugger/B4-Modeling.md#gui-integration. Since your userop library won't be in the base emulation package, you may or may not have access to the same fields you did while you were experimenting in DefaultPcodeThread. We can discuss the addition of getter methods on a case-by-case basis, if necessary. Often there are other ways to get at the thing you need.

We haven't yet worked out what a more automatic plugin/configuration system would look like in the GUI. For now, the idea is that the user executes a script (could be packaged in Ghidra, provided by a 3rd-party extension, or just downloaded). That script works like the one developed in the tutorial: it just tells the emulation service to use a customized emulator provided by that script.

We're considering an annotation mechanism on the library with some language and/or processor ID to indicate that the service should automatically add it. While that seems simple, there's a whole ongoing story about environment modeling. The processor is only one piece of that. We'd also like to use the userop libraries to handle system calls, which is modeling the operating system, so there's a lot for us to consider. In the meantime, the scripting thing is what we have.

Jul 01 '24 12:07 nsadeveloper789

I think this brings us back to the original topic.

Perhaps a fundamental point is that there are some cases when a processor language developer wants to eschew the limitations of sleigh and have more control of the emitted pcode, as in this EX9.IT case, whose behavior seems to be inexpressible in sleigh. I have also encountered other times when the decoding/behavior would just be extremely annoying to write / figure out how to write in sleigh, and I'd like to progress with initial disassembly of a binary (e.g. to figure out which ISA is used / which instructions are required to be implemented, etc). So far I've been using dynamic pcode injection as this type of escape hatch from sleigh, although that is not really the intent of the pcode injection feature. Even though I like the idea of creating a new, dedicated escape hatch (in which the processor extension can then implement the instruction semantics in a single place), upon reflection I guess sleigh should somehow be expanded instead, so that the instruction decoding and behavioral semantics don't start leaking out all over the place.

So, I guess my original request is invalid because I'm abusing CALLOTHER to implement semantics which really should be in sleigh(?)

Jul 01 '24 21:07 shuffle2

Such is the lifecycle of software....

So, this goes a little beyond my domain, as I'm not the maintainer of Sleigh itself, but of the p-code emulator that consumes it. I can't help but agree that CALLOTHER is not the right solution here, because that's (at least I'd argue) a semantic mechanism; whereas, what you're doing needs a change to the decoding mechanism. While I don't know the complete history, Sleigh has had its mutations and hacks added over time. IINM, the context register is among those additions. It wouldn't be unreasonable to add a mechanism that instructs the parser to begin consuming bytes at some other target address. (At the moment, you can advance a token cursor, using ;, but that's all relative to the current PC and only very small forward steps.) I don't have a good answer for you, but if you care to poke around on your own, I'd suggest some design questions:

What should the disassembly of the EX9.IT instruction look like? Usually, we examine the output of other disassemblers to help answer this. Should the display incorporate the ref'ed instruction in any way?
Presuming the solution does not use CALLOTHER, or otherwise (ab)use the Sleigh semantic block, what should the Sleigh code look like?

Jul 08 '24 12:07 nsadeveloper789