ghidra icon indicating copy to clipboard operation
ghidra copied to clipboard

PPC: Include thunk pattern

Open mumbel opened this issue 3 years ago • 8 comments

Is your feature request related to a problem? Please describe. The pattern file does not include any thunk logic. The MIPS pattern has a pattern that will match on a thunk which looks to mirror the PPC thunks I'm seeing.

Describe the solution you'd like Add a <pattern> that includes <funcstart validcode="function" thunk="true"> to create thunks (in addition to just helping to find function starts)

Describe alternatives you've considered manually set as thunked function

Additional context Right now I'm seeing this sequence (regardless number of arguments)

94 21 ff f8    stwu    r1,-0x8(r1)
7c 08 02 a6    mfspr   r0,LR
90 01 00 0c    stw     r0,0xc(r1)
xx xx xx xx    bl      LABEL // OP= 0b010010 LK=1
80 01 00 0c    lwz     r0,0xc(r1)
38 21 00 08    addi    r1,r1,0x8
7c 08 03 a6    mtspr   LR,r0
4e 80 00 20    blr

the strictest pattern I think would be:

  <pattern> 
      <data> 0x9421fff8 0x7c0802a6 0x9001000c 010010.. 0x.... .......1 0x8001000c 0x38210008 0x7c0803a6 0x4e800020 </data> <!-- PPC Thunk -->
      <funcstart validcode="function" thunk="true"/>
  </pattern>

Another pattern I'm seeing is:

lis     rN, %hi(addr)
addi    rN,rN,%lo(addr)
mtspr   CTR,rN
bctr

I wondering if that's okay since its not a bctrl, or would that trigger too many false positives? The previous thunk isn't really a question of register usage, this one could likely be any volatile register (but maybe safe to assume its the same register through out, not that matters for a pattern xml file though unless you made one for each "volatile" register). This one does get picked up most of the time by other analysis, just including as those are the only two thunks I'm seeing.

mumbel avatar Jun 21 '22 19:06 mumbel

java.lang.NullPointerException
	at ghidra.app.cmd.function.CreateThunkFunctionCmd.resolveComputableFlow(CreateThunkFunctionCmd.java:423)
	at ghidra.app.cmd.function.CreateThunkFunctionCmd.getReferencedFunction(CreateThunkFunctionCmd.java:281)
	at ghidra.app.cmd.function.CreateThunkFunctionCmd.applyTo(CreateThunkFunctionCmd.java:169)
	at ghidra.framework.cmd.BackgroundCommand.applyTo(BackgroundCommand.java:51)
	at ghidra.app.analyzers.FunctionStartAnalyzer$FunctionStartAction.applyActionToSet(FunctionStartAnalyzer.java:302)
	at ghidra.app.analyzers.FunctionStartAnalyzer$FunctionStartAction.apply(FunctionStartAnalyzer.java:211)
	at ghidra.util.bytesearch.MemoryBytePatternSearcher.searchBlock(MemoryBytePatternSearcher.java:240)
	at ghidra.util.bytesearch.MemoryBytePatternSearcher.search(MemoryBytePatternSearcher.java:128)
	at ghidra.app.analyzers.FunctionStartAnalyzer.added(FunctionStartAnalyzer.java:698)
	at ghidra.app.analyzers.FunctionStartFuncAnalyzer.added(FunctionStartFuncAnalyzer.java:52)
	at ghidra.app.plugin.core.analysis.AnalysisScheduler.runAnalyzer(AnalysisScheduler.java:186)
	at ghidra.app.plugin.core.analysis.AnalysisTask.applyTo(AnalysisTask.java:39)
	at ghidra.app.plugin.core.analysis.AutoAnalysisManager$AnalysisTaskWrapper.run(AutoAnalysisManager.java:688)
	at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:788)
	at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:667)
	at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:632)
	at ghidra.app.plugin.core.analysis.AnalysisBackgroundCommand.applyTo(AnalysisBackgroundCommand.java:58)
	at ghidra.framework.plugintool.mgr.BackgroundCommandTask.run(BackgroundCommandTask.java:102)
	at ghidra.framework.plugintool.mgr.ToolTaskManager.run(ToolTaskManager.java:336)
	at java.base/java.lang.Thread.run(Thread.java:829)

---------------------------------------------------
Build Date: 2022-Jun-21
Ghidra Version: 10.2
Java Home: /usr/lib/jvm/java-11-openjdk-amd64
JVM Version: Private Build 11.0.15
OS: Linux 5.14.0-1042-oem amd64

d2883bbb8c9af943625e8d55e67452765056d7eb Just tried implementing the pattern and ran into this NPE, coming from this return null; https://github.com/NationalSecurityAgency/ghidra/blob/da94eb86bd2b89c8b0ab9bd89e9f0dc5a3157055/Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/program/model/block/SimpleBlockModel.java#L354

mumbel avatar Jun 22 '22 01:06 mumbel

You shouldn't get that exception. I can take a look. I also suspect the logic won't compute the thunked function since there is follow on code after the thunk. We might need to change the code to allow that as well, if it is going to be forced to a thunk by the function pattern.

The second pattern above could be very problematic. Especially if it is found in the middle of code that hasn't been disassembled yet for some reason. It sounds like it would get handled correctly as a thunk if it is found through normal disassembly. Correct?

We have some other patterns we need to add because they have too many side-effects to do automatically, similar to the one above.

Do you have some sample bytes you could share for each thunk?

emteere avatar Jul 19 '22 15:07 emteere

Thanks for taking a look @emteere . I'll update tonight with more info. I will say it only happens with queued functions to disassemble/create function, once everything is code the exception did go away, but the thunk wasn't created as you're thinking. I have some suggestions for additional patterns and I'll post those too

mumbel avatar Jul 19 '22 15:07 mumbel

48 00 00 fc 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 42 80 00 51 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 42 80 00 15 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 7c 63 22 14 7c 63 2a 14 7c 63 32 14 7c 63 3a 14 7c 63 42 14 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 38 60 00 01 38 80 00 01 38 a0 00 01 38 c0 00 01 38 e0 00 01 39 00 00 01 42 80 ff ad 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 42 80 ff 71 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 42 80 ff 35 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20

probably overkill, just added a few empty funcs/filler and thunks for a int sum(a,b,c,d,e,f)

I have default analyzers on press disassemble on the first instruction (I loaded at 0x04000000, but PIC) and hits that exception

Here is my current diff on cdbf92698386df1d08610473b0a5369f7cda7077 . Whitespace/alignment accounts for a few lines, but the big addition is having mfspr lr, r0 before the stwu. An image I had been messing with a while was at 300k and these additional pre/post patterns brought it up to 350k functions I think. https://gist.github.com/mumbel/4f77c182c3e1d3c93372c00bac4a83ad

edit: I only have the one thunk style that's causing the exception. If there's doubt about my second example being too broad, that's fine since auto-analysis recognizes it as a thunk once its a function (and even if this thunk style isn't doable, I was able to search bytes->mark as thunk, but at this point just interesting it raised the exception).

mumbel avatar Jul 20 '22 02:07 mumbel

Was thinking about the first function calling another function then doing a normal return is a problematic thunk pattern. This could occur in any normal program. In any given program there are probably many examples of one function calling another with the same args or no args on caller/callee. We do have some support along these lines, were you can thunk another function and the thunking function and thunkee have a different name. What happens is just the parameters are pulled from the thunkee. I suppose with that thinking we could allow this function in general to be autoidentified as a thunk as long as the name on the thunking function can change later. RIght now we try to figure out that there are no side-effects from the function calling the other function, like adding an argument before calling or jumping to the other function. We currently don't see a call to another function as a thunk.

emteere avatar Jul 20 '22 15:07 emteere

@mumbel if you get a chance, give the following change a try.

There was some additional logic needed to force the call to be the target of the thunk.

I think I agree with adding the thunk which just calls the other function. The thunk could be recognized as just a call and no other side-effects, but it could be problematic, so this is easier. We could do the automatic recognition later.

If it works for you, I can change the logic and if you would like to contribute the patterns, otherwise, I can make the change and add the pattern modifications.

diff --git a/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/function/CreateThunkFunctionCmd.java b/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/function/CreateThunkFunctionCmd.java
index 8846dd6..c4f08ea 100644
--- a/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/function/CreateThunkFunctionCmd.java
+++ b/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/function/CreateThunkFunctionCmd.java
@@ -16,7 +16,10 @@
 package ghidra.app.cmd.function;
 
 import java.math.BigInteger;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
 import java.util.concurrent.atomic.AtomicInteger;
 
 import ghidra.app.util.PseudoDisassembler;
@@ -26,6 +29,9 @@
 import ghidra.program.model.address.*;
 import ghidra.program.model.block.BasicBlockModel;
 import ghidra.program.model.block.CodeBlock;
+import ghidra.program.model.block.CodeBlockReference;
+import ghidra.program.model.block.CodeBlockReferenceIterator;
+import ghidra.program.model.block.SimpleBlockModel;
 import ghidra.program.model.lang.Register;
 import ghidra.program.model.lang.RegisterValue;
 import ghidra.program.model.listing.*;
@@ -257,7 +263,7 @@
 			TaskMonitor monitor) {
 
 		Listing listing = program.getListing();
-
+		
 		if (referencedSymbol != null) {
 			Object obj = referencedSymbol.getObject();
 			if (obj instanceof Function) {
@@ -294,6 +300,11 @@
 			referencedFunctionAddr =
 				PseudoDisassembler.getNormalizedDisassemblyAddress(program, referencedFunctionAddr);
 		}
+		
+		// if it is still no thunkAddr, grab the first basic block and the call/jump at the end of the block
+		if (referencedFunctionAddr == null && autoThunkOK) {
+			referencedFunctionAddr = getFirstBlockJumpCall(program, monitor);
+		}
 
 		if (referencedFunctionAddr == null) {
 			setStatusMsg(
@@ -363,6 +374,36 @@
 		return null;
 	}
 
+	/**
+	 * Get the first blocks unconditional jump or call destination
+	 * 
+	 * @param program program
+	 * @param monitor monitor for potentially long operation
+	 * @return first blocks unconditional call/jump, null if none in first block
+	 */
+	private Address getFirstBlockJumpCall(Program program, TaskMonitor monitor) {
+		SimpleBlockModel simpleBlockModel = new SimpleBlockModel(program);
+		
+		try {
+			CodeBlock codeBlockAt = simpleBlockModel.getCodeBlockAt(entry, monitor);
+			if (codeBlockAt == null) {
+				return null;
+			}
+			CodeBlockReferenceIterator destinations = codeBlockAt.getDestinations(monitor);
+			while (destinations.hasNext()) {
+				CodeBlockReference destRef = destinations.next();
+				FlowType flowType = destRef.getFlowType();
+				if ((flowType.isCall() || flowType.isJump()) && !flowType.isUnConditional()) {
+					return destRef.getDestinationAddress();
+				}
+			}
+		} catch (CancelledException e) {
+			// ignore
+		}
+		
+		return null;
+	}
+
 	private Function getExternalFunction(Program program) {
 
 		ExternalManager externalMgr = program.getExternalManager();
@@ -406,20 +447,19 @@
 
 		// get the basic block
 		//
-		// NOTE: Assumption, target addres must be computable in single flow, or else isn't a thunk
+		// NOTE: Assumption, target address must be computable in single flow, or else isn't a thunk
 
 		BasicBlockModel basicBlockModel = new BasicBlockModel(program);
-
 		final CodeBlock jumpBlockAt =
 			basicBlockModel.getFirstCodeBlockContaining(location, monitor);
-		// If the jump target can has a computable target with only the instructions in the basic block it is found in
-		//  then it isn't a switch statment
-		//
-		// NOTE: Assumption, we have found all flows leading to the switch that might split the basic block
+		if (jumpBlockAt == null) {
+			return false;
+		}
 
 		final AtomicInteger foundCount = new AtomicInteger(0);
 		SymbolicPropogator prop = new SymbolicPropogator(program);
 
+		// try to compute the thunk by flowing constants from the start of the block
 		prop.flowConstants(jumpBlockAt.getFirstStartAddress(), jumpBlockAt,
 			new ContextEvaluatorAdapter() {
 				@Override

emteere avatar Sep 20 '22 03:09 emteere

@emteere I'll hopefully have time tonight to give it a try, and please if you can add that diff internally please do, no need for a PR if you're okay adding it.

mumbel avatar Sep 20 '22 14:09 mumbel

@emteere Looks good. I did verify it still crashed with 0b95d8ee876f1f8e2c445ecfdbc1abd965415e17 and then applied your patch

mumbel avatar Sep 20 '22 23:09 mumbel