PPC: Include thunk pattern
Is your feature request related to a problem? Please describe. The pattern file does not include any thunk logic. The MIPS pattern has a pattern that will match on a thunk which looks to mirror the PPC thunks I'm seeing.
Describe the solution you'd like
Add a <pattern> that includes <funcstart validcode="function" thunk="true"> to create thunks (in addition to just helping to find function starts)
Describe alternatives you've considered manually set as thunked function
Additional context Right now I'm seeing this sequence (regardless number of arguments)
94 21 ff f8 stwu r1,-0x8(r1)
7c 08 02 a6 mfspr r0,LR
90 01 00 0c stw r0,0xc(r1)
xx xx xx xx bl LABEL // OP= 0b010010 LK=1
80 01 00 0c lwz r0,0xc(r1)
38 21 00 08 addi r1,r1,0x8
7c 08 03 a6 mtspr LR,r0
4e 80 00 20 blr
the strictest pattern I think would be:
<pattern>
<data> 0x9421fff8 0x7c0802a6 0x9001000c 010010.. 0x.... .......1 0x8001000c 0x38210008 0x7c0803a6 0x4e800020 </data> <!-- PPC Thunk -->
<funcstart validcode="function" thunk="true"/>
</pattern>
Another pattern I'm seeing is:
lis rN, %hi(addr)
addi rN,rN,%lo(addr)
mtspr CTR,rN
bctr
I wondering if that's okay since its not a bctrl, or would that trigger too many false positives? The previous thunk isn't really a question of register usage, this one could likely be any volatile register (but maybe safe to assume its the same register through out, not that matters for a pattern xml file though unless you made one for each "volatile" register). This one does get picked up most of the time by other analysis, just including as those are the only two thunks I'm seeing.
java.lang.NullPointerException
at ghidra.app.cmd.function.CreateThunkFunctionCmd.resolveComputableFlow(CreateThunkFunctionCmd.java:423)
at ghidra.app.cmd.function.CreateThunkFunctionCmd.getReferencedFunction(CreateThunkFunctionCmd.java:281)
at ghidra.app.cmd.function.CreateThunkFunctionCmd.applyTo(CreateThunkFunctionCmd.java:169)
at ghidra.framework.cmd.BackgroundCommand.applyTo(BackgroundCommand.java:51)
at ghidra.app.analyzers.FunctionStartAnalyzer$FunctionStartAction.applyActionToSet(FunctionStartAnalyzer.java:302)
at ghidra.app.analyzers.FunctionStartAnalyzer$FunctionStartAction.apply(FunctionStartAnalyzer.java:211)
at ghidra.util.bytesearch.MemoryBytePatternSearcher.searchBlock(MemoryBytePatternSearcher.java:240)
at ghidra.util.bytesearch.MemoryBytePatternSearcher.search(MemoryBytePatternSearcher.java:128)
at ghidra.app.analyzers.FunctionStartAnalyzer.added(FunctionStartAnalyzer.java:698)
at ghidra.app.analyzers.FunctionStartFuncAnalyzer.added(FunctionStartFuncAnalyzer.java:52)
at ghidra.app.plugin.core.analysis.AnalysisScheduler.runAnalyzer(AnalysisScheduler.java:186)
at ghidra.app.plugin.core.analysis.AnalysisTask.applyTo(AnalysisTask.java:39)
at ghidra.app.plugin.core.analysis.AutoAnalysisManager$AnalysisTaskWrapper.run(AutoAnalysisManager.java:688)
at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:788)
at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:667)
at ghidra.app.plugin.core.analysis.AutoAnalysisManager.startAnalysis(AutoAnalysisManager.java:632)
at ghidra.app.plugin.core.analysis.AnalysisBackgroundCommand.applyTo(AnalysisBackgroundCommand.java:58)
at ghidra.framework.plugintool.mgr.BackgroundCommandTask.run(BackgroundCommandTask.java:102)
at ghidra.framework.plugintool.mgr.ToolTaskManager.run(ToolTaskManager.java:336)
at java.base/java.lang.Thread.run(Thread.java:829)
---------------------------------------------------
Build Date: 2022-Jun-21
Ghidra Version: 10.2
Java Home: /usr/lib/jvm/java-11-openjdk-amd64
JVM Version: Private Build 11.0.15
OS: Linux 5.14.0-1042-oem amd64
d2883bbb8c9af943625e8d55e67452765056d7eb
Just tried implementing the pattern and ran into this NPE, coming from this return null;
https://github.com/NationalSecurityAgency/ghidra/blob/da94eb86bd2b89c8b0ab9bd89e9f0dc5a3157055/Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/program/model/block/SimpleBlockModel.java#L354
You shouldn't get that exception. I can take a look. I also suspect the logic won't compute the thunked function since there is follow on code after the thunk. We might need to change the code to allow that as well, if it is going to be forced to a thunk by the function pattern.
The second pattern above could be very problematic. Especially if it is found in the middle of code that hasn't been disassembled yet for some reason. It sounds like it would get handled correctly as a thunk if it is found through normal disassembly. Correct?
We have some other patterns we need to add because they have too many side-effects to do automatically, similar to the one above.
Do you have some sample bytes you could share for each thunk?
Thanks for taking a look @emteere . I'll update tonight with more info. I will say it only happens with queued functions to disassemble/create function, once everything is code the exception did go away, but the thunk wasn't created as you're thinking. I have some suggestions for additional patterns and I'll post those too
48 00 00 fc 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 60 00 00 00 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 42 80 00 51 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 42 80 00 15 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 7c 63 22 14 7c 63 2a 14 7c 63 32 14 7c 63 3a 14 7c 63 42 14 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 38 60 00 01 38 80 00 01 38 a0 00 01 38 c0 00 01 38 e0 00 01 39 00 00 01 42 80 ff ad 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 42 80 ff 71 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20 94 21 ff f8 7c 08 02 a6 90 01 00 0c 42 80 ff 35 80 01 00 0c 7c 08 03 a6 38 21 00 08 4e 80 00 20
probably overkill, just added a few empty funcs/filler and thunks for a int sum(a,b,c,d,e,f)
I have default analyzers on press disassemble on the first instruction (I loaded at 0x04000000, but PIC) and hits that exception
Here is my current diff on cdbf92698386df1d08610473b0a5369f7cda7077 . Whitespace/alignment accounts for a few lines, but the big addition is having mfspr lr, r0 before the stwu. An image I had been messing with a while was at 300k and these additional pre/post patterns brought it up to 350k functions I think.
https://gist.github.com/mumbel/4f77c182c3e1d3c93372c00bac4a83ad
edit: I only have the one thunk style that's causing the exception. If there's doubt about my second example being too broad, that's fine since auto-analysis recognizes it as a thunk once its a function (and even if this thunk style isn't doable, I was able to search bytes->mark as thunk, but at this point just interesting it raised the exception).
Was thinking about the first function calling another function then doing a normal return is a problematic thunk pattern. This could occur in any normal program. In any given program there are probably many examples of one function calling another with the same args or no args on caller/callee. We do have some support along these lines, were you can thunk another function and the thunking function and thunkee have a different name. What happens is just the parameters are pulled from the thunkee. I suppose with that thinking we could allow this function in general to be autoidentified as a thunk as long as the name on the thunking function can change later. RIght now we try to figure out that there are no side-effects from the function calling the other function, like adding an argument before calling or jumping to the other function. We currently don't see a call to another function as a thunk.
@mumbel if you get a chance, give the following change a try.
There was some additional logic needed to force the call to be the target of the thunk.
I think I agree with adding the thunk which just calls the other function. The thunk could be recognized as just a call and no other side-effects, but it could be problematic, so this is easier. We could do the automatic recognition later.
If it works for you, I can change the logic and if you would like to contribute the patterns, otherwise, I can make the change and add the pattern modifications.
diff --git a/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/function/CreateThunkFunctionCmd.java b/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/function/CreateThunkFunctionCmd.java
index 8846dd6..c4f08ea 100644
--- a/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/function/CreateThunkFunctionCmd.java
+++ b/Ghidra/Features/Base/src/main/java/ghidra/app/cmd/function/CreateThunkFunctionCmd.java
@@ -16,7 +16,10 @@
package ghidra.app.cmd.function;
import java.math.BigInteger;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import ghidra.app.util.PseudoDisassembler;
@@ -26,6 +29,9 @@
import ghidra.program.model.address.*;
import ghidra.program.model.block.BasicBlockModel;
import ghidra.program.model.block.CodeBlock;
+import ghidra.program.model.block.CodeBlockReference;
+import ghidra.program.model.block.CodeBlockReferenceIterator;
+import ghidra.program.model.block.SimpleBlockModel;
import ghidra.program.model.lang.Register;
import ghidra.program.model.lang.RegisterValue;
import ghidra.program.model.listing.*;
@@ -257,7 +263,7 @@
TaskMonitor monitor) {
Listing listing = program.getListing();
-
+
if (referencedSymbol != null) {
Object obj = referencedSymbol.getObject();
if (obj instanceof Function) {
@@ -294,6 +300,11 @@
referencedFunctionAddr =
PseudoDisassembler.getNormalizedDisassemblyAddress(program, referencedFunctionAddr);
}
+
+ // if it is still no thunkAddr, grab the first basic block and the call/jump at the end of the block
+ if (referencedFunctionAddr == null && autoThunkOK) {
+ referencedFunctionAddr = getFirstBlockJumpCall(program, monitor);
+ }
if (referencedFunctionAddr == null) {
setStatusMsg(
@@ -363,6 +374,36 @@
return null;
}
+ /**
+ * Get the first blocks unconditional jump or call destination
+ *
+ * @param program program
+ * @param monitor monitor for potentially long operation
+ * @return first blocks unconditional call/jump, null if none in first block
+ */
+ private Address getFirstBlockJumpCall(Program program, TaskMonitor monitor) {
+ SimpleBlockModel simpleBlockModel = new SimpleBlockModel(program);
+
+ try {
+ CodeBlock codeBlockAt = simpleBlockModel.getCodeBlockAt(entry, monitor);
+ if (codeBlockAt == null) {
+ return null;
+ }
+ CodeBlockReferenceIterator destinations = codeBlockAt.getDestinations(monitor);
+ while (destinations.hasNext()) {
+ CodeBlockReference destRef = destinations.next();
+ FlowType flowType = destRef.getFlowType();
+ if ((flowType.isCall() || flowType.isJump()) && !flowType.isUnConditional()) {
+ return destRef.getDestinationAddress();
+ }
+ }
+ } catch (CancelledException e) {
+ // ignore
+ }
+
+ return null;
+ }
+
private Function getExternalFunction(Program program) {
ExternalManager externalMgr = program.getExternalManager();
@@ -406,20 +447,19 @@
// get the basic block
//
- // NOTE: Assumption, target addres must be computable in single flow, or else isn't a thunk
+ // NOTE: Assumption, target address must be computable in single flow, or else isn't a thunk
BasicBlockModel basicBlockModel = new BasicBlockModel(program);
-
final CodeBlock jumpBlockAt =
basicBlockModel.getFirstCodeBlockContaining(location, monitor);
- // If the jump target can has a computable target with only the instructions in the basic block it is found in
- // then it isn't a switch statment
- //
- // NOTE: Assumption, we have found all flows leading to the switch that might split the basic block
+ if (jumpBlockAt == null) {
+ return false;
+ }
final AtomicInteger foundCount = new AtomicInteger(0);
SymbolicPropogator prop = new SymbolicPropogator(program);
+ // try to compute the thunk by flowing constants from the start of the block
prop.flowConstants(jumpBlockAt.getFirstStartAddress(), jumpBlockAt,
new ContextEvaluatorAdapter() {
@Override
@emteere I'll hopefully have time tonight to give it a try, and please if you can add that diff internally please do, no need for a PR if you're okay adding it.
@emteere Looks good. I did verify it still crashed with 0b95d8ee876f1f8e2c445ecfdbc1abd965415e17 and then applied your patch