ghidra
bsim cannot recover from OOM
Describe the bug
Was running bsim generatesigs <ghidra project directory> <elastic repo> and it eventually just stopped taking up CPU (java was running at 900%+ in top until then, with lots of instances of decompiler, which eventually stopped spawning).
Have 5 firmware in the project directory and on the first one ran into issues.
Don't know if related, but had about 30 of these and then the stack traces
ERROR Error generating signature for "FUN_XXXXXXXX". Error: Exception while generating signatures: process: timeout (SignatureTask)
Exception in thread "Decompiler Disposer-pool-4-thread-27" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679)
at java.base/java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:460)
at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1061)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
ERROR Unexpected exception getting Decompiler result (InternalResultListener) java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at generic.concurrent.QResult.<init>(QResult.java:40)
at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:78)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.base/java.lang.StringUTF16.compress(StringUTF16.java:161)
at java.base/java.lang.String.<init>(String.java:4501)
at java.base/java.lang.String.<init>(String.java:300)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:362)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:396)
at ghidra.app.decompiler.DecompInterface.fileToString(DecompInterface.java:247)
at ghidra.app.decompiler.DecompInterface.initializeProcess(DecompInterface.java:284)
at ghidra.app.decompiler.DecompInterface.verifyProcess(DecompInterface.java:358)
at ghidra.app.decompiler.DecompInterface.generateSignatures(DecompInterface.java:1045)
at ghidra.features.bsim.query.GenSignatures$SignatureTask.decompile(GenSignatures.java:692)
at ghidra.features.bsim.query.ParallelDecompileTask$ParallelDecompilerCallback.process(ParallelDecompileTask.java:124)
at ghidra.features.bsim.query.ParallelDecompileTask$ParallelDecompilerCallback.process(ParallelDecompileTask.java:110)
at generic.concurrent.ConcurrentQ$CallbackCallable.call(ConcurrentQ.java:658)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:76)
... 3 more
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "GTimer"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "process reaper"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Log4j2-TF-3-Scheduled-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-2"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-5"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-3"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-8"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-6"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-9"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-7"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-10"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Analysis-pool-3-thread-1"
Exception in thread "Analysis-pool-3-thread-4" java.lang.OutOfMemoryError: Java heap space
Environment (please complete the following information):
- OS: 22.04
- Java Version: 17.0.9
- Ghidra Version: 11.0
- Ghidra Origin: official
Additional context
Some info from About Program:
PowerPC:BE:32:QUICC (1.5)
# of bytes: 162608782
# of instructions: 27418521
# of defined data: 1203560
# of functions: 355126
# of symbols: 2109128
I was able to run the tutorial H2 script on a number of files in the project and then tried the command line bsim
with an elastic server. The project is non-shared, but can try on a shared early next week.
I'll see if I can reproduce this with a large binary. In the meantime, there is a MAXMEM variable in the bsim script; you can try increasing its value.
@ghidracadabra I realize OOM bug reports are hit or miss, but figured worth reporting. Thanks for the suggestion, it didn't even click to check the settings in that file. I'll try again with that increased.
I noticed the others are all along the lines of "un-comment MAXMEM if needed, otherwise it's 1/4 of RAM" (so 16 GB -> 4 GB for this setup, vs. the 1 GB set here).
Is this ingest processing the same as the plugin for H2, or could this be RAM/OOM related? It's been going for about 3 hours now, and the plugin took on the order of tens of minutes. I only bumped it up to 2GB; I guess I should have just commented out MAXMEM.
finally killed it. 7hrs?
Just to make sure I understand the issue: using the script mentioned in the tutorial (AddProgramToH2BSimDatabaseScript.java), you are able to generate signatures for some (all?) of these large files and ingest them into an H2 database. However, when you run the bsim command targeting your project and an elasticsearch database, it eventually fails with the OOM errors?
From the stack trace, it looks like it's running out of memory during the signature generation process. Elasticsearch vs. H2 shouldn't matter - at this point it should only have queried the database for its settings.
For debugging, you could try a modified version of your bsim command:
bsim generatesigs <ghidra_project_dir> <directory_for_signature_files> config=template
where template is the database template you used when you created the database. This will generate the signatures without connecting to the database (you'll still be able to commit these signatures to that database since the templates match).
The bsim command will write each signature file to the signature file directory as it progresses. This is an xml file with the name sigs_<md5_sum>. This might give you some indication that it's making progress.
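One way to watch for that progress is to count the sigs_ files in the output directory from time to time. The following is only a minimal sketch: the function name count_sig_files and the directory argument are illustrative and not part of bsim.

```shell
#!/bin/sh
# Count files whose names start with "sigs_" directly inside a directory.
# The directory is whatever was passed to generatesigs as the signature
# file directory; this helper is hypothetical and not part of bsim.
count_sig_files() {
    dir=$1
    count=0
    for f in "$dir"/sigs_*; do
        # If nothing matches, the glob stays unexpanded, so check existence.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Calling count_sig_files <directory_for_signature_files> every few minutes should show the number going up while signature generation is still moving forward.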
Another thing to try would be to put each firmware in its own directory in the ghidra project. In general this would be a pain, but with only five firmware images it's not so bad. You can specify a repository path in the ghidra url used in the bsim command, so you could then use the bsim command to generate the signatures for each file individually.
The question to answer is whether there is a specific file that is the culprit, or whether resources are not being released properly as the signature-generating process iterates over the files in the repo.
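If it helps, the per-firmware runs could be scripted along these lines. This is a dry-run sketch that only prints the commands, using the generatesigs form shown above. The function name print_sig_cmds, the fwN output directory naming, and the arguments are all illustrative, and the project locations are passed through exactly as given rather than constructed.

```shell
#!/bin/sh
# Print one generatesigs command per project location, each with its own
# output directory, so a failing firmware is easy to single out.
# Usage: print_sig_cmds <template> <output_root> <project_location>...
# Hypothetical helper: it only echoes commands and does not run bsim.
print_sig_cmds() {
    template=$1
    out_root=$2
    shift 2
    n=1
    for loc in "$@"; do
        echo "bsim generatesigs $loc $out_root/fw$n config=$template"
        n=$((n + 1))
    done
}
```

Running the printed commands one at a time, and noting which one runs out of memory or stalls, should separate a single bad file from a leak that builds up across files.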
Yes, I did mean to come back and edit that this file ran in AddProgramToH2BSimDatabaseScript.java, but forgot. I can test bsim on some of the others.
The OOM happened with the 1G, but I don't recall how quickly. With 2G it didn't OOM (at least no stack trace), but it also ran for 7hrs and I eventually Ctrl+C'd it. I was not able to connect jconsole to bsim; the connection attempt timed out.
p.s. commenting out MAXMEM doesn't work, but setting it to 4G now
edit 2: looking at application.log, the timestamps for AddProgramToH2BSimDatabaseScript for completion are 30min apart.
keeping the ES setup... 4G. another file. it ran fine up until
INFO Writing signatures for sigs_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (BulkSignatures)
Exception in thread "main" Exception in thread "Log4j2-TF-3-Scheduled-1" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
6G, re-ran with that file
WARN Signature file already exists for: filename (SignatureRepository)
INFO Writing signatures for sigs_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (BulkSignatures)
6G, another file worked fine the whole way through
6G, first trouble file worked
FYI - We have bumped bsim MAXMEM to 2GB for next release
I haven't had a chance to play with bsim with my ghidra server images (got tied up with implementing superh dsp instructions :D )
Would the images already in the server be less memory intensive, or should I just be prepared to bump to the 4-6GB range for my images? Is this script/setup structured so differently that it can't be the "uncomment this if you want to set it, otherwise quarter of RAM"?
FYI - We have bumped bsim MAXMEM to 2GB for next release
I think we have decided to comment-out MAXMEM value as we do for ghidraRun launch script. In general, this should limit the process heap size to 1/4 the amount of physical memory.
It sounds like the largest memory consumers can be the following commands due to the potential number and size of function "signature" data held in memory during the operation.
bsim commitsigs
bsim commitupdates
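To get a feel for what the 1/4-of-physical-memory default works out to on a given machine, here is a rough sketch. It is Linux-only because it reads /proc/meminfo, and the helper name quarter_mib is illustrative.

```shell
#!/bin/sh
# Convert a physical memory size in KiB to one quarter of it, in MiB.
# Hypothetical helper for estimating the default heap limit.
quarter_mib() {
    echo $(( $1 / 4 / 1024 ))
}

# Linux only: MemTotal in /proc/meminfo is reported in KiB.
if [ -r /proc/meminfo ]; then
    total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "physical: $((total_kib / 1024)) MiB, quarter: $(quarter_mib "$total_kib") MiB"
fi
```

For a 16 GiB machine this gives 4096 MiB, which matches the 16->4 estimate earlier in the thread. The value the JVM actually picks can be checked with java -XX:+PrintFlagsFinal -version and looking for MaxHeapSize.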
Closed by ad532036ab016dbf5bcb9ae6ac63b1e697aa0199