ghidra
Feature Request : Solution for ARM/THUMB problems in Machine Learning Extension
Is your feature request related to a problem? Please describe.
On ARM binaries, a function can be in either ARM or THUMB mode. A model trained on a set of functions containing both modes can identify likely functions, but mass disassembly does not work well: errors will (probably) arise from disassembling a THUMB candidate as ARM.
Describe the solution you'd like
Either of these would work:
i.) The ability to filter which functions are trained on, coupled with serialization/deserialization of a model. Being able to save the model persistently would also be a useful feature in itself.
ii.) Specific changes to the plugin to deal with ARM binaries, e.g. separate ARM and THUMB models.
Describe alternatives you've considered
- Make a temporary copy of your program, delete all the functions of one mode (ARM or THUMB), train on the remaining mode, then apply the model to the original program.
- Script to test candidates reported by the model.
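The first workaround amounts to splitting the known functions by mode before training. A minimal sketch of that split in plain Python, outside Ghidra: each function is represented as a hypothetical `(entry_address, tmode)` pair, where `tmode` is assumed to be the value of the TMode context register at the entry point (0 for ARM, 1 for THUMB). The record shape, the helper name `split_by_mode`, and the addresses are made up for illustration and are not part of the ML extension.

```python
from collections import defaultdict


def split_by_mode(functions):
    """Group function entry addresses by their TMode value.

    `functions` is an iterable of (entry_address, tmode) pairs. This record
    shape is an assumption for illustration, not the plugin's data model.
    """
    groups = defaultdict(list)
    for entry, tmode in functions:
        groups[tmode].append(entry)
    return dict(groups)


# Made-up addresses, for illustration only.
known = [(0x8000, 0), (0x8010, 1), (0x8024, 0), (0x8030, 1)]
by_mode = split_by_mode(known)
arm_entries = by_mode.get(0, [])    # candidates for an ARM-only training set
thumb_entries = by_mode.get(1, [])  # candidates for a THUMB-only training set
```

Training on one group at a time keeps each model's candidates in a single mode, so they can be disassembled with a known TMode value.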
I only just noticed the ML extension, so apologies if there already is some way to do all this.
I just noticed there's this option:

[screenshot of the Context Registers and Values option]

I assume "TMode=0" or "TMode=1" is the intended usage here. I would still like to know whether the model can be stored persistently (I assume not). Should I recycle this issue for that, or close it and open a new one?
That is indeed what the Context Registers and Values option is for. At the moment it's the user's responsibility to enter the registers and values they care about before training the model. It might make sense to present the relevant context registers for a given architecture automatically.
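As a rough illustration of what entering registers and values involves, the sketch below parses `name=value` entries such as "TMode=1" into a mapping. The comma separator and the helper name `parse_context_entries` are assumptions for illustration; the input format the plugin actually accepts may differ.

```python
def parse_context_entries(text):
    """Parse 'Reg=Value' entries into a {register_name: int} dict.

    Entries are assumed to be comma-separated; that separator is a guess
    made for this sketch, not documented plugin behavior.
    """
    result = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and blank entries
        name, sep, value = item.partition("=")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"expected name=value, got {item!r}")
        # Base 0 lets int() accept both decimal and 0x-prefixed values.
        result[name.strip()] = int(value.strip(), 0)
    return result


thumb_only = parse_context_entries("TMode=1")  # restrict training to THUMB
arm_only = parse_context_entries("TMode=0")    # restrict training to ARM
```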
At the moment there isn't a way to store a model persistently. This is probably worth a new ticket. The ML extension is new; we might wait a bit to see whether there are other requests before making largish changes. Once things can be saved we have to start considering backward compatibility when making further changes...
Closing as those who use it for ARM/THUMB will know to set the registers. I'll make another ticket for persistent storage down the line once more people have had their fun.