mcsema
Ghidra as CFG tool
As was pointed out in other issues, IDA Pro is expensive, and a headless Binary Ninja is not much cheaper either.
I would hope dyninst is finally getting in here (the patching is fickle, but the CFG is fine in my experience).
But how about Ghidra? It's free and open source, its CFG is good, and it runs anywhere a current Java is available. Maybe it makes sense to move there completely?
(Also, in my case I do have IDA Pro, but with a Windows license, and Linux is not as stable. But I would run mcsema on the Linux side, so ... sigh ...)
You can run mcsema-disass on Windows, and run mcsema-lift in WSL or Linux if you can't get it to build on Windows. It doesn't matter where the CFG comes from.
One way to run mcsema-disass directly is something like:
python ./tools/mcsema_disass/__main__.py ...
I will look into integrating Ghidra. What significant challenges can I expect to encounter?
We'll be moving to an API-based approach in the near-ish future, where the CFG files are not protobufs but something else, and the disassembler interfaces with the CFG files via an API. I'd recommend waiting until after this change is complete.
One important thing that McSema needs to know is xrefs, and how they relate to specific instruction operands. For example, if you had `mov [addr1], addr2`, both `addr1` and `addr2` might be xrefs, and we need to know that `addr1` is a memory operand and `addr2` is an immediate operand.
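As a rough illustration of the operand distinction described above, here is a minimal Python sketch of how a disassembler plugin might record it. The names `OperandKind`, `XRef`, and `xrefs_for_mov` are hypothetical and are not part of McSema's CFG format or of any disassembler API; the addresses are made up.

```python
from dataclasses import dataclass
from enum import Enum


class OperandKind(Enum):
    MEMORY = "memory"        # address is dereferenced, e.g. [addr1]
    IMMEDIATE = "immediate"  # address is used as a plain value, e.g. addr2


@dataclass(frozen=True)
class XRef:
    inst_ea: int        # address of the instruction holding the reference
    target_ea: int      # address being referenced
    kind: OperandKind   # how the operand uses the target address


def xrefs_for_mov(inst_ea, addr1, addr2):
    """Describe the two references in `mov [addr1], addr2`."""
    return [
        XRef(inst_ea, addr1, OperandKind.MEMORY),
        XRef(inst_ea, addr2, OperandKind.IMMEDIATE),
    ]


# Made-up addresses, purely for illustration.
refs = xrefs_for_mov(0x401000, 0x601040, 0x4005D0)
for ref in refs:
    print(hex(ref.target_ea), ref.kind.value)
```

Whatever tool produces the CFG (IDA, Binary Ninja, Ghidra) would need to be able to fill in something equivalent to the `kind` field for every reference it reports.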
> You can run mcsema-disass on Windows, and run mcsema-lift in WSL or Linux if you can't get it to build on Windows. It doesn't matter where the CFG comes from.
Is there an option to export the CFG from IDA on Windows and just pass that output somehow to mcsema_disass on Linux? I would like to avoid having to set up mcsema on Windows, if possible :)
The CFG files can be produced on any OS, and used on any OS. That is, a CFG file produced by the IDA scripts on Windows will work just fine with a Linux build of mcsema-lift.
Did anything ever come of this? Is there anything I can do to help it along?
Hi @the-wondersmith, here are our current short-to-medium term goals and how you might fit into them:
- Bring up better cross-reference and control-flow devirtualization (thunks, jump tables) support in Anvill.
- Once Anvill is able to represent the level of complexity needed by McSema, re-implement McSema's Function.cpp (huge file, watch out) with Anvill. The goal here is that McSema lifts a whole program, Anvill lifts the functions, and Remill lifts the instructions.
- Our focus in Anvill is to bring up Binary Ninja support, as IDA Pro support is relatively mature. We'll need to wire in more information in the IDA Pro scripts to enable IDA Pro to produce Anvill function specifications at the requisite level of information.
We'd be very interested in you helping bring up Ghidra support in Anvill. We think this is a good path toward us gaining Ghidra support in McSema. Let me know your thoughts. You can find me on the Empire Hacking slack #binary-lifting channel, my username is 'pag'.
@pgoodman To be honest, I'd actually much prefer to use Binary Ninja, as I'm primarily a Python programmer. I know that BN support was effectively stripped out of McSema a few PRs ago, but I didn't realize that putting it back in was on the agenda.
I suspect I'll be much more useful in that capacity than in working with Ghidra as my Java is... beyond rusty.
In that case, the other big thing we're trying to do is to get rid of our dependency on protocol buffers and migrate to storing the data we collect into SQLite. We have a branch that's been making some progress on that front but we haven't had the time in a while to push it forward. That would be a place where we'd value help. The logical places where contributions could be made:
- Bringing the branch up-to-date with master
- Figuring out, from the Python side, what writing to an SQLite db will "look like". Ideally, we'll want to hide the usage of SQLite behind an API of some kind, just in case down the line we want to change the storage layer again.
- Figuring out how the lifting process needs to change in response to a more tabular layout of data.
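To make the second bullet a bit more concrete, here is a minimal sketch of what hiding SQLite behind an API could look like from the Python side. Everything here is an assumption for illustration: the `CFGStore` interface, the `SQLiteCFGStore` class, and the table layout are hypothetical and do not reflect the actual branch or McSema's CFG schema. Only the standard-library `sqlite3` module is used.

```python
import sqlite3
from abc import ABC, abstractmethod


class CFGStore(ABC):
    """Hypothetical storage-agnostic interface for the disassembler side."""

    @abstractmethod
    def add_function(self, ea, name): ...

    @abstractmethod
    def add_xref(self, inst_ea, target_ea, operand_kind): ...

    @abstractmethod
    def xrefs_from(self, inst_ea): ...


class SQLiteCFGStore(CFGStore):
    """One possible backend; callers only ever see the CFGStore methods."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        # Assumed tabular layout, not the real schema.
        self._db.executescript("""
            CREATE TABLE IF NOT EXISTS functions (
                ea INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE IF NOT EXISTS xrefs (
                inst_ea INTEGER, target_ea INTEGER, operand_kind TEXT);
        """)

    def add_function(self, ea, name):
        self._db.execute(
            "INSERT OR REPLACE INTO functions VALUES (?, ?)", (ea, name))
        self._db.commit()

    def add_xref(self, inst_ea, target_ea, operand_kind):
        self._db.execute(
            "INSERT INTO xrefs VALUES (?, ?, ?)",
            (inst_ea, target_ea, operand_kind))
        self._db.commit()

    def xrefs_from(self, inst_ea):
        rows = self._db.execute(
            "SELECT target_ea, operand_kind FROM xrefs "
            "WHERE inst_ea = ? ORDER BY rowid", (inst_ea,))
        return rows.fetchall()


# Usage with made-up addresses; an in-memory database keeps it self-contained.
store = SQLiteCFGStore()
store.add_function(0x401000, "main")
store.add_xref(0x401000, 0x601040, "memory")
store.add_xref(0x401000, 0x4005D0, "immediate")
print(store.xrefs_from(0x401000))
```

Because the disassembler scripts would depend only on the abstract interface, swapping the storage layer again later would mean writing another subclass rather than touching every call site.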
@pgoodman Now that I can help with. 😁
Hi @pgoodman! I love the idea behind McSema. Unfortunately, I don’t have the luxury of owning IDA Pro. That said, I have some time, and I’d love to help with the push towards an API-based approach. Regarding your push to switch to SQLite, I noticed that the branch has been inactive for about two years. Do you still plan on moving in this direction? If so, I might set aside some time to figure this out.
BTW - sorry for necro-ing this old issue :)
So I think the next push in McSema will be a complete overhaul. That overhaul will start sometime around when some upcoming refactors in Anvill land. If you're interested in discussing this then ping me "pag" on the #binary-lifting channel of the Empire Hacking slack :-) Ghidra support would be of high value.