mcsema Ghidra as CFG tool

As was pointed out in other issues, IDA Pro is expensive, and a headless Binary Ninja is not much cheaper.

I would hope dyninst is finally getting in here (the patching is fickle, but the CFG is fine in my experience).

But how about Ghidra? It's free and open source, its CFG is good, and it runs anywhere a current Java is available. Maybe it makes sense to move there completely?

(Also, in my case I do have IDA Pro, but with a Windows license, and IDA on Linux is not as stable. But I would run mcsema on the Linux side, so ... sigh ...)

Mar 21 '19 20:03 vanhauser-thc

You can run mcsema-disass on Windows, and run mcsema-lift in WSL or Linux if you can't get it to build on Windows. It doesn't matter where the CFG comes from.

Apr 05 '19 22:04 pgoodman

One way to run mcsema-disass directly is something like:

python ./tools/mcsema_disass/__main__.py ...

Apr 05 '19 22:04 pgoodman

I will look into integrating Ghidra. What significant challenges can I expect to encounter?

Sep 19 '19 16:09 jpc0016

We'll be moving to an API-based approach in the near-ish future, where the CFG files are not protobufs but something else, and the disassembler interfaces with the CFG files via an API. I'd recommend waiting until after this change is complete.

One important thing that McSema needs to know is xrefs, and how they relate to specific instruction operands. For example, if you had: mov [addr1], addr2, both addr1 and addr2 might be xrefs, and we need to know that addr1 is a memory operand, and addr2 is an immediate operand.
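
As an illustration only (this is not McSema's actual CFG schema), the operand-level information described above could be recorded roughly like this in Python. The names `OperandKind`, `OperandXref`, and `xrefs_for_mov_store` are hypothetical, and the addresses in the usage note are made up:

```python
from dataclasses import dataclass
from enum import Enum


class OperandKind(Enum):
    MEMORY = "memory"        # address is dereferenced, e.g. [addr1]
    IMMEDIATE = "immediate"  # address is used as a plain value, e.g. addr2


@dataclass(frozen=True)
class OperandXref:
    inst_ea: int        # address of the instruction holding the reference
    operand_index: int  # which operand of that instruction carries it
    target_ea: int      # address being referenced
    kind: OperandKind   # how the operand uses the address


def xrefs_for_mov_store(inst_ea, addr1, addr2):
    """Return the two xrefs a disassembler would report for `mov [addr1], addr2`."""
    return [
        OperandXref(inst_ea, 0, addr1, OperandKind.MEMORY),
        OperandXref(inst_ea, 1, addr2, OperandKind.IMMEDIATE),
    ]
```

For example, `xrefs_for_mov_store(0x401000, 0x601040, 0x400800)` yields one memory xref for operand 0 and one immediate xref for operand 1. Any Ghidra-based frontend would need to recover this same per-operand distinction.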

Sep 19 '19 17:09 pgoodman

> You can run mcsema-disass on Windows, and run mcsema-lift in WSL or Linux if you can't get it to build on Windows. It doesn't matter where the CFG comes from.

Is there an option to export the CFG from IDA on Windows and just pass that output somehow to mcsema_disass on Linux? I would like to avoid having to set up mcsema on Windows, if possible :)

Dec 27 '19 11:12 vanhauser-thc

The CFG files can be produced on any OS, and used on any OS. That is, a CFG file produced by the IDA scripts on Windows will work just fine with a Linux build of mcsema-lift.

Jan 27 '20 21:01 pgoodman

Did anything ever come of this? Is there anything I can do to help it along?

Nov 11 '20 02:11 the-wondersmith

Hi @the-wondersmith, here are our current short-to-medium term goals and how you might fit into them:

Bring up better cross-reference and control-flow devirtualization (thunks, jump tables) support in Anvill.
Once Anvill is able to represent the level of complexity needed by McSema, re-implement McSema's Function.cpp (huge file, watch out) with Anvill. The goal here is that McSema lifts a whole program, Anvill lifts the functions, and Remill lifts the instructions.
Our focus in Anvill is to bring up Binary Ninja support, as IDA Pro support is relatively mature. We'll need to wire more information into the IDA Pro scripts so that IDA Pro can produce Anvill function specifications at the requisite level of detail.

We'd be very interested in you helping bring up Ghidra support in Anvill. We think this is a good path toward us gaining Ghidra support in McSema. Let me know your thoughts. You can find me on the Empire Hacking Slack #binary-lifting channel; my username is 'pag'.

Nov 11 '20 06:11 pgoodman

@pgoodman To be honest, I'd actually much prefer to use Binary Ninja, as I'm primarily a Python programmer. I know that BN support was effectively stripped out of McSema a few PRs ago; I didn't realize that putting it back in was on the agenda.

I suspect I'll be much more useful in that capacity than in working with Ghidra, as my Java is... beyond rusty.

Nov 11 '20 20:11 the-wondersmith

In that case, the other big thing we're trying to do is to get rid of our dependency on protocol buffers and migrate to storing the data we collect into SQLite. We have a branch that's been making some progress on that front but we haven't had the time in a while to push it forward. That would be a place where we'd value help. The logical places where contributions could be made:

Bringing the branch up-to-date with master
Figuring out, from the Python side, what writing to an SQLite db will "look like". Ideally, we'll want to hide the usage of SQLite behind an API of some kind, just in case down the line we want to change the storage layer again.
Figuring out how the lifting process needs to change in response to a more tabular layout of data.
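
To make the second point concrete, here is a minimal sketch of what hiding SQLite behind an API might look like from the Python side. It is not taken from the branch: the `CFGStore` class, its methods, and the two-table layout are all assumptions made for illustration. Only the standard-library `sqlite3` module is used.

```python
import sqlite3


class CFGStore:
    """Hypothetical storage facade: callers never see SQL or sqlite3."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        # Assumed tabular layout: one row per function, one row per block.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS function ("
            "ea INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS block ("
            "ea INTEGER PRIMARY KEY, function_ea INTEGER NOT NULL)"
        )

    def add_function(self, ea, name):
        self._db.execute(
            "INSERT OR REPLACE INTO function (ea, name) VALUES (?, ?)",
            (ea, name),
        )

    def add_block(self, function_ea, block_ea):
        self._db.execute(
            "INSERT OR REPLACE INTO block (ea, function_ea) VALUES (?, ?)",
            (block_ea, function_ea),
        )

    def blocks_of(self, function_ea):
        rows = self._db.execute(
            "SELECT ea FROM block WHERE function_ea = ? ORDER BY ea",
            (function_ea,),
        )
        return [row[0] for row in rows]

    def close(self):
        self._db.commit()
        self._db.close()
```

A disassembler script would only call `add_function`, `add_block`, and similar methods, so swapping the storage layer later would mean rewriting this one class rather than every frontend.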

Nov 11 '20 20:11 pgoodman

@pgoodman Now that I can help with. 😁

Nov 11 '20 20:11 the-wondersmith

Hi @pgoodman! I love the idea behind McSema. Unfortunately, I don't have the luxury of owning IDA Pro. That said, I have some time, and I'd love to help with the push towards an API-based approach. Regarding your plan to switch to SQLite, I noticed that the branch has been inactive for about two years. Do you still plan on moving in this direction? If so, I might set aside some time to figure this out.

BTW - sorry for necro-ing this old issue :)

Jul 07 '22 11:07 cernec1999

So I think the next push in McSema will be a complete overhaul. That overhaul will start sometime around when some upcoming refactors in Anvill land. If you're interested in discussing this then ping me "pag" on the #binary-lifting channel of the Empire Hacking slack :-) Ghidra support would be of high value.

Jul 07 '22 13:07 pgoodman
