
Misc fixes

Open nickolas-pohilets opened this issue 6 years ago • 24 comments

This MR contains assorted bug fixes that bring fcd+remill to the point where it can successfully decompile a simple function using a debug build of LLVM 7.0 and the latest version of Remill.

LLVM versions before 7.0 have a bug where removing the dereferenceable attributes triggers an assertion in debug builds and reads memory out of bounds in release builds; it was fixed in 9bc0b1080f195636fed019bce979aa72892d6c69.

nickolas-pohilets avatar Jan 03 '19 21:01 nickolas-pohilets

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jan 03 '19 21:01 CLAassistant

First of all, thanks a lot @nickolas-pohilets for the PR! I added some comments; let me know if you have any questions or comments on my comments :)

Also I'd like to let you know that there's a lot of work done on the AST part of fcd in the dev-clang-ast branch of the repo. It's basically a complete rewrite of the AST backend so I personally would not spend too much time on the old AST backend in fcd. The new backend is not at parity with the old one yet though.

surovic avatar Jan 04 '19 14:01 surovic

Is there a plan to sync back with zneak/fcd? Should the project use the coding style of the original project or conform to that of trailofbits?

nickolas-pohilets avatar Jan 04 '19 23:01 nickolas-pohilets

> Should the project use the coding style of the original project or conform to that of trailofbits?

I personally use the trailofbits coding standards in all files with a trailofbits license header. In original fcd code I try to adhere to the original coding style. Generally as a rule of thumb -- make your code blend in with the surrounding code.

> Is there a plan to sync back with zneak/fcd?

@pgoodman what do you think? I'm inclined to say no, since the divergence is pretty significant and I think that overall priorities have shifted away from fcd as a tool. The functionality will be covered and expanded upon by other tools (McSema with the DynInst frontend) and a tool based on the new AST backend from fcd.

@nickolas-pohilets what are your use-cases for fcd? We could take them into consideration when developing the other tools.

surovic avatar Jan 05 '19 00:01 surovic

@nickolas-pohilets There is no plan to do that. The original scope of the project was to port fcd to use Remill because the lifting approaches shared many similarities. Since then, the scope and direction of the project have changed somewhat.

pgoodman avatar Jan 05 '19 00:01 pgoodman

@surovic Returning a container type is usually acceptable because of RVO.
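A minimal sketch of the point about RVO. The function name `CollectNames` is hypothetical and not taken from fcd; it only illustrates that a container built locally can be returned by value without a deep copy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from fcd): builds a container and returns it by value.
std::vector<std::string> CollectNames(int count) {
  std::vector<std::string> names;
  names.reserve(count);
  for (int i = 0; i < count; ++i) {
    names.push_back("block_" + std::to_string(i));
  }
  // Returning a local by name: the compiler may construct `names` directly in
  // the caller's storage (NRVO). If it does not, the vector is moved, so the
  // elements are still not copied one by one.
  return names;
}
```

A caller simply writes `auto names = CollectNames(3);` and owns the result, with no output parameter needed.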

pgoodman avatar Jan 05 '19 00:01 pgoodman

I want to make a decompiler for Objective-C based on fcd+remill. Compiled Objective-C binaries contain lots of meta-information. As a reference, class-dump can generate headers based on that. Decompiling Objective-C into relatively high-quality source code seems to be a pretty low-hanging fruit.

A few things off the top of my head that will be needed:

  • Wider interface of the executable to get ObjC metadata. Probably returning a section by name. Using llvm::object::ObjectFile might be a good idea.
  • Special signature recovery for calls of objc_msgSend.
  • AST able to represent Objective-C features. The one from Clang should do the job.
  • Lots of work in the backend to reconstruct the AST.
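As a rough illustration of the kind of metadata that could feed signature recovery for objc_msgSend calls, the sketch below decodes a small subset of Objective-C method type encodings (for example `v24@0:8@16`) into C-like type names. The function names are hypothetical and nothing here comes from fcd or Remill; struct, pointer, and qualifier encodings are deliberately not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Maps a single-character Objective-C type code to a C-like spelling.
// Only a small subset is covered; anything else is reported as "?".
static std::string DecodeTypeCode(char code) {
  switch (code) {
    case 'v': return "void";
    case '@': return "id";
    case ':': return "SEL";
    case '#': return "Class";
    case 'c': return "char";
    case 'i': return "int";
    case 'q': return "long long";
    case 'f': return "float";
    case 'd': return "double";
    case 'B': return "bool";
    case '*': return "char *";
    default:  return "?";
  }
}

// Splits a method type encoding into decoded types. Element 0 is the return
// type; the rest are the arguments, including the implicit receiver (id) and
// selector (SEL) that objc_msgSend takes first.
std::vector<std::string> DecodeMethodTypes(const std::string &encoding) {
  std::vector<std::string> types;
  for (std::size_t i = 0; i < encoding.size();) {
    types.push_back(DecodeTypeCode(encoding[i++]));
    // Skip the frame size or argument offset digits that follow each type.
    while (i < encoding.size() &&
           std::isdigit(static_cast<unsigned char>(encoding[i]))) {
      ++i;
    }
  }
  return types;
}
```

For `v24@0:8@16` this yields a `void` return type and the arguments `id`, `SEL`, `id`, which is the shape a decompiler would need to type a call through objc_msgSend.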

nickolas-pohilets avatar Jan 05 '19 21:01 nickolas-pohilets

What is the current status of LLVM 7.0 support in Remill and McSema? I've quickly checked the Travis scripts in Remill, McSema and cxx-common, and version 7.0 is not mentioned anywhere.

nickolas-pohilets avatar Jan 06 '19 15:01 nickolas-pohilets

@nickolas-pohilets I think it's a matter of building cxx-common for LLVM 7.0. The latest issues we've been having actually have more to do with RTTI, so that we can support DynInst as a frontend.

pgoodman avatar Jan 07 '19 17:01 pgoodman

So, what is the plan then to get these changes merged?

nickolas-pohilets avatar Jan 09 '19 10:01 nickolas-pohilets

Sorry, closed by mistake.

nickolas-pohilets avatar Jan 09 '19 10:01 nickolas-pohilets

I'm just about to check out the PR, see if it builds on my machine, and merge if it does. Stay tuned!

surovic avatar Jan 09 '19 10:01 surovic

Well, unfortunately there's a problem with building against LLVM-4.0, namely that a7d6c63 uses llvm::Value::deleteValue(), which does not exist prior to LLVM-5.0.

@nickolas-pohilets do you think you can fix this? @pgoodman are we going to keep full LLVM-3.5 to LLVM-7.0 compatibility?

Edit: Another incompatibility is in 8e04ec1, since llvm/Transforms/Utils.h does not exist prior to LLVM-7.0. Take a look at fcd/compat/Scalar.h for a hint at how we resolve header compatibility issues like these.
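A minimal sketch of the version-gated compat header idea, written so it compiles without LLVM installed. The `COMPAT_*` macro names and `UtilsHeaderForThisBuild` are hypothetical, not the ones fcd or Remill use, and the fallback to llvm/Transforms/Scalar.h is an assumption suggested by the name fcd/compat/Scalar.h. In a real compat header each branch would contain an `#include` rather than a string.

```cpp
#include <cassert>
#include <cstring>

// Assumption: in a real build these macros come from LLVM's own config
// header. They are defined here only so the sketch compiles standalone.
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 7
#define LLVM_VERSION_MINOR 0
#endif

// Hypothetical helper macros that fold major/minor into one comparable number.
#define COMPAT_LLVM_VERSION(major, minor) ((major) * 100 + (minor))
#define COMPAT_LLVM_VERSION_NUMBER \
  COMPAT_LLVM_VERSION(LLVM_VERSION_MAJOR, LLVM_VERSION_MINOR)

// Pick the header by version. A real compat header would #include the chosen
// file here; this sketch only records which one would be selected.
#if COMPAT_LLVM_VERSION_NUMBER >= COMPAT_LLVM_VERSION(7, 0)
#define COMPAT_UTILS_HEADER "llvm/Transforms/Utils.h"
#else
#define COMPAT_UTILS_HEADER "llvm/Transforms/Scalar.h"
#endif

// Exposes the selection so it can be inspected at run time.
inline const char *UtilsHeaderForThisBuild() { return COMPAT_UTILS_HEADER; }
```

Client code would then include only the compat header and never name the version-specific LLVM header directly.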

surovic avatar Jan 09 '19 11:01 surovic

I propose dropping support for LLVM versions before 7.0, if possible. As I mentioned in the PR description, there is a bug that was fixed only in LLVM 7.0.

nickolas-pohilets avatar Jan 09 '19 12:01 nickolas-pohilets

I'm going to put this on hold until we hear from @pgoodman

surovic avatar Jan 09 '19 12:01 surovic

I think I can rewrite the code to avoid the buggy function and make it work with older versions, but I’m not sure it is worth the effort. In contrast to Remill, FCD currently does not have any clients to keep compatibility with.

nickolas-pohilets avatar Jan 09 '19 12:01 nickolas-pohilets

To be honest, I think the fork is currently maintained only because some code might be useful in other projects, like McSema. FCD doing its own CFG recovery is more of a hindrance than a feature, since lots of other tools provide this functionality and there's only so much developer time available for FCD. IMO the real value of FCD, considering McSema covers LLVM IR generation, is in its C AST backend.

surovic avatar Jan 09 '19 12:01 surovic

Agreed. Shall we then go one step further and turn FCD+Remill into FCD+McSema+Some CFG Recovery?

nickolas-pohilets avatar Jan 09 '19 14:01 nickolas-pohilets

Ideally I want compat with older versions of LLVM. The way I tend to handle that is to add code into https://github.com/trailofbits/remill/tree/master/remill/BC/Compat

pgoodman avatar Jan 09 '19 15:01 pgoodman

      auto inst = expression->getAsInstruction();
      auto res = ctx.uncachedExpressionFor(*inst);
      inst->deleteValue();
      return res;

@nickolas-pohilets Can you deal with the leak by attaching inst to a basic block somewhere and then invoking inst->eraseFromParent()?

pgoodman avatar Jan 09 '19 15:01 pgoodman

The step has already been more or less made :slightly_smiling_face:. Rellic is the clang-based C AST backend that lived in FCD's dev-clang-ast branch. It was private for a while, but we figured there's no reason not to make it public at this point.

So the toolchain we are currently looking at is something like McSema+Rellic, where CFG recovery is done by one of McSema's frontends. The main McSema frontend is currently IDA Pro, but development is under way on Binary Ninja and DynInst frontends.

surovic avatar Jan 09 '19 15:01 surovic

Then I should probably abandon FCD altogether and start hacking on Rellic. Shall we organise a (video) call some time next week to discuss collaboration? Pls pm me.

nickolas-pohilets avatar Jan 09 '19 16:01 nickolas-pohilets

Sure. What is your username on EH slack?

pgoodman avatar Jan 09 '19 16:01 pgoodman

@Mykola Pokhylets

nickolas-pohilets avatar Jan 09 '19 16:01 nickolas-pohilets