Handle switch in CreateEdgeCond

Open meme opened this issue 5 years ago • 5 comments

In https://github.com/lifting-bits/rellic/blob/master/rellic/AST/GenerateAST.cpp#L135, SwitchInst terminators are not supported.

Compile the following with remill-clang-4.0 -emit-llvm -O3 -c -o example.bc and decompile:

#include <stdint.h>

uint32_t target(uint32_t n) {
  uint32_t mod = n % 4;
  uint32_t result = 0;

  if (mod == 0) {
    result = (n | 0xbaaad0bf) * (2 ^ n);
  } else if (mod == 1) {
    result = (n & 0xbaaad0bf) * (3 + n);
  } else if (mod == 2) {
    result = (n ^ 0xbaaad0bf) * (4 | n);
  } else {
    result = (n + 0xbaaad0bf) * (5 & n);
  }

  return result;
}

You will see output similar to the following (I added the print of the instruction):

F1110 21:21:03.700402 59636 GenerateAST.cpp:159] Unknown terminator instruction: switch
*** Check failure stack trace: ***
    @          0x1b4733d  google::LogMessage::Fail()
    @          0x1b49834  google::LogMessage::SendToLog()
    @          0x1b46dbb  google::LogMessage::Flush()
    @          0x1b4a459  google::LogMessageFatal::~LogMessageFatal()
    @           0x7c039d  rellic::GenerateAST::CreateEdgeCond()
SIGABRT (Abort)

Would be willing to work on this, with some guidance.
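
To make the request concrete, here is a minimal sketch of the edge conditions a switch terminator implies. It is not rellic's implementation and does not use the LLVM or Clang APIs: SwitchModel, EdgeCondText and EdgeTaken are hypothetical names invented for illustration. It also assumes each case constant has its own successor; in real IR several cases can share a target block, in which case that edge's condition would be the disjunction of their comparisons.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a switch terminator (not an LLVM type).
// Successor i (i < cases.size()) is taken when the operand equals cases[i];
// successor cases.size() is the default destination.
struct SwitchModel {
  std::string operand;          // name of the switched-on value
  std::vector<uint64_t> cases;  // one constant per non-default successor
};

// Builds a textual condition for the edge to successor `succ`.
std::string EdgeCondText(const SwitchModel &sw, std::size_t succ) {
  assert(succ <= sw.cases.size());
  if (succ < sw.cases.size()) {
    // Case edge: a single equality comparison.
    return sw.operand + " == " + std::to_string(sw.cases[succ]);
  }
  if (sw.cases.empty()) {
    // A switch with only a default destination always takes it.
    return "true";
  }
  // Default edge: the negation of every case comparison.
  std::string any;
  for (std::size_t i = 0; i < sw.cases.size(); ++i) {
    if (i != 0) any += " || ";
    any += sw.operand + " == " + std::to_string(sw.cases[i]);
  }
  return "!(" + any + ")";
}

// Evaluates the same condition for a concrete operand value.
bool EdgeTaken(const SwitchModel &sw, std::size_t succ, uint64_t value) {
  assert(succ <= sw.cases.size());
  if (succ < sw.cases.size()) return value == sw.cases[succ];
  for (uint64_t c : sw.cases) {
    if (value == c) return false;
  }
  return true;
}
```

For the example above, a model with operand "mod" and cases {0, 1, 2} would give "mod == 1" for successor 1 and a negated disjunction of all three comparisons for the default edge (successor 3).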

meme avatar Nov 11 '19 02:11 meme

I am questioning the relevance of this: is this tool intended only to decompile lifted bytecode (as produced by anvill), or is it also considered important to handle switch statements as generated by the clang compiler? (This assumes that anvill and remill do not themselves lift optimized switch statements, but that the LLVM optimizer could still optimize code into a switch. Is this correct?)

meme avatar Nov 11 '19 02:11 meme

I think it is important to be able to handle as much of LLVM IR as possible. That means lifted bytecode (anvill, remill, mcsema) as well as bytecode compiled by clang. The readability of the decompiled output will of course vary wildly.

As far as switch statements in anvill or remill lifted bytecode go, I honestly don't know whether there are cases where they'll appear.

Of course, any and all help is appreciated :)

surovic avatar Nov 11 '19 09:11 surovic

Switch statements in LLVM IR will definitely show up.

pgoodman avatar Dec 14 '20 06:12 pgoodman

Once we add jump table support to anvill, we will need to support switch instructions. It's also possible that they will be synthesized without our knowledge by LLVM's optimizations. McSema already produces switch instructions.

pgoodman avatar Feb 24 '21 02:02 pgoodman

This has been partially solved by #106. Keeping this open, though.

surovic avatar Mar 03 '21 14:03 surovic