binrec-tob icon indicating copy to clipboard operation
binrec-tob copied to clipboard

Inconsistent Trace Info on Some Samples with Concrete Inputs

Open ameily opened this issue 3 years ago • 15 comments

With PR trailofbits/binrec-prerelease#169, I'm seeing that several binrec samples can lift to a recovered binary that has inconsistent and incorrect behavior sometimes. These samples will lift but will fail the verification step in the integration tests. I've seen it happen on multiple samples:

  • breakFallback
  • eq
  • consecutive_calls

For eq, I'm seeing that the broken recovered binary has differences in traceInfo.json:

comparing s2e/projects/eq-working/s2e-out/traceInfo.json (-) against
          s2e/projects/eq-broken/s2e-out/traceInfo.json (+)

found 5 differences:
- functionLog.entryToReturn ('0x8049196', '0x80491f3')
- functionLog.entryToTbs ('0x8049070', ('0x8049030', '0x8049050', '0x8049070'))
- functionLog.entryToTbs ('0x8049196', ('0x8049196', '0x80491b2', '0x80491d4', '0x80491e1', '0x80491e6', '0x80491f3', '0x80491f6'))
+ functionLog.entryToTbs ('0x8049070', ('0x8049030', '0x8049050', '0x8049070', '0x8049196', '0x80491b2', '0x80491e6', '0x80491f3', '0x8049210', '0x804921a', '0x8049231', '0x8049244', '0x8049263', '0x804926d'))
+ functionLog.entryToTbs ('0x8049196', ('0x8049196', '0x80491b2', '0x80491d4', '0x80491e1', '0x80491f6'))

Here are zip files of the s2e-out directory for both a broken and working recovered binary:

ameily avatar Apr 20 '22 15:04 ameily

I briefly started looking into this. Nothing jumped out immediately that would cause this difference some of the time and I confirmed that it's happening in the case where the two inputs are not equal. The trace infos for the inputs being equal were identical.

It looks as if the control flow is slightly wrong because there are two entires in functionLog.entryToTbs that are in the wrong list: 0x80491e6, 0x80491f3. These both should be in the 0x8049196 item but are instead in 0x8049070.

ameily avatar Apr 20 '22 20:04 ameily

I've added more logging to the FunctionLog plugin and it appears that some calls are not reaching the plugin when an incorrect trace is produced.

#### These 2 calls are caught in both traces ####
9 [State 0] [FunctionLog] ignoring call: callee:0x11768, caller:0x10155
9 [State 0] [FunctionLog] ignoring call: callee:0x18d0, caller:0x1176c
#### The following are missing from 1 trace ####
9 [State 0] [FunctionLog] ignoring call: callee:0x6c650, caller:0x2e750
9 [State 0] [FunctionLog] ignoring call: callee:0x121375, caller:0x6c654
9 [State 0] [FunctionLog] ignoring call: callee:0x6c350, caller:0x6c664
9 [State 0] [FunctionLog] ignoring call: callee:0x121381, caller:0x6c354
#### This function is caught in both traces ####
9 [State 0] [FunctionLog] ignoring call: callee:0x6abc0, caller:0x6c4ba

In the FunctionLog code, I don't see anywhere that would cause this behavior. The plugin uses the FunctionMonitor plugin and hooks the onCall signal.

At first glance, this looks to be a bug in S2E because I don't see how binrec could not be listening to these events consistently.

ameily avatar Apr 27 '22 00:04 ameily

Digging into this further, I initially thought calls were being missed prior to the module load signal being triggered by S2E. So I tried moving the hook on function call earlier and then check if the plugin is initialized when a call actually occurs. This does not appear to be the case and I'm not seeing calls being missed prior to the module being registered in S2E.

I found that the ExportElf plugin is correctly receiving a signal when control enters back into the module of interest. On a working trace, I see that a call happens immediately prior to ExportElf exporting the new block. On a broken trace, I'm seeing that there is no call prior to ExportElf exporting the block:

#### this call is missing from a broken trace ####
8 [State 0] [FunctionLog] detected call: callee:0x8049196, caller:0x18282
#### both traces export the BB ####
8 [State 0] [ExportELF] Export block 0x8049196.

Looking at S2E, this bug may be related to a fix in how TCG helpers are registered (commit). So, I'm going to try and run binrec against S2E master.

This inconsistency is happening pretty rarely and you won't see it on some integration tests because the calls will be missing in one trace but exist in another, so they are merged together properly. So, if we were running into this prior, we may not have experienced it because the we were running multiple traces and merging them together to exercise the recovered binary fully.

ameily avatar Apr 27 '22 13:04 ameily

Upgrading S2E to master did not fix the issue. I did a fresh clone, verified that the latest S2E was pulled, and still encountered the issue.

$ git -C ./s2e/source/s2e log -n1
commit d86d3b9a430153f056665de484779d5bb65da0fb (HEAD, s2e/master, m/master)
Author: Vitaly Chipounov <[email protected]>
Date:   Wed Apr 20 22:29:36 2022 +0200

    libs2e: use z3.a and gomp
    
    - .a ensures that it links against the static version of z3
    - gomp is needed by z3
    
    Signed-off-by: Vitaly Chipounov <[email protected]>

ameily avatar Apr 27 '22 16:04 ameily

I've verified that the bootstrap.sh and s2e-config.lua files are the same between a working and broken trace. I was thinking that the plugins were potentially being loaded in an incorrect order.

ameily avatar Apr 27 '22 17:04 ameily

I added logging to FunctionMonitor::onTranslateBlockEnd(), which hooks the function call signal. I am not seeing this function get hit when a call is missed. So, the problem could be further up in S2E, where the onTranslateBlockEnd signal is not being triggered properly.

ameily avatar Apr 27 '22 19:04 ameily

My thought was that the sample may be exiting early on bad traces, which would explain the missing calls. So, I removed the > /dev/null 2>&1 in the bootstrap.sh script so that stdout/stderr is captured in the trace's serial.txt file. I verified that in both working and broken traces the sample appears to execute fully.

ameily avatar Apr 28 '22 00:04 ameily

Looking at the control flow addresses that are missing in a broken trace:

  • functionLog.callerToFollowUp (0x18282, 0x8049160)
  • functionLog.callerToFollowUp (0x18212, 0x8049196)
  • functionLog.entryToCaller (0x8049210, 0x18212)
  • functionLog.entryToCaller (0x8049196, 0x18282)
  • functionLog.entryToReturn (0x8049210, 0x804926d)
  • functionLog.entryToReturn (0x8049196, 0x80491f3)
Address Code
0x18212 ?
0x18282 ?
0x8049160 __do_global_dtors_aux (entry)
0x8049196 main (entry)
0x80491f3 main (after call to puts)
0x8049210 __libc_csu_init (entry)
0x804926d __libc_csu_init (after a jne branch)

ameily avatar May 05 '22 14:05 ameily

I am seeing that differences between runs are not consistent. I am seeing the same behavior as trailofbits/binrec-tob#33 when the main function call is being missed, which leads me to believe that both issues are related or they are the same issue. If so, this problem has been occurring since the S2E upgrade and could be the cause of multiple inconsistent/random lifting issues we've seen.

ameily avatar May 06 '22 13:05 ameily

@michaelbrownuc - to summarize this issue so far:

  • I'm seeing inconsistent trace info for several samples. I've been primarily targeting eq, in the case where the two inputs are not equal, which triggers the inconsistent trace info reliably
  • It appears that the S2E FunctionMonitor plugin is not triggering properly when a run produces incomplete trace info, which in turn doesn't trigger the binrec FunctionLog plugin.
  • The basic blocks where the function calls are performed are still exported, but the control flow info to them is not in the trace info. So, the module execute signal is being sent by S2E properly, but not the FunctionMonitor::onTranslateBlockEnd() signal.
  • From what I can tell, it appears that the problem is not in our FunctionLog plugin.
  • I verified that the samples are being executed fully within the analysis VM and the code is being recorded similarly between a correct and incorrect run.
  • I think this is related to other intermittent lifting issues we've seen (can't find main, bad successor info, etc) which means this has been a problem since we updated S2E, and maybe even before that.
  • The timeline of signals that trigger a function call are:
    1. OSMonitor::onMonitorLoad - fires once during setup
    2. FunctionMonitor::onMonitorLoad - registers a handler for the onTranslateBlockEnd signal from the core S2E plugin
    3. FunctionMonitor::onTranslateBlockEnd - checks if the current instruction is a call and, if it is, registers onFunctionCall handler:
    4. FunctionMonitor::onFunctionCall
      • Verifies that the call is occurring in the process being monitored
      • Normalizes addresses
      • Triggers a onCall signal
    5. FunctionLog::onFunctionCall - verifies that the call is calling from the process being monitored or calling out of the process being monitored (ignored intra-library calls) and registers a handler for the function return.
  • As far as I can tell, this all works as expected and follows S2E docs correctly. The problem appears to be that the FunctionMonitor::onTranslateBlockEnd handler is not called sometimes when incorrect trace info is produced.

This isn't an emergency and I have other tasking I can work on. However, I think I'd like a quick review from you to make sure I'm not missing anything before I do a deep dive on S2E to see if the problem is there.

ameily avatar May 06 '22 13:05 ameily

This could be a side-effect of running multiple samples back-to-back. I'm going to be testing:

  • Does this occur with a single manual run?
  • Does adding a sleep between runs / commands fix the issue?

ameily avatar May 09 '22 18:05 ameily

Does this occur with a single manual run?

Yes. I just confirmed that this occurs when running a single trace.

Does adding a sleep between runs / commands fix the issue?

This doesn't apply since it's occurring on a single run.

ameily avatar May 10 '22 12:05 ameily

Based on the findings so far, the plan is to switch assertion failures to exceptions and re-run the trace on exception, trailofbits/binrec-prerelease#171

ameily avatar May 10 '22 17:05 ameily

Not working on this issue for the time being as the issue appears to be internal to S2E.

michaelbrownuc avatar Jun 02 '22 20:06 michaelbrownuc

I'm also seeing this issue fairly consistently on the server sample.

ameily avatar Jul 07 '22 16:07 ameily