Issue loading binary with exceptions

Open vsoch opened this issue 2 years ago • 10 comments

Heyo! So I was curious to see how angr/cle parses exceptions so I have this library:

#include <iostream>

class DinosaurException { 

public:
    int i;
    DinosaurException() {}
    ~DinosaurException() {}
};

void throw_the_exception(bool throwit) {
    if (throwit) {
        throw DinosaurException();
    }
}

void hello_dinosaur() {
    std::cout << "Hello Dinosaur!" << std::endl;
}

void log(unsigned int count) {
    std::cout << count << std::endl;
}

void catch_the_exception() {
    log(0);
    try {
        log(1);
        hello_dinosaur();
        throw_the_exception(true);
        log(2);
    } catch (const DinosaurException& e) {        
        log(3);
    }

    // more catch statements here
    log(4);
}

int main() {
    catch_the_exception();
    return 0;
}

And then I build it:

g++ -g -c exception.cpp

And then try to load it:

import os
import sys

import cle

path = sys.argv[1]
if not os.path.exists(path):
    sys.exit('%s does not exist' % path)
ld = cle.Loader(path, load_debug_info=True, auto_load_libs=False)

And I get the following error:

Traceback (most recent call last):
  File "dev.py", line 11, in <module>
    ld = cle.Loader(path, load_debug_info=True, auto_load_libs=False)
  File "/home/vanessa/Desktop/Code/cle/cle/loader.py", line 133, in __init__
    self.initial_load_objects = self._internal_load(main_binary, *preload_libs, *force_load_libs, preloading=(main_binary, *preload_libs))
  File "/home/vanessa/Desktop/Code/cle/cle/loader.py", line 673, in _internal_load
    obj = self._load_object_isolated(main_spec)
  File "/home/vanessa/Desktop/Code/cle/cle/loader.py", line 855, in _load_object_isolated
    result = backend_cls(binary, binary_stream, is_main_bin=self.main_object is None, loader=self, **options)
  File "/home/vanessa/Desktop/Code/cle/cle/backends/elf/elf.py", line 178, in __init__
    self._load_exception_handling(dwarf)
  File "/home/vanessa/Desktop/Code/cle/cle/backends/elf/elf.py", line 540, in _load_exception_handling
    lsda_exception_table = lsda.parse_lsda(entry.lsda_pointer, file_offset)
  File "/home/vanessa/Desktop/Code/cle/cle/backends/elf/lsda.py", line 97, in parse_lsda
    header = self._parse_lsda_header()
  File "/home/vanessa/Desktop/Code/cle/cle/backends/elf/lsda.py", line 127, in _parse_lsda_header
    raise NotImplementedError("Unsupported modifier %#x." % modifier)
NotImplementedError: Unsupported modifier 0xf0.

I'm guessing there is something specific about my example program (e.g., the "unsupported modifier" I see above) and for the time being I'll try to inspect exceptions in another program! But I'm wondering what the specific issue is above and if I can fix it?

vsoch avatar Feb 25 '22 20:02 vsoch

Indeed I'm not sure what the flag is!

{'DW_EH_PE_absptr': 0,
 'DW_EH_PE_uleb128': 1,
 'DW_EH_PE_udata2': 2,
 'DW_EH_PE_udata4': 3,
 'DW_EH_PE_udata8': 4,
 'DW_EH_PE_signed': 8,
 'DW_EH_PE_sleb128': 9,
 'DW_EH_PE_sdata2': 10,
 'DW_EH_PE_sdata4': 11,
 'DW_EH_PE_sdata8': 12,
 'DW_EH_PE_pcrel': 16,
 'DW_EH_PE_textrel': 32,
 'DW_EH_PE_datarel': 48,
 'DW_EH_PE_funcrel': 64,
 'DW_EH_PE_aligned': 80,
 'DW_EH_PE_indirect': 128,
 'DW_EH_PE_omit': 255}

derived from:

DW_EH_encoding_flags = dict(
    DW_EH_PE_absptr=0x00,
    DW_EH_PE_uleb128=0x01,
    DW_EH_PE_udata2=0x02,
    DW_EH_PE_udata4=0x03,
    DW_EH_PE_udata8=0x04,
    DW_EH_PE_signed=0x08,
    DW_EH_PE_sleb128=0x09,
    DW_EH_PE_sdata2=0x0A,
    DW_EH_PE_sdata4=0x0B,
    DW_EH_PE_sdata8=0x0C,
    DW_EH_PE_pcrel=0x10,
    DW_EH_PE_textrel=0x20,
    DW_EH_PE_datarel=0x30,
    DW_EH_PE_funcrel=0x40,
    DW_EH_PE_aligned=0x50,
    DW_EH_PE_indirect=0x80,
    DW_EH_PE_omit=0xFF,
)
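
For reference, a minimal sketch of how a DW_EH_PE byte is conventionally split, using the flag values from the table above. The helper name describe_encoding and the mask layout (low nibble for the value format, 0x70 for how the value is applied, 0x80 for indirect, 0xFF for omit) are assumptions for illustration and are not taken from cle's lsda.py. Under that layout 0xf0 is not a valid combination, though it is what remains of DW_EH_PE_omit (0xff) after masking with 0xf0, which may or may not be what the parser is doing.

```python
# Flag values copied from the table in this thread. The bit layout is the
# conventional DW_EH_PE one and is an assumption about how to read them.
FORMATS = {
    0x00: "absptr", 0x01: "uleb128", 0x02: "udata2", 0x03: "udata4",
    0x04: "udata8", 0x08: "signed", 0x09: "sleb128", 0x0A: "sdata2",
    0x0B: "sdata4", 0x0C: "sdata8",
}
APPLICATIONS = {
    0x00: "absptr", 0x10: "pcrel", 0x20: "textrel", 0x30: "datarel",
    0x40: "funcrel", 0x50: "aligned",
}


def describe_encoding(byte):
    """Hypothetical helper: describe a DW_EH_PE pointer-encoding byte."""
    if byte == 0xFF:
        # DW_EH_PE_omit: no value is present at all.
        return {"omit": True}
    fmt = byte & 0x0F  # how the value is stored
    app = byte & 0x70  # what the value is relative to
    return {
        "omit": False,
        "format": FORMATS.get(fmt, "unknown(%#x)" % fmt),
        "application": APPLICATIONS.get(app, "unknown(%#x)" % app),
        "indirect": bool(byte & 0x80),
    }


if __name__ == "__main__":
    for value in (0x1B, 0x9B, 0xF0, 0xFF):
        print("%#04x -> %s" % (value, describe_encoding(value)))
```

Running it on 0xF0 reports an unknown application (0x70) with the indirect bit set, which matches the "unsupported" complaint without saying why the parser ended up with that value.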

This is a simpler example that also triggers the error:

#include <iostream>
using namespace std;

double division(int a, int b) {
   if( b == 0 ) {
      throw "Division by zero condition!";
   }
   return (a/b);
}

int main () {
   int x = 50;
   int y = 0;
   double z = 0;
 
   try {
      z = division(x, y);
      cout << z << endl;
   } catch (const char* msg) {
     cerr << msg << endl;
   }

   return 0;
}

I have g++ 9.3.0 (the Ubuntu default), which could be a little old. LMK if I should try a different compiler!

vsoch avatar Feb 25 '22 20:02 vsoch

This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.

github-actions[bot] avatar May 18 '22 02:05 github-actions[bot]

Are y'all still interested in this? I have a branch with a bunch of work on it that I could share if that's helpful.

vsoch avatar May 18 '22 02:05 vsoch

Yes! I'm not the person who knows my way around the CLE exception parsing, that would be @ltfish, but this is definitely a bug we want fixed. Please dump whatever you've got and I'll add the help wanted label if fish doesn't get around to it :)

rhelmot avatar May 18 '22 02:05 rhelmot

Awesome! And I can probably come back and work on it with a little guidance. Here is:

  • the branch: https://github.com/vsoch/cle/tree/add/dwarf-corpus-march
  • diff: https://github.com/angr/cle/compare/master...vsoch:add/dwarf-corpus-march

I've actually done more DWARF parsing on another project, in case I need to come back here (e.g., for more DIE types!). As for how I was running it: I created some test programs and then ran python dev.py <program>

#!/usr/bin/env python3

import os
import sys
# TODO we will want to look at lib name if they auto load and not add to corpus
# OR we will want to generate separate corpora
import cle
path = sys.argv[1]
if not os.path.exists(path):
    sys.exit('%s does not exist' % path)
ld = cle.Loader(path, load_debug_info=True, auto_load_libs=True)
print(ld.corpus.to_json())

vsoch avatar May 18 '22 02:05 vsoch

Hey! So I'm coming back to work on this. I'm going to try to implement the option to parse from location lists AND based on the type specification in the x86 ABI (of course only if the arch matches).

Question - what tests would you want to see if I PR these changes? Right now I'm generating corpus json output and just checking it closely (and those could become the test set and a new example added if/when a case is needed).

vsoch avatar Jun 04 '22 04:06 vsoch

We generally like tests that are as end-to-end as possible around here, so if you have some notion of a set of "in the wild" inputs and a way to validate the outputs of processing those inputs, that would be totally acceptable.

rhelmot avatar Jun 04 '22 05:06 rhelmot

Hey, I wanted to check in again! For all the subfolders here that don't start with an underscore and have a facts.json, this is the output format we are working toward. I originally didn't have the types lookup, but parsing and adding the same types over and over took up too much memory, so I added the lookup to avoid that.

https://github.com/vsoch/cle/tree/add/x86-parser-june/examples
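
To illustrate the idea of a types lookup, here is a rough sketch of storing each distinct type record once and referring to it by an id. The class name TypeTable and the record fields are made up for the example and are not the actual facts.json schema from the branch.

```python
import hashlib
import json


class TypeTable:
    """Hypothetical sketch: keep one copy of each distinct type record."""

    def __init__(self):
        self.types = {}

    def add(self, record):
        # Serialize with sorted keys so equal records hash identically
        # regardless of the order their fields were inserted in.
        blob = json.dumps(record, sort_keys=True).encode("utf-8")
        type_id = hashlib.sha256(blob).hexdigest()[:16]
        # Only the first copy is stored; later duplicates reuse its id.
        self.types.setdefault(type_id, record)
        return type_id


if __name__ == "__main__":
    table = TypeTable()
    first = table.add({"type": "int", "size": 4})
    second = table.add({"size": 4, "type": "int"})
    print(first == second, len(table.types))
```

Function parameters would then carry only the id, and the full record would live once in the table.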

Does any of that look like something you'd eventually want for cle? I ask because if not, I can move it into its own library instead. But I'll keep working within cle if it's still of interest!

vsoch avatar Jun 13 '22 21:06 vsoch

This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.

github-actions[bot] avatar Aug 13 '22 02:08 github-actions[bot]

This is now mostly done at https://github.com/vsoch/cle (note I'm using "main" for the main branch instead of master). It might not be what is desired here, but I've tried to stay up to date with angr master so it's something we could consider, if there is interest. If not, feel free to close the issue!

vsoch avatar Aug 13 '22 02:08 vsoch

There is still interest, but again, the only person who actually knows his way around the exception parsing code is @ltfish. I would be willing to merge something based solely on testcases looking sound, but I can't seem to validate that on sight based on the branch you've linked. If you could submit a minimal PR (i.e. 1-3 testcases that are evaluated solely on the ability of CLE to parse exceptions), I would gladly review it.

rhelmot avatar Aug 22 '22 17:08 rhelmot

The current design is that the test cases are in a different repository - do you want them added to angr proper to allow that?

vsoch avatar Aug 22 '22 18:08 vsoch

Generally our design is that we put all our testcase binaries and data files in https://github.com/angr/binaries/ and put the py tests themselves in the tests directory of the appropriate repository.

rhelmot avatar Aug 22 '22 18:08 rhelmot

Gotcha - so just to clarify - you want the actual binaries (post-compile) added there directly (and not some build process that compiles them)? Can you show me your preference for where in that structure? And once I have them there, can you show me an example in angr where the repository is obtained and the tests run? I can try to mimic that structure!

vsoch avatar Aug 22 '22 18:08 vsoch

Yes - since we're hardcoding properties about the binaries themselves in our testcases (occasionally to the level of individual instruction addresses), we can't rely on compilers to not change this stuff out from under us!

The expectation is that the binaries repository is cloned into the same directory that cle or angr or whatever is. We have the angr-dev repository to set up this structure. The testcases manually reach outside their repository to get to these files. Here is a good example of a testcase using that model.
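
As a rough sketch of the sibling-checkout layout described above, a test could resolve files like this. The helper name find_test_binary and the directory layout (<workspace>/cle/tests/ next to <workspace>/binaries/) are assumptions for illustration, not paths copied from the angr repositories.

```python
import os


def find_test_binary(test_file, *parts):
    """Hypothetical helper: locate a file in a sibling 'binaries' checkout.

    Assumes <workspace>/<repo>/tests/<test_file> and <workspace>/binaries/...
    """
    tests_dir = os.path.dirname(os.path.realpath(test_file))
    # Go up from <repo>/tests to the workspace that holds both checkouts.
    workspace = os.path.dirname(os.path.dirname(tests_dir))
    path = os.path.join(workspace, "binaries", *parts)
    if not os.path.exists(path):
        raise FileNotFoundError(
            "%s not found; is 'binaries' cloned next to this repo?" % path
        )
    return path
```

A test module would call it with its own __file__ and the path components of the binary it needs, and fail early with a clear message when the sibling checkout is missing.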

rhelmot avatar Aug 22 '22 18:08 rhelmot

Okay, cool! I'm putting together the pieces. So I have:

  1. a branch of a fork of binaries with my test cases
  2. my branch of cle with a custom tests.py file that targets the files I have in binaries (assuming the same path/structure)

Remaining questions - how do I PR a branch I have for angr/cle to test in a way to use my fork of binaries?

vsoch avatar Aug 22 '22 19:08 vsoch

Looks like requirements come from here, so if I need deepdiff to compare json structures, it will need to be added here (although I can verify this when it first tries and fails): https://github.com/angr/ci-settings/blob/master/ci-image/conf/requirements.txt.template

vsoch avatar Aug 22 '22 19:08 vsoch

> how do I PR a branch I have for angr/cle to test in a way to use my fork of binaries?

Open a pull request for each repository and mention each pull request in the other's description. The CI will see this reference and pull from that branch while building.

> if I need deepdiff to compare json structures

This is actually an interesting point - I believe the right thing to do for now is to just add deepdiff as a hard dependency of CLE. In the future, we should have it be an optional dependency or a development dependency in CLE's setup.cfg and install all optional/dev dependencies during CI.
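
If the extra dependency turns out to be unwanted, a small standard-library comparison might be enough for test output. The sketch below is not deepdiff and does not reproduce its API; json_diff is a hypothetical helper that just lists where two parsed JSON values disagree.

```python
import json


def json_diff(expected, actual, path="$"):
    """Return human-readable differences between two parsed JSON values."""
    if type(expected) is not type(actual):
        return ["%s: type %s != %s"
                % (path, type(expected).__name__, type(actual).__name__)]
    if isinstance(expected, dict):
        diffs = []
        for key in sorted(set(expected) | set(actual)):
            sub = "%s.%s" % (path, key)
            if key not in actual:
                diffs.append("%s: missing in actual" % sub)
            elif key not in expected:
                diffs.append("%s: unexpected in actual" % sub)
            else:
                diffs.extend(json_diff(expected[key], actual[key], sub))
        return diffs
    if isinstance(expected, list):
        diffs = []
        if len(expected) != len(actual):
            diffs.append("%s: length %d != %d"
                         % (path, len(expected), len(actual)))
        # Compare element by element up to the shorter length.
        for i, (e, a) in enumerate(zip(expected, actual)):
            diffs.extend(json_diff(e, a, "%s[%d]" % (path, i)))
        return diffs
    return [] if expected == actual else ["%s: %r != %r" % (path, expected, actual)]


if __name__ == "__main__":
    want = json.loads('{"a": [1, 2], "b": {"c": "x"}}')
    got = json.loads('{"a": [1, 3], "b": {}}')
    for line in json_diff(want, got):
        print(line)
```

A test can then assert that the returned list is empty and print it on failure, which gives a readable report of what changed in the generated output.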

rhelmot avatar Aug 22 '22 19:08 rhelmot