cle
Issue loading binary with exceptions
Heyo! I was curious to see how angr/cle parses exceptions, so I wrote this small test program:
```cpp
#include <iostream>

class DinosaurException {
public:
    int i;
    DinosaurException() {}
    ~DinosaurException() {}
};

void throw_the_exception(bool throwit) {
    if (throwit) {
        throw DinosaurException();
    }
}

void hello_dinosaur() {
    std::cout << "Hello Dinosaur!" << std::endl;
}

void log(unsigned int count) {
    std::cout << count << std::endl;
}

void catch_the_exception() {
    log(0);
    try {
        log(1);
        hello_dinosaur();
        throw_the_exception(true);
        log(2);
    } catch (const DinosaurException& e) {
        log(3);
    }
    // more catch statements here
    log(4);
}

int main() {
    catch_the_exception();
    return 0;
}
```
And then I build it:
```shell
g++ -g -c exception.cpp
```
And then try to load it:
```python
import os
import sys

import cle

path = sys.argv[1]
if not os.path.exists(path):
    sys.exit('%s does not exist' % path)

ld = cle.Loader(path, load_debug_info=True, auto_load_libs=False)
```
And I get the following error:
```
Traceback (most recent call last):
  File "dev.py", line 11, in <module>
    ld = cle.Loader(path, load_debug_info=True, auto_load_libs=False)
  File "/home/vanessa/Desktop/Code/cle/cle/loader.py", line 133, in __init__
    self.initial_load_objects = self._internal_load(main_binary, *preload_libs, *force_load_libs, preloading=(main_binary, *preload_libs))
  File "/home/vanessa/Desktop/Code/cle/cle/loader.py", line 673, in _internal_load
    obj = self._load_object_isolated(main_spec)
  File "/home/vanessa/Desktop/Code/cle/cle/loader.py", line 855, in _load_object_isolated
    result = backend_cls(binary, binary_stream, is_main_bin=self.main_object is None, loader=self, **options)
  File "/home/vanessa/Desktop/Code/cle/cle/backends/elf/elf.py", line 178, in __init__
    self._load_exception_handling(dwarf)
  File "/home/vanessa/Desktop/Code/cle/cle/backends/elf/elf.py", line 540, in _load_exception_handling
    lsda_exception_table = lsda.parse_lsda(entry.lsda_pointer, file_offset)
  File "/home/vanessa/Desktop/Code/cle/cle/backends/elf/lsda.py", line 97, in parse_lsda
    header = self._parse_lsda_header()
  File "/home/vanessa/Desktop/Code/cle/cle/backends/elf/lsda.py", line 127, in _parse_lsda_header
    raise NotImplementedError("Unsupported modifier %#x." % modifier)
NotImplementedError: Unsupported modifier 0xf0.
```
I'm guessing there is something specific about my example program that triggers this (e.g., the "unsupported modifier" above), and for the time being I'll try to inspect exceptions in another program! But I'm wondering what the specific issue is, and whether I can fix it?
Indeed, I'm not sure what the flag is! 0xf0 doesn't appear among the DW_EH_PE encoding flags:
```python
{'DW_EH_PE_absptr': 0,
 'DW_EH_PE_uleb128': 1,
 'DW_EH_PE_udata2': 2,
 'DW_EH_PE_udata4': 3,
 'DW_EH_PE_udata8': 4,
 'DW_EH_PE_signed': 8,
 'DW_EH_PE_sleb128': 9,
 'DW_EH_PE_sdata2': 10,
 'DW_EH_PE_sdata4': 11,
 'DW_EH_PE_sdata8': 12,
 'DW_EH_PE_pcrel': 16,
 'DW_EH_PE_textrel': 32,
 'DW_EH_PE_datarel': 48,
 'DW_EH_PE_funcrel': 64,
 'DW_EH_PE_aligned': 80,
 'DW_EH_PE_indirect': 128,
 'DW_EH_PE_omit': 255}
```
derived from:
```python
DW_EH_encoding_flags = dict(
    DW_EH_PE_absptr=0x00,
    DW_EH_PE_uleb128=0x01,
    DW_EH_PE_udata2=0x02,
    DW_EH_PE_udata4=0x03,
    DW_EH_PE_udata8=0x04,
    DW_EH_PE_signed=0x08,
    DW_EH_PE_sleb128=0x09,
    DW_EH_PE_sdata2=0x0A,
    DW_EH_PE_sdata4=0x0B,
    DW_EH_PE_sdata8=0x0C,
    DW_EH_PE_pcrel=0x10,
    DW_EH_PE_textrel=0x20,
    DW_EH_PE_datarel=0x30,
    DW_EH_PE_funcrel=0x40,
    DW_EH_PE_aligned=0x50,
    DW_EH_PE_indirect=0x80,
    DW_EH_PE_omit=0xFF,
)
```
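One plausible reading of the error (not verified against cle's lsda.py): a DW_EH_PE encoding byte packs the value format into the low nibble, the "relative to" modifier into bits 4-6 (mask 0x70), and DW_EH_PE_indirect into bit 7. If a parser masks the byte with 0xf0 to get the modifier, then DW_EH_PE_omit (0xff), which GCC normally emits for the LSDA's LPStart encoding, comes out as exactly 0xf0. The standalone sketch below decodes an encoding byte and checks for omit before masking; `decode_eh_pe`, `FORMATS`, and `APPLICATIONS` are made-up names for illustration, not cle's API.

```python
DW_EH_PE_OMIT = 0xFF
DW_EH_PE_INDIRECT = 0x80
FORMAT_MASK = 0x0F       # low nibble: how the value is stored
APPLICATION_MASK = 0x70  # bits 4-6: what the value is relative to

FORMATS = {
    0x00: "absptr", 0x01: "uleb128", 0x02: "udata2", 0x03: "udata4",
    0x04: "udata8", 0x08: "signed", 0x09: "sleb128", 0x0A: "sdata2",
    0x0B: "sdata4", 0x0C: "sdata8",
}
APPLICATIONS = {
    0x00: "absptr", 0x10: "pcrel", 0x20: "textrel",
    0x30: "datarel", 0x40: "funcrel", 0x50: "aligned",
}


def decode_eh_pe(encoding):
    """Split a DW_EH_PE encoding byte into (format, application, indirect).

    Returns None for DW_EH_PE_omit. The omit check must come before any
    masking, because 0xff & 0xf0 == 0xf0, which is not a valid modifier.
    """
    if encoding == DW_EH_PE_OMIT:
        return None
    fmt = encoding & FORMAT_MASK
    app = encoding & APPLICATION_MASK
    if fmt not in FORMATS or app not in APPLICATIONS:
        raise ValueError(f"unsupported DW_EH_PE encoding {encoding:#x}")
    return FORMATS[fmt], APPLICATIONS[app], bool(encoding & DW_EH_PE_INDIRECT)
```

For example, `decode_eh_pe(0x9b)` returns `('sdata4', 'pcrel', True)`, the indirect, PC-relative, signed 4-byte encoding GCC often uses for the LSDA type table in position-independent code, while `decode_eh_pe(0xff)` returns `None`.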
This is a simpler example that also triggers the error:
```cpp
#include <iostream>
using namespace std;

double division(int a, int b) {
    if (b == 0) {
        throw "Division by zero condition!";
    }
    return (a / b);
}

int main() {
    int x = 50;
    int y = 0;
    double z = 0;
    try {
        z = division(x, y);
        cout << z << endl;
    } catch (const char* msg) {
        cerr << msg << endl;
    }
    return 0;
}
```
I have g++ 9.3.0 (the Ubuntu default), which could be a little old? LMK if I should try a different compiler!
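Both programs fail inside `_parse_lsda_header`, so for reference, here is a minimal standalone sketch of the fixed part of the LSDA header as GCC lays it out (per the Itanium C++ ABI / LSB description): an LPStart encoding byte (usually DW_EH_PE_omit), a TType encoding byte, a ULEB128 TType base offset only if the TType encoding is not omit, a call-site encoding byte, and a ULEB128 call-site table length. It handles only an omitted LPStart, and `read_uleb128`/`parse_lsda_header` are hypothetical names, not cle's implementation.

```python
DW_EH_PE_OMIT = 0xFF


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value starting at data[pos]; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos


def parse_lsda_header(data):
    """Parse the fixed part of a GCC-style LSDA header.

    Returns (header_dict, offset_of_call_site_table). Only an omitted
    LPStart is supported; anything else raises NotImplementedError.
    """
    pos = 0
    lpstart_enc = data[pos]
    pos += 1
    if lpstart_enc != DW_EH_PE_OMIT:
        # A full parser would read an encoded LPStart pointer here.
        raise NotImplementedError(f"LPStart encoding {lpstart_enc:#x} not handled in this sketch")

    ttype_enc = data[pos]
    pos += 1
    ttype_offset = None
    if ttype_enc != DW_EH_PE_OMIT:
        # ULEB128 offset from the end of this field to the type table base.
        ttype_offset, pos = read_uleb128(data, pos)

    callsite_enc = data[pos]
    pos += 1
    callsite_table_len, pos = read_uleb128(data, pos)

    header = {
        "lpstart_enc": lpstart_enc,
        "ttype_enc": ttype_enc,
        "ttype_offset": ttype_offset,
        "callsite_enc": callsite_enc,
        "callsite_table_len": callsite_table_len,
    }
    return header, pos
```

On the made-up bytes `ff 9b 15 01 0c`, this yields a TType offset of 0x15 and a 12-byte call-site table (uleb128-encoded entries) starting at offset 5.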
This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.
Are y'all still interested in this? I have a branch with a bunch of work on it; I could just share it if that's helpful.
Yes! I'm not the one who knows their way around CLE's exception parsing (that would be @ltfish), but this is definitely a bug we want fixed. Please dump whatever you've got, and I'll add the help wanted label if fish doesn't get around to it :)
Awesome! I can probably come back and work on it with a little guidance. Here is what I have:
- the branch: https://github.com/vsoch/cle/tree/add/dwarf-corpus-march
- diff: https://github.com/angr/cle/compare/master...vsoch:add/dwarf-corpus-march
I've also done more DWARF parsing in another project, in case I need to come back here (e.g., for more DIE types!). Here's how I was running it: I created some test programs and then ran python dev.py <program>:
```python
#!/usr/bin/env python3
import os
import sys

# TODO we will want to look at lib name if they auto load and not add to corpus
# OR we will want to generate separate corpora
import cle

path = sys.argv[1]
if not os.path.exists(path):
    sys.exit('%s does not exist' % path)

ld = cle.Loader(path, load_debug_info=True, auto_load_libs=True)
# `corpus` is added by the branch linked above; upstream cle's Loader has no such attribute
print(ld.corpus.to_json())
```
Hey! So I'm coming back to work on this. I'm going to try to implement the option to parse from location lists AND based on the type specification in the x86 ABI (only if the arch matches, of course).
Question: what tests would you want to see if I PR these changes? Right now I'm generating corpus JSON output and checking it closely by hand; those outputs could become the test set, with a new example added if/when a new case is needed.
We generally like tests that are as end-to-end as possible around here, so if you have a set of "in the wild" inputs and a way to validate the outputs of processing those inputs, that would be totally acceptable.
Hey, wanted to check in again! In the examples folder linked below, every subfolder that doesn't start with an underscore and has a facts.json shows the output format we are aiming for. I originally didn't have the types lookup, but parsing and adding the same types over and over took up too much memory, so I added it to account for that.
https://github.com/vsoch/cle/tree/add/x86-parser-june/examples
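The general idea behind a types lookup like this is interning: each distinct type description is stored once in a table and referenced by an ID everywhere else. The sketch below is a hypothetical illustration of that idea, not the branch's actual implementation; `TypeTable`, `intern`, and the `t0`-style IDs are invented names, and it assumes type descriptions are JSON-serializable dicts.

```python
import json


class TypeTable:
    """Hypothetical interning table: each distinct type is stored once, referenced by ID."""

    def __init__(self):
        self._ids = {}    # canonical JSON string -> type ID
        self.types = {}   # type ID -> type description

    def intern(self, type_desc):
        """Return the ID for type_desc, adding it to the table only if it is new."""
        # sort_keys makes dicts with the same content map to the same key
        key = json.dumps(type_desc, sort_keys=True)
        if key not in self._ids:
            type_id = f"t{len(self._ids)}"
            self._ids[key] = type_id
            self.types[type_id] = type_desc
        return self._ids[key]


table = TypeTable()
param_type = table.intern({"class": "Integer", "size": 4})
same_type = table.intern({"size": 4, "class": "Integer"})
```

Here `param_type` and `same_type` are both `"t0"`, so two functions that take the same type share one table entry instead of each carrying a full copy.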
Does any of that look like something you'd eventually want for cle? I ask because if not, I can move it into its own library instead. But I'll keep working within cle if it's still of interest!
This is now (mostly) done at https://github.com/vsoch/cle (note that I use "main" as the default branch instead of "master"). It might not be what is desired here, but I've tried to keep it up to date with angr's master, so it's something we could consider if there is interest. If not, feel free to close the issue!
There is still interest, but again, the only person who actually knows his way around the exception parsing code is @ltfish. I would be willing to merge something based solely on the testcases looking sound, but I can't validate that at a glance from the branch you've linked. If you could submit a minimal PR (i.e., 1-3 testcases that are evaluated solely on CLE's ability to parse exceptions), I would gladly review it.
The current design is that the test cases live in a different repository. Do you want them added to angr proper to allow that?
Generally our design is that we put all our testcase binaries and data files in https://github.com/angr/binaries/ and put the py tests themselves in the tests directory of the appropriate repository.
Gotcha. Just to clarify: you want the actual compiled binaries added there directly (and not some build process that compiles them)? Can you show me where in that structure you'd prefer them? And once they're there, can you point me to an example in angr where that repository is obtained and the tests are run? I can try to mimic that structure!
Yes - since we're hardcoding properties about the binaries themselves in our testcases (occasionally to the level of individual instruction addresses), we can't rely on compilers to not change this stuff out from under us!
The expectation is that the binaries repository is cloned into the same parent directory as cle, angr, or whichever repository is being tested. We have the angr-dev repository to set up this structure. The testcases manually reach outside their own repository to get to these files. Here is a good example of a testcase using that model.
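The "reach outside the repository" pattern can be sketched roughly like this, assuming the angr-dev layout where cle/ and binaries/ are sibling checkouts and the test file lives in cle/tests/. `binaries_test_dir` is a hypothetical helper name; existing angr tests may simply inline the equivalent `os.path.join` expression at module level.

```python
import os


def binaries_test_dir(test_file):
    """Locate binaries/tests for a test file at <parent>/cle/tests/<name>.py.

    Assumes the angr-dev layout, where cle/ and binaries/ are sibling checkouts.
    """
    here = os.path.dirname(os.path.realpath(test_file))  # <parent>/cle/tests
    # Walk up twice (tests -> cle -> <parent>), then down into binaries/tests
    return os.path.normpath(os.path.join(here, "..", "..", "binaries", "tests"))


# In a real test module you would pass the module's own path:
# TEST_LOCATION = binaries_test_dir(__file__)
```

A test would then join arch and file names onto that directory, e.g. `os.path.join(TEST_LOCATION, "x86_64", ...)`, so the binaries stay out of the cle repository itself.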
Okay, cool! I'm putting together the pieces. So I have:
- a branch of a fork of binaries with my test cases
- my branch of cle with a custom tests.py file that targets the files I have in binaries (assuming the same path/structure)
Remaining question: how do I PR a branch I have for angr/cle to test in a way to use my fork of binaries?
It also looks like the CI requirements come from here, so if I need deepdiff to compare json structures, it will need to be added there (although I can confirm this when CI first tries and fails): https://github.com/angr/ci-settings/blob/master/ci-image/conf/requirements.txt.template
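If adding a dependency turns out to be a hassle, a small recursive comparison using only the standard library can give similarly readable failure messages for nested JSON. This is a hand-rolled stand-in, not deepdiff's API, and `json_diff` is a hypothetical name.

```python
def json_diff(expected, actual, path="root"):
    """Return a list of human-readable differences between two JSON-like values."""
    diffs = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        # Visit the union of keys so both missing and unexpected keys are reported
        for key in sorted(expected.keys() | actual.keys(), key=str):
            sub = f"{path}[{key!r}]"
            if key not in actual:
                diffs.append(f"{sub} missing from actual")
            elif key not in expected:
                diffs.append(f"{sub} unexpected in actual")
            else:
                diffs.extend(json_diff(expected[key], actual[key], sub))
    elif isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            diffs.append(f"{path} length {len(expected)} != {len(actual)}")
        # Compare the overlapping prefix element by element
        for i, (e, a) in enumerate(zip(expected, actual)):
            diffs.extend(json_diff(e, a, f"{path}[{i}]"))
    elif expected != actual:
        # Scalars, or mismatched container types
        diffs.append(f"{path}: {expected!r} != {actual!r}")
    return diffs
```

For instance, `json_diff({"a": 1, "b": [1, 2]}, {"a": 1, "b": [1, 3]})` returns `["root['b'][1]: 2 != 3"]`, and an empty list means the structures match, so a test can simply assert that the result is empty.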
> how do I PR a branch I have for angr/cle to test in a way to use my fork of binaries?
Open a pull request for each repository and mention each pull request in the other's description. The CI will see this reference and pull from that branch while building.
> if I need deepdiff to compare json structures
This is actually an interesting point. I believe the right thing to do for now is to just add deepdiff as a hard dependency of CLE. In the future, we should make it an optional or development dependency in CLE's setup.cfg and install all optional/dev dependencies during CI.
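Once deepdiff becomes an optional or dev dependency, one common way to keep the test suite runnable without it is to have the tests that need it skip themselves when it isn't installed. A sketch using only `unittest` and `importlib`; `TestCorpusOutput` and the placeholder dicts are hypothetical stand-ins for a real expected facts.json and the generated corpus output.

```python
import importlib.util
import unittest

# True only if deepdiff can be imported in this environment
HAS_DEEPDIFF = importlib.util.find_spec("deepdiff") is not None


class TestCorpusOutput(unittest.TestCase):
    @unittest.skipUnless(HAS_DEEPDIFF, "deepdiff is not installed")
    def test_matches_expected(self):
        from deepdiff import DeepDiff  # imported lazily so the module loads without it

        # Placeholders: a real test would load facts.json and run the loader
        expected = {"functions": [{"name": "main"}]}
        actual = {"functions": [{"name": "main"}]}
        # DeepDiff is empty (falsy) when the two structures match
        self.assertFalse(DeepDiff(expected, actual))


if __name__ == "__main__":
    unittest.main()
```

With deepdiff missing the test is reported as skipped rather than failing, and CI can still run it by installing the optional/dev extras.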