pyelftools

Get memory address of C++ class members

Open dkuschmierz opened this issue 5 years ago • 7 comments

Hi, I'm currently playing around with pyelftools and trying to get some addresses out of my ELF file. Can you give an example of how to parse out the addresses of class members?

I created a little demo application:

class TestClass {
public:
    int a = 5;
    int b = 11;

    int add() { return a+b; }
};

TestClass tester;

int main() {
    tester.add();
    return 0;
}

When I compile the program and read out the symbols with the ElfReader, I get the following results:

Symbol table '.symtab' contains 146 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00008000     0 SECTION LOCAL  DEFAULT    1
     2: 00008018     0 SECTION LOCAL  DEFAULT    2
     3: 00008670     0 SECTION LOCAL  DEFAULT    3
     4: 00008688     0 SECTION LOCAL  DEFAULT    4
     5: 0000868c     0 SECTION LOCAL  DEFAULT    5
     6: 00008694     0 SECTION LOCAL  DEFAULT    6
     7: 00018698     0 SECTION LOCAL  DEFAULT    7
     8: 000186a0     0 SECTION LOCAL  DEFAULT    8
     9: 000186a8     0 SECTION LOCAL  DEFAULT    9
    10: 00018ae4     0 SECTION LOCAL  DEFAULT   10
    11: 00000000     0 SECTION LOCAL  DEFAULT   11
    12: 00000000     0 SECTION LOCAL  DEFAULT   12
    13: 00000000     0 SECTION LOCAL  DEFAULT   13
    14: 00000000     0 FILE    LOCAL  DEFAULT  ABS c:/.conan/7dcb71/1/bin/../lib/gcc/arm-none-eabi/8.3.1/crti.o
    15: 00008000     0 NOTYPE  LOCAL  DEFAULT    1 $a
    16: 00008670     0 NOTYPE  LOCAL  DEFAULT    3 $a
    17: 00000000     0 FILE    LOCAL  DEFAULT  ABS c:/.conan/7dcb71/1/bin/../lib/gcc/arm-none-eabi/8.3.1/crtn.o
    18: 0000800c     0 NOTYPE  LOCAL  DEFAULT    1 $a
    19: 0000867c     0 NOTYPE  LOCAL  DEFAULT    3 $a
    20: 00000000     0 FILE    LOCAL  DEFAULT  ABS exit.c
    21: 00008018     0 NOTYPE  LOCAL  DEFAULT    2 $a
    22: 00008048     0 NOTYPE  LOCAL  DEFAULT    2 $d
    23: 00000000     0 FILE    LOCAL  DEFAULT  ABS __call_atexit.c
    24: 0000804c     0 NOTYPE  LOCAL  DEFAULT    2 $a
    25: 0000804c    40 FUNC    LOCAL  DEFAULT    2 register_fini
    26: 0000806c     0 NOTYPE  LOCAL  DEFAULT    2 $d
    27: 00018698     0 NOTYPE  LOCAL  DEFAULT    7 $d
    28: 000083f4     0 NOTYPE  LOCAL  DEFAULT    2 $a
    29: 00008514     0 NOTYPE  LOCAL  DEFAULT    2 $d
    30: 00018ae0     0 NOTYPE  LOCAL  DEFAULT    9 $d
    31: 00000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    32: 00008694     0 OBJECT  LOCAL  DEFAULT    6
    33: 000186a8     0 NOTYPE  LOCAL  DEFAULT    9 $d
    34: 00008074     0 NOTYPE  LOCAL  DEFAULT    2 $a
    35: 00008074     0 FUNC    LOCAL  DEFAULT    2 __do_global_dtors_aux
    36: 000080a8     0 NOTYPE  LOCAL  DEFAULT    2 $d
    37: 00018ae4     1 NOTYPE  LOCAL  DEFAULT   10 completed.8885
    38: 000186a0     0 NOTYPE  LOCAL  DEFAULT    8 $d
    39: 000186a0     0 OBJECT  LOCAL  DEFAULT    8 __do_global_dtors_aux_fini_array_entry
    40: 000080b4     0 NOTYPE  LOCAL  DEFAULT    2 $a
    41: 000080b4     0 FUNC    LOCAL  DEFAULT    2 frame_dummy
    42: 000080d8     0 NOTYPE  LOCAL  DEFAULT    2 $d
    43: 00018ae8    24 NOTYPE  LOCAL  DEFAULT   10 object.8890
    44: 0001869c     0 NOTYPE  LOCAL  DEFAULT    7 $d
    45: 0001869c     0 OBJECT  LOCAL  DEFAULT    7 __frame_dummy_init_array_entry
    46: 00018ae4     0 NOTYPE  LOCAL  DEFAULT   10 $d
    47: 00000000     0 FILE    LOCAL  DEFAULT  ABS c:/.conan/7dcb71/1/bin/../lib/gcc/arm-none-eabi/8.3.1/../../../../arm-none-eabi/lib/crt0.o
    48: 000080e4     0 NOTYPE  LOCAL  DEFAULT    2 $a
    49: 000081d8     0 NOTYPE  LOCAL  DEFAULT    2 $d
    50: 0000868c     0 NOTYPE  LOCAL  DEFAULT    5 $d
    51: 00000000     0 FILE    LOCAL  DEFAULT  ABS TestClass.cpp
    52: 00008220     0 NOTYPE  LOCAL  DEFAULT    2 $a
    53: 00008694     0 NOTYPE  LOCAL  DEFAULT    5 $d
    54: 000186ac     0 NOTYPE  LOCAL  DEFAULT    9 $d
    55: 000081f8     0 NOTYPE  LOCAL  DEFAULT    2 $a
    56: 0000821c     0 NOTYPE  LOCAL  DEFAULT    2 $d
    57: 00008694     0 NOTYPE  LOCAL  DEFAULT    5 $d
    58: 00000000     0 FILE    LOCAL  DEFAULT  ABS impure.c
    59: 000186b4     0 NOTYPE  LOCAL  DEFAULT    9 $d
    60: 000186b8  1064 OBJECT  LOCAL  DEFAULT    9 impure_data
    61: 000186b8     0 NOTYPE  LOCAL  DEFAULT    9 $d
    62: 00008688     0 NOTYPE  LOCAL  DEFAULT    4 $d
    63: 00000000     0 FILE    LOCAL  DEFAULT  ABS init.c
    64: 00008254     0 NOTYPE  LOCAL  DEFAULT    2 $a
    65: 000082cc     0 NOTYPE  LOCAL  DEFAULT    2 $d
    66: 00000000     0 FILE    LOCAL  DEFAULT  ABS memset.c
    67: 000082dc     0 NOTYPE  LOCAL  DEFAULT    2 $a
    68: 00000000     0 FILE    LOCAL  DEFAULT  ABS atexit.c
    69: 0000851c     0 NOTYPE  LOCAL  DEFAULT    2 $a
    70: 00000000     0 FILE    LOCAL  DEFAULT  ABS fini.c
    71: 0000853c     0 NOTYPE  LOCAL  DEFAULT    2 $a
    72: 0000857c     0 NOTYPE  LOCAL  DEFAULT    2 $d
    73: 00000000     0 FILE    LOCAL  DEFAULT  ABS lock.c
    74: 00008584     0 NOTYPE  LOCAL  DEFAULT    2 $a
    75: 00008588     0 NOTYPE  LOCAL  DEFAULT    2 $a
    76: 0000858c     0 NOTYPE  LOCAL  DEFAULT    2 $a
    77: 00008590     0 NOTYPE  LOCAL  DEFAULT    2 $a
    78: 00008594     0 NOTYPE  LOCAL  DEFAULT    2 $a
    79: 00008598     0 NOTYPE  LOCAL  DEFAULT    2 $a
    80: 0000859c     0 NOTYPE  LOCAL  DEFAULT    2 $a
    81: 000085a4     0 NOTYPE  LOCAL  DEFAULT    2 $a
    82: 000085ac     0 NOTYPE  LOCAL  DEFAULT    2 $a
    83: 000085b0     0 NOTYPE  LOCAL  DEFAULT    2 $a
    84: 00000000     0 FILE    LOCAL  DEFAULT  ABS __atexit.c
    85: 000085b4     0 NOTYPE  LOCAL  DEFAULT    2 $a
    86: 00008664     0 NOTYPE  LOCAL  DEFAULT    2 $d
    87: 00000000     0 FILE    LOCAL  DEFAULT  ABS _exit.c
    88: 0000866c     0 NOTYPE  LOCAL  DEFAULT    2 $a
    89: 00000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    90: 00008694     0 NOTYPE  LOCAL  DEFAULT    6 $d
    91: 00008694     0 OBJECT  LOCAL  DEFAULT    6 __FRAME_END__
    92: 00000000     0 FILE    LOCAL  DEFAULT  ABS
    93: 000186a4     0 NOTYPE  LOCAL  DEFAULT    8 __fini_array_end
    94: 000186a0     0 NOTYPE  LOCAL  DEFAULT    8 __fini_array_start
    95: 000186a0     0 NOTYPE  LOCAL  DEFAULT    7 __init_array_end
    96: 00018698     0 NOTYPE  LOCAL  DEFAULT    7 __preinit_array_end
    97: 00018698     0 NOTYPE  LOCAL  DEFAULT    7 __init_array_start
    98: 00018698     0 NOTYPE  LOCAL  DEFAULT    7 __preinit_array_start
    99: 00008220    52 FUNC    WEAK   DEFAULT    2 _ZN9TestClass3addEv
   100: 00018b00     1 OBJECT  GLOBAL DEFAULT   10 __lock___atexit_recursive_mutex
   101: 00018b04     1 OBJECT  GLOBAL DEFAULT   10 __lock___arc4random_mutex
   102: 00018ae0     4 OBJECT  GLOBAL DEFAULT    9 __atexit_recursive_mutex
   103: 0000858c     4 FUNC    GLOBAL DEFAULT    2 __retarget_lock_close
   104: 00018b24     0 NOTYPE  GLOBAL DEFAULT   10 _bss_end__
   105: 00018ae4     0 NOTYPE  GLOBAL DEFAULT   10 __bss_start__
   106: 000186a8     0 OBJECT  GLOBAL HIDDEN     9 __dso_handle
   107: 000186ac     8 OBJECT  GLOBAL DEFAULT    9 tester
   108: 00018b08     1 OBJECT  GLOBAL DEFAULT   10 __lock___env_recursive_mutex
   109: 00018b0c     1 OBJECT  GLOBAL DEFAULT   10 __lock___sinit_recursive_mutex
   110: 00008688     4 OBJECT  GLOBAL DEFAULT    4 _global_impure_ptr
   111: 00008254   136 FUNC    GLOBAL DEFAULT    2 __libc_init_array
   112: 000080e4     0 NOTYPE  GLOBAL DEFAULT    2 _mainCRTStartup
   113: 00008000     0 FUNC    GLOBAL DEFAULT    1 _init
   114: 0000853c    72 FUNC    GLOBAL DEFAULT    2 __libc_fini_array
   115: 00018b10     1 OBJECT  GLOBAL DEFAULT   10 __lock___malloc_recursive_mutex
   116: 000085b0     4 FUNC    GLOBAL DEFAULT    2 __retarget_lock_release_recursive
   117: 000085a4     8 FUNC    GLOBAL DEFAULT    2 __retarget_lock_try_acquire_recursive
   118: 00018b24     0 NOTYPE  GLOBAL DEFAULT   10 __bss_end__
   119: 000083f4   296 FUNC    GLOBAL DEFAULT    2 __call_exitprocs
   120: 000080e4     0 NOTYPE  GLOBAL DEFAULT    2 _start
   121: 0000859c     8 FUNC    GLOBAL DEFAULT    2 __retarget_lock_try_acquire
   122: 000085b4   184 FUNC    GLOBAL DEFAULT    2 __register_exitproc
   123: 00008590     4 FUNC    GLOBAL DEFAULT    2 __retarget_lock_close_recursive
   124: 00008598     4 FUNC    GLOBAL DEFAULT    2 __retarget_lock_acquire_recursive
   125: 00018ae4     0 NOTYPE  GLOBAL DEFAULT   10 __bss_start
   126: 000082dc   280 FUNC    GLOBAL DEFAULT    2 memset
   127: 000081f8    40 FUNC    GLOBAL DEFAULT    2 main
   128: 00008588     4 FUNC    GLOBAL DEFAULT    2 __retarget_lock_init_recursive
   129: 00018b24     0 NOTYPE  GLOBAL DEFAULT   10 __end__
   130: 00008584     4 FUNC    GLOBAL DEFAULT    2 __retarget_lock_init
   131: 00008670     0 FUNC    GLOBAL DEFAULT    3 _fini
   132: 0000851c    32 FUNC    GLOBAL DEFAULT    2 atexit
   133: 000186b4     4 OBJECT  GLOBAL DEFAULT    9 _impure_ptr
   134: 00018ae4     0 NOTYPE  GLOBAL DEFAULT    9 _edata
   135: 00018b24     0 NOTYPE  GLOBAL DEFAULT   10 _end
   136: 00018b14     1 OBJECT  GLOBAL DEFAULT   10 __lock___at_quick_exit_mutex
   137: 00008018    52 FUNC    GLOBAL DEFAULT    2 exit
   138: 00008594     4 FUNC    GLOBAL DEFAULT    2 __retarget_lock_acquire
   139: 000085ac     4 FUNC    GLOBAL DEFAULT    2 __retarget_lock_release
   140: 0000866c     4 FUNC    GLOBAL DEFAULT    2 _exit
   141: 00018b18     1 OBJECT  GLOBAL DEFAULT   10 __lock___dd_hash_mutex
   142: 00018b1c     1 OBJECT  GLOBAL DEFAULT   10 __lock___tz_mutex
   143: 00080000     0 NOTYPE  GLOBAL DEFAULT   11 _stack
   144: 000186a8     0 NOTYPE  GLOBAL DEFAULT    9 __data_start
   145: 00018b20     1 OBJECT  GLOBAL DEFAULT   10 __lock___sfp_recursive_mutex

I can see the global tester object with its address (entry 107). But can I also get the addresses for the class members a and b?

dkuschmierz avatar Feb 06 '20 16:02 dkuschmierz

What you're looking for is in the debug information indeed. pyelftools can help you here. To get a general idea what the debug info is like, there's this GUI DWARF visualizer app that I'm building:

https://github.com/sevaa/dwex

EDIT: it's on PyPI now. Use pip install dwex to install, with sudo if necessary, then dwex to run.

Also, please format your text. 🤢

sevaa avatar Feb 06 '20 16:02 sevaa

Hi sevaa, not exactly what I was looking for, but great work. Unfortunately it crashes after a few minutes (ELF file size ~100 MB). What I'm looking for is something like this: iterate through all symbols in the symbol table and, if the type is a struct or a class, iterate through its members recursively. Is that possible? Where do I get the struct/class information from?

dkuschmierz avatar Feb 08 '20 07:02 dkuschmierz

Well, it's a beta :) Please create an issue at https://github.com/sevaa/dwex and attach your file, I'll take a look.

Iterating through all DIEs in a binary is straightforward enough with pyelftools. It would go like this:

from elftools.elf.elffile import ELFFile

# filename is the path to the ELF file
with open(filename, 'rb') as f:
    elffile = ELFFile(f)
    dwarfinfo = elffile.get_dwarf_info()
    for CU in dwarfinfo.iter_CUs(): # Iterate through source files
        for DIE in CU.iter_DIEs():
            # Types declared with the "class" keyword are tagged DW_TAG_class_type
            if DIE.tag in ('DW_TAG_structure_type', 'DW_TAG_class_type'): # Found a class
                for child in DIE.iter_children():
                    if child.tag == 'DW_TAG_member': # Found a data member
                        print(child.attributes) # Do whatever

As for the DWARF Explorer, it can give you an idea of how exactly classes are stored in the tree. For example, the type of a data member is not stored right there; instead, there's an attribute DW_AT_type that points at the type record elsewhere, which in turn can hold a reference to yet another record (e.g. if it's an array or a pointer).

sevaa avatar Feb 08 '20 14:02 sevaa

OK, I got that working, but where do I get the types from?

E.g. I get the following entry:

('DW_AT_type', AttributeValue(name='DW_AT_type', form='DW_FORM_ref_addr', value=144407, raw_value=144407, offset=467905))

But if I search for the value 144407, I do not find anything.

dkuschmierz avatar Feb 11 '20 12:02 dkuschmierz

Form ref_addr means that the attribute value is an offset of a DIE elsewhere in the file, probably in another CU.

First, you have to find the target CU by matching the value (144407 in your case) against the cu_offset field of the CUs in the file. The CUs in the iter_CUs() collection go in the order of increasing offsets. If the target DIE offset falls between the starting offset of a CU and the starting offset of the next CU, that's the CU that you want.

There's no random CU access, so you can't search by bisection, you'd have to scroll through. Something like this:

def find_cu_for_offset(dwarfinfo, target_offset):
    prev_cu = None
    for cu in dwarfinfo.iter_CUs():
        if cu.cu_offset > target_offset:
            # The target lies before this CU, so it belongs to the previous one
            return prev_cu
        prev_cu = cu
    # What if it's the last one?
    if prev_cu is not None and prev_cu.cu_offset < target_offset:
        return prev_cu
    return None # Or throw an exception here, means the target offset is tragically off.

Then you have to scroll through the DIEs in the CU that you just found and find the one with offset that exactly matches the value. The DIE offset is in the field called offset in the DIE object, as returned from CU.iter_DIEs(). That's your type record.
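That second scan might look like the sketch below. The helper name `find_die_at_offset` is hypothetical; it relies only on `CU.iter_DIEs()` and the `offset` field mentioned above, and on DIEs being visited in increasing offset order, which allows an early exit.

```python
def find_die_at_offset(cu, target_offset):
    """Linear scan of one CU for the DIE whose offset equals target_offset."""
    for die in cu.iter_DIEs():
        if die.offset == target_offset:
            return die
        if die.offset > target_offset:
            break  # already past the target, so the offset points mid-record
    return None
```

A `None` result means the offset does not land on the start of a DIE in that CU, which usually indicates that the wrong CU was picked or that the reference form was misread.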

Keep in mind that this logic only applies if the form of the DW_AT_type is DW_FORM_ref_addr. There are other reference formats in DWARF.
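As a rough illustration of the difference between the forms: DW_FORM_ref_addr holds an offset from the start of the debug info section, while DW_FORM_ref1, ref2, ref4, ref8 and ref_udata hold an offset from the start of the CU that contains the attribute. The sketch below normalizes both kinds to a section-wide offset; `absolute_die_offset` is a made-up helper name, and the remaining reference forms (such as type signatures) are deliberately left unsupported.

```python
# Reference forms whose value is relative to the start of the containing CU
CU_RELATIVE_FORMS = {
    'DW_FORM_ref1', 'DW_FORM_ref2', 'DW_FORM_ref4',
    'DW_FORM_ref8', 'DW_FORM_ref_udata',
}


def absolute_die_offset(attr, cu):
    """Turn a reference attribute into an offset from the start of the debug info."""
    if attr.form == 'DW_FORM_ref_addr':
        return attr.value  # already section-relative
    if attr.form in CU_RELATIVE_FORMS:
        return cu.cu_offset + attr.value  # cu is the CU holding the attribute
    raise ValueError('unsupported reference form: %s' % attr.form)
```

For the CU-relative forms the target is always in the same CU, so the CU search can be skipped and only the DIE scan is needed.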

There's a pending PR for DIE reference chasing: https://github.com/eliben/pyelftools/pull/264 , that will move this kind of operation to the library level. Feel free to upvote that PR :)

On a side note, I've updated DWARF Explorer over the weekend - can you install the latest (pip install --upgrade dwex) and see if it still crashes?

sevaa avatar Feb 11 '20 13:02 sevaa

> There's no random CU access, so you can't search by bisection, you'd have to scroll through.

> Then you have to scroll through the DIEs in the CU that you just found and find the one with offset that exactly matches the value. The DIE offset is in the field called offset in the DIE object, as returned from CU.iter_DIEs(). That's your type record.

Both of these will hopefully be added soon, they are in #264 along with get_DIE_from_attribute. Be aware that you may also need to follow other attributes to get the DIE with DW_AT_type as described in #306 .
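With get_DIE_from_attribute in place, the recursive member walk asked about earlier in the thread becomes fairly short. The sketch below assumes a pyelftools version that provides `get_DIE_from_attribute` as a method on DIE objects, and it only handles members whose DW_AT_data_member_location is a plain integer. `underlying_type` and `walk_members` are hypothetical names, not library functions.

```python
AGGREGATE_TAGS = {'DW_TAG_structure_type', 'DW_TAG_class_type'}
MODIFIER_TAGS = {'DW_TAG_typedef', 'DW_TAG_const_type', 'DW_TAG_volatile_type'}


def underlying_type(die):
    """Follow DW_AT_type from die, skipping typedef/const/volatile wrappers."""
    while 'DW_AT_type' in die.attributes:
        die = die.get_DIE_from_attribute('DW_AT_type')
        if die.tag not in MODIFIER_TAGS:
            return die
    return None  # the chain ended without a concrete type (e.g. const void)


def walk_members(type_die, base_address, prefix=''):
    """Yield (dotted_name, address) per data member, recursing into nested classes."""
    for child in type_die.iter_children():
        if child.tag != 'DW_TAG_member':
            continue
        loc = child.attributes.get('DW_AT_data_member_location')
        name = child.attributes.get('DW_AT_name')
        if loc is None or name is None or not isinstance(loc.value, int):
            continue  # static, anonymous, or expression-encoded members are skipped
        path = prefix + name.value.decode('utf-8')
        address = base_address + loc.value
        yield path, address
        member_type = underlying_type(child)
        if member_type is not None and member_type.tag in AGGREGATE_TAGS:
            yield from walk_members(member_type, address, path + '.')
```

A call would start from the class DIE of a variable and the address of its symbol, for example `walk_members(class_die, symbol['st_value'])`. Arrays, unions, base classes and bit fields would need extra cases.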

mdmillerii avatar Apr 17 '20 11:04 mdmillerii

lldb can be called from Python and can evaluate expressions to determine the address of class members.

This class def in a .cpp file, compiled for an ARM M4, added in a main.cpp...

namespace TestNameSpace
{
	class TestClass {
		public:
		int a = 5;
		int b = 11;

		int add() {
			return a+b;
		}
	};

	TestClass tester;
};

...can be evaluated in Python...

import lldb

dbg = lldb.SBDebugger().Create()
elf_file = "some.elf"
target = dbg.CreateTarget(elf_file)
val = target.EvaluateExpression("&TestNameSpace::tester.b")
print("'{0}', type={1}, name={2}, value={3}, result={4}".format(val, val.type, val.name, val.value, val.GetError()))

...and the result is...

'(int *) $0 = 0x20000004', type=int *, name=$0, value=0x20000004, result=success

The tester instance is located near 2 g'zillion, 4 bytes from the beginning of RAM in this simple M4 example. &TestNameSpace::tester.a is at 2 g'zillion even. I added the namespace to your example just to make the expression more C++-ish. lldb knows the type and the address of this expression just by looking at the .elf file. This example is running in Ubuntu on an x86-64, and the .elf is cross-compiled for a 32-bit ARM.

lldb is part of llvm and isn't as simple as doing a 'pip install lldb' but it is intended to be called as a library from C++ or Python. I don't know why lldb isn't more widely used.

ref: https://stackoverflow.com/a/66791351/101252

jimfred avatar Mar 27 '21 00:03 jimfred