pyelftools
Assertion error on address_size
Hi,
I've written a minimal Python version of addr2line based on one of your examples (see below), because of some issues with the one packaged in binutils. When I run it, I hit the following assertion:
  File "/workdir/tools/unittestframework/scripts/addr2line.py", line 29, in addr2line
    full_mapping = decode_file_line(dwarfinfo)
  File "/workdir/tools/unittestframework/scripts/addr2line.py", line 46, in decode_file_line
    for CU in dwarfinfo.iter_CUs():
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 355, in _parse_CUs_iter
    cu = self._cached_CU_at_offset(offset)
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 385, in _cached_CU_at_offset
    cu = self._parse_CU_at_offset(offset)
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 423, in _parse_CU_at_offset
    dwarf_version=cu_header['version'])
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/structs.py", line 92, in __init__
    assert address_size == 8 or address_size == 4, str(address_size)
AssertionError: 2
Sadly, I'm not at liberty to share my .elf file here (which I realize makes this more difficult). We're using a port of gcc (based on 7.5.0) for a 16-bit processor, so this could very well be an issue on the compiler side, but I have no idea how to continue debugging it. Could you give me some pointers as to what the issue might be?
'''
Python version of addr2line using pyelftools
Based on elftools example: dwarf_decode_address.py
'''
from elftools.elf.elffile import ELFFile


def addr2line(filename, address_list):
    ''' Function for extracting a dict of asm-to-c mappings, given a list of addresses
    Arguments:
        filename (str): Filename of the elf file to extract data from
        address_list (list[int]): List of addresses to extract c lines for
    Returns:
        dict{int:str}: Dictionary with the addresses as keys and the corresponding C lines and line numbers as values
    '''
    print('Processing file:', filename)
    with open(filename, 'rb') as f:
        elffile = ELFFile(f)
        if not elffile.has_dwarf_info():
            print(' file has no DWARF info')
            return
        # get_dwarf_info returns a DWARFInfo context object, which is the
        # starting point for all DWARF-based processing in pyelftools.
        dwarfinfo = elffile.get_dwarf_info()
        full_mapping = decode_file_line(dwarfinfo)
        filtered_mapping = {k:v for (k,v) in full_mapping.items() if k in address_list}
        return filtered_mapping


def decode_file_line(dwarfinfo):
    ''' Function for extracting a full (unfiltered) list of addresses with corresponding C lines and line numbers
    Arguments:
        dwarfinfo (DWARFInfo): DWARFInfo object representing the debugging information from an ELF file.
    Returns:
        dict{int:str}: Dictionary with the addresses as keys and the corresponding C lines and line numbers as values
    '''
    c_mapping = dict()
    # Go over all the line programs in the DWARF information, looking for
    # one that describes the given address.
    for CU in dwarfinfo.iter_CUs():
        # First, look at line programs to find the file/line for the address
        lineprog = dwarfinfo.line_program_for_CU(CU)
        lp_header = lineprog.header
        dir_entries = lp_header["include_directory"]
        # # File and directory indices are 1-indexed.
        # file_entry = file_entries[file_index - 1]
        prevstate = None
        for entry in lineprog.get_entries():
            # We're interested in those entries where a new state is assigned
            if entry.state is None:
                continue
            # Looking for a range of addresses in two consecutive states that
            # contain the required address.
            if prevstate and prevstate.address != entry.state.address:
                filename = lineprog['file_entry'][prevstate.file - 1].name.decode("utf-8")
                filedir = dir_entries[lineprog['file_entry'][prevstate.file - 1].dir_index - 1].decode("utf-8")
                line = prevstate.line
                for addr in range(prevstate.address, entry.state.address, 1):
                    c_mapping[addr] = "{filedir}/{filename}:{line}".format(filedir=filedir, filename=filename, line=line)
            if entry.state.end_sequence:
                # For the state with `end_sequence`, `address` means the address
                # of the first byte after the target machine instruction
                # sequence and other information is meaningless. We clear
                # prevstate so that it's not used in the next iteration. Address
                # info is used in the above comparison to see if we need to use
                # the line information for the prevstate.
                prevstate = None
            else:
                prevstate = entry.state
    return c_mapping
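
As an aside on the script above: it stores one dictionary entry per byte address in every line-table range. A possible alternative, sketched below, keeps only the range boundaries and looks addresses up with `bisect`. The names `build_index` and `lookup` and the `(address, label, end_sequence)` row format are hypothetical stand-ins for the line program states, not part of pyelftools.

```python
import bisect


def build_index(rows):
    """Build sorted address ranges from line-table rows.

    rows: iterable of (address, label, end_sequence) tuples in line-table
    order. This tuple layout is an assumption made for this sketch.
    """
    ranges = []
    prev = None  # (address, label) of the previous non-terminating row
    for address, label, end_sequence in rows:
        # Same rule as the script: a range spans two consecutive states
        # with different addresses and takes the label of the first one.
        if prev is not None and prev[0] != address:
            ranges.append((prev[0], address, prev[1]))
        # An end_sequence row closes the current sequence.
        prev = None if end_sequence else (address, label)
    ranges.sort(key=lambda r: r[0])
    starts = [r[0] for r in ranges]
    ends = [r[1] for r in ranges]
    labels = [r[2] for r in ranges]
    return starts, ends, labels


def lookup(index, addr):
    """Return the label whose range [start, end) contains addr, else None."""
    starts, ends, labels = index
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and addr < ends[i]:
        return labels[i]
    return None


rows = [
    (0x100, 'a.c:1', False),
    (0x104, 'a.c:2', False),
    (0x10A, None, True),
    (0x200, 'b.c:5', False),
    (0x208, None, True),
]
index = build_index(rows)
```

With this layout, memory use grows with the number of line-table rows rather than with the size of the address space covered.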
@sevaa
I believe this error arises because ELF, at least as specified, supports only 32-bit or 64-bit addresses. Is your compiler emitting other address sizes for the 16-bit CPU? What does readelf -h say?
Pyelftools explicitly does not support 16-bit addresses (or addresses with a segment/selector part). There is an assertion for exactly that:
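
For reference, a few of the fields that readelf -h prints can also be read straight from the first bytes of the file. The sketch below decodes the ELF class, the byte order, and the raw `e_machine` number from the fixed part of the ELF header. The function name `describe_elf_ident` is made up for this example, and the header bytes at the end are synthetic, not taken from the reporter's file.

```python
import struct


def describe_elf_ident(header):
    """Decode class, byte order and e_machine from the first 20 header bytes."""
    if len(header) < 20 or header[:4] != b'\x7fELF':
        raise ValueError('not an ELF header')
    # e_ident[4] is EI_CLASS: 1 means ELF32, 2 means ELF64.
    elf_class = {1: 'ELF32', 2: 'ELF64'}.get(header[4], 'unknown')
    # e_ident[5] is EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = {1: 'little', 2: 'big'}.get(header[5], 'unknown')
    # e_machine is a 16-bit field at offset 18, after e_ident (16 bytes)
    # and e_type (2 bytes). Anything other than little-endian is read as
    # big-endian here, which is a simplification.
    fmt = '<H' if header[5] == 1 else '>H'
    (e_machine,) = struct.unpack_from(fmt, header, 18)
    return elf_class, byte_order, e_machine


# Synthetic ELF32 little-endian header with arbitrary e_type and e_machine.
sample = b'\x7fELF' + bytes([1, 1, 1]) + bytes(9) + struct.pack('<HH', 2, 105)
info = describe_elf_ident(sample)
```

Note that the ELF class only distinguishes 32-bit from 64-bit files, so a 16-bit target would still report ELF32 here while its DWARF data declares a smaller address size.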
assert address_size == 8 or address_size == 4, str(address_size)
Support can be added, but as always, we'd really like a test binary.
See if eu-addr2line from elfutils (https://sourceware.org/elfutils/) does better than the binutils one. The maintainers of binutils are not pulling their weight.
@bavovanachte Is this still an issue?