pyelftools Assertion error on address_size

Hi,

I've written a minimal Python version of addr2line based on one of your examples (see below), because of some issues I encountered with the one packaged in binutils. When I run it, I hit the following assertion:

  File "/workdir/tools/unittestframework/scripts/addr2line.py", line 29, in addr2line
    full_mapping = decode_file_line(dwarfinfo)
  File "/workdir/tools/unittestframework/scripts/addr2line.py", line 46, in decode_file_line
    for CU in dwarfinfo.iter_CUs():
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 355, in _parse_CUs_iter
    cu = self._cached_CU_at_offset(offset)
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 385, in _cached_CU_at_offset
    cu = self._parse_CU_at_offset(offset)
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 423, in _parse_CU_at_offset
    dwarf_version=cu_header['version'])
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/structs.py", line 92, in __init__
    assert address_size == 8 or address_size == 4, str(address_size)
AssertionError: 2

Sadly I'm not at liberty to share my .elf file here (which I realize makes this more difficult). We're using a port of gcc (based on 7.5.0) for a 16-bit processor, so it could very well be an issue on the compiler side, but I have no idea how to continue debugging this.

Could you give me some pointers as to what the issue might be?

'''
Python version of addr2line using pyelftools

Based on elftools example: dwarf_decode_address.py
'''
from elftools.elf.elffile import ELFFile

def addr2line(filename, address_list):
    ''' Function for extracting a dict of asm-to-c mappings, given a list of addresses

    Arguments:
        filename (str): Filename of the elf file to extract data from
        address_list (list[int]): List of addresses to extract c lines for

    Returns:
        dict{int:str}: Dictionary with the addresses as keys and the corresponding C lines and line numbers as values
    '''
    print('Processing file:', filename)
    with open(filename, 'rb') as f:
        elffile = ELFFile(f)

        if not elffile.has_dwarf_info():
            print('  file has no DWARF info')
            # Return an empty mapping so callers always get a dict
            return {}

        # get_dwarf_info returns a DWARFInfo context object, which is the
        # starting point for all DWARF-based processing in pyelftools.
        dwarfinfo = elffile.get_dwarf_info()
        full_mapping = decode_file_line(dwarfinfo)

        filtered_mapping = {k:v for (k,v) in full_mapping.items() if k in address_list}
    return filtered_mapping

def decode_file_line(dwarfinfo):
    ''' Function for extracting a full (unfiltered) list of addresses with corresponding C lines and line numbers

    Arguments:
        dwarfinfo (DWARFInfo): DWARFInfo object representing the debugging information from an ELF file.

    Returns:
        dict{int:str}: Dictionary with the addresses as keys and the corresponding C lines and line numbers as values
    '''
    c_mapping = dict()
    # Go over all the line programs in the DWARF information and record
    # the file/line for every address they describe.
    for CU in dwarfinfo.iter_CUs():
        # Each CU has its own line program with its own file and directory tables
        lineprog = dwarfinfo.line_program_for_CU(CU)
        lp_header = lineprog.header
        dir_entries = lp_header["include_directory"]

        # # File and directory indices are 1-indexed.
        # file_entry = file_entries[file_index - 1]
        prevstate = None
        for entry in lineprog.get_entries():
            # We're interested in those entries where a new state is assigned
            if entry.state is None:
                continue
            # Looking for a range of addresses in two consecutive states that
            # contain the required address.
            if prevstate and prevstate.address != entry.state.address:
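                file_entry = lineprog['file_entry'][prevstate.file - 1]
                filename = file_entry.name.decode("utf-8")
                # Before DWARF 5, dir_index 0 means the CU's compilation
                # directory, which is not listed in include_directory.
                if file_entry.dir_index == 0:
                    filedir = "."
                else:
                    filedir = dir_entries[file_entry.dir_index - 1].decode("utf-8")
                line = prevstate.line
                location = "{filedir}/{filename}:{line}".format(filedir=filedir, filename=filename, line=line)
                # Every address in [prevstate.address, entry.state.address) maps to the same line
                for addr in range(prevstate.address, entry.state.address):
                    c_mapping[addr] = location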
            if entry.state.end_sequence:
                # For the state with `end_sequence`, `address` means the address
                # of the first byte after the target machine instruction
                # sequence and other information is meaningless. We clear
                # prevstate so that it's not used in the next iteration. Address
                # info is used in the above comparison to see if we need to use
                # the line information for the prevstate.
                prevstate = None
            else:
                prevstate = entry.state
    return c_mapping

Feb 17 '22 13:02 bavovanachte

@sevaa

I believe this error arises because ELF, at least as specified, supports only 32-bit or 64-bit addresses. Is your compiler emitting other address sizes for the 16-bit CPU? What does readelf -h say?
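
For reference, the fields that readelf -h reports can also be read straight from the first bytes of the file. The sketch below is only an illustration, not part of pyelftools: the function name describe_elf_header is made up, and it reads nothing beyond the ELF class, byte order, and e_machine. The ELF specification defines only the ELF32 and ELF64 classes, so a 16-bit target will normally still report ELF32 here; the 2 in the assertion is a separate field stored in the DWARF data.

```python
import struct

# e_ident values defined by the ELF specification
EI_CLASS = {1: "ELF32", 2: "ELF64"}
EI_DATA = {1: "little-endian", 2: "big-endian"}


def describe_elf_header(data):
    """Return (class, byte order, e_machine) from the first 20 bytes of an ELF file.

    Hypothetical helper for illustration only.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = EI_CLASS.get(data[4], "unknown")
    byte_order = EI_DATA.get(data[5], "unknown")
    fmt = ">H" if data[5] == 2 else "<H"
    # e_ident is 16 bytes, e_type follows at offset 16, e_machine at offset 18
    (e_machine,) = struct.unpack_from(fmt, data, 18)
    return elf_class, byte_order, e_machine


# Usage, with a hypothetical path:
# with open("firmware.elf", "rb") as f:
#     print(describe_elf_header(f.read(20)))
```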

Apr 01 '24 12:04 eliben

Pyelftools very explicitly does not support 16-bit addresses (or addresses with a segment/selector part). There is an assertion right there:

assert address_size == 8 or address_size == 4, str(address_size)

Support can be added, but as always, we'd really like a test binary.
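
To confirm what the compiler actually wrote, the address_size byte can be read from the compilation unit header in .debug_info without going through the DWARF parser. This is a minimal sketch under a few assumptions: the name read_cu_address_size is invented for this example, only the 32-bit DWARF format is handled, and the header layouts are the ones from the DWARF 2 to 4 and DWARF 5 specifications.

```python
import struct


def read_cu_address_size(debug_info, offset=0, little_endian=True):
    """Return (version, address_size) of the CU header at `offset`.

    Hypothetical helper; handles the 32-bit DWARF format only.
    """
    end = "<" if little_endian else ">"
    unit_length, version = struct.unpack_from(end + "IH", debug_info, offset)
    if unit_length == 0xFFFFFFFF:
        raise NotImplementedError("64-bit DWARF format is not handled in this sketch")
    if version >= 5:
        # DWARF 5: unit_length, version, unit_type, address_size, debug_abbrev_offset
        (address_size,) = struct.unpack_from("B", debug_info, offset + 7)
    else:
        # DWARF 2-4: unit_length, version, debug_abbrev_offset (4 bytes), address_size
        (address_size,) = struct.unpack_from("B", debug_info, offset + 10)
    return version, address_size


# Usage with pyelftools, which can parse the ELF sections without touching DWARF
# (assumes the section is not compressed):
# data = elffile.get_section_by_name('.debug_info').data()
# print(read_cu_address_size(data, little_endian=elffile.little_endian))
```

If this also reports an address size of 2, the compiler is emitting 16-bit DWARF addresses deliberately, and the assertion is a pyelftools limitation rather than a corrupted file.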

See if eu-addr2line from elfutils (https://sourceware.org/elfutils/) does better than the binutils one. The maintainers of binutils are not pulling their weight.

Apr 01 '24 14:04 sevaa

@bavovanachte Is this still an issue?

Apr 18 '24 15:04 sevaa
