pyelftools

Assertion error on address_size

Open bavovanachte opened this issue 3 years ago • 3 comments

Hi,

I've written a (minimal) Python version of addr2line based on one of your examples (see below), because of some issues with the one packaged in binutils. When I run it, I hit the following assertion:

  File "/workdir/tools/unittestframework/scripts/addr2line.py", line 29, in addr2line
    full_mapping = decode_file_line(dwarfinfo)
  File "/workdir/tools/unittestframework/scripts/addr2line.py", line 46, in decode_file_line
    for CU in dwarfinfo.iter_CUs():
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 355, in _parse_CUs_iter
    cu = self._cached_CU_at_offset(offset)
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 385, in _cached_CU_at_offset
    cu = self._parse_CU_at_offset(offset)
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/dwarfinfo.py", line 423, in _parse_CU_at_offset
    dwarf_version=cu_header['version'])
  File "/home/developer/.local/lib/python3.7/site-packages/elftools/dwarf/structs.py", line 92, in __init__
    assert address_size == 8 or address_size == 4, str(address_size)
AssertionError: 2

Sadly I'm not at liberty to share my .elf file here (which I realize makes this more difficult), but could you give me some pointers as to what the issue might be? We're using a port of GCC (based on 7.5.0) for a 16-bit processor, so it could very well be an issue on the compiler side, but I have no idea how to continue debugging this.

Could you give some pointers?

'''
Python version of addr2line using pyelftools

Based on elftools example: dwarf_decode_address.py
'''
from elftools.elf.elffile import ELFFile

def addr2line(filename, address_list):
    ''' Function for extracting a dict of asm-to-c mappings, given a list of addresses

    Arguments:
        filename (str): Filename of the elf file to extract data from
        address_list (list[int]): List of addresses to extract c lines for

    Returns:
        dict{int:str}: Dictionary with the addresses as keys and the corresponding C lines and line numbers as values
    '''
    print('Processing file:', filename)
    with open(filename, 'rb') as f:
        elffile = ELFFile(f)

        if not elffile.has_dwarf_info():
            print('  file has no DWARF info')
            return {}

        # get_dwarf_info returns a DWARFInfo context object, which is the
        # starting point for all DWARF-based processing in pyelftools.
        dwarfinfo = elffile.get_dwarf_info()
        full_mapping = decode_file_line(dwarfinfo)

        filtered_mapping = {k:v for (k,v) in full_mapping.items() if k in address_list}
    return filtered_mapping

def decode_file_line(dwarfinfo):
    ''' Function for extracting a full (unfiltered) list of addresses with corresponding C lines and line numbers

    Arguments:
        dwarfinfo (DWARFInfo): DWARFInfo object representing the debugging information from an ELF file.

    Returns:
        dict{int:str}: Dictionary with the addresses as keys and the corresponding C lines and line numbers as values
    '''
    c_mapping = dict()
    # Go over all the line programs in the DWARF information and record
    # the file and line for every address they cover.
    for CU in dwarfinfo.iter_CUs():
        # First, look at line programs to find the file/line for the address
        lineprog = dwarfinfo.line_program_for_CU(CU)
        lp_header = lineprog.header
        dir_entries = lp_header["include_directory"]

        # File and directory indices are 1-indexed (before DWARF 5).
        prevstate = None
        for entry in lineprog.get_entries():
            # We're interested in those entries where a new state is assigned
            if entry.state is None:
                continue
            # Every address between two consecutive states with different
            # addresses belongs to the earlier state's file and line.
            if prevstate and prevstate.address != entry.state.address:
                file_entry = lineprog['file_entry'][prevstate.file - 1]
                filename = file_entry.name.decode("utf-8")
                # Before DWARF 5, dir_index 0 means the CU's compilation
                # directory, which is not listed in include_directory.
                if file_entry.dir_index == 0:
                    filedir = "."
                else:
                    filedir = dir_entries[file_entry.dir_index - 1].decode("utf-8")
                line = prevstate.line
                for addr in range(prevstate.address, entry.state.address):
                    c_mapping[addr] = "{}/{}:{}".format(filedir, filename, line)
            if entry.state.end_sequence:
                # For the state with `end_sequence`, `address` means the address
                # of the first byte after the target machine instruction
                # sequence and other information is meaningless. We clear
                # prevstate so that it's not used in the next iteration. Address
                # info is used in the above comparison to see if we need to use
                # the line information for the prevstate.
                prevstate = None
            else:
                prevstate = entry.state
    return c_mapping
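
The script above stores one dictionary entry per byte address, which can get large. As a possible variation (not part of the original post), the same line-table rows could be kept as half-open ranges and searched with `bisect`. This is only a sketch: `build_ranges` and `lookup` are hypothetical helper names, the rows are assumed to be `(address, location, end_sequence)` tuples in line-table order, and overlapping ranges are not handled.

```python
import bisect

def build_ranges(rows):
    """Turn (address, location, end_sequence) rows into sorted ranges.

    Each range is (start, end, location) and covers start <= addr < end.
    """
    ranges = []
    prev = None
    for address, location, end_sequence in rows:
        # A range is closed by the next row that has a higher address.
        if prev is not None and prev[0] < address:
            ranges.append((prev[0], address, prev[1]))
        # An end_sequence row only terminates a range; it never starts one.
        prev = None if end_sequence else (address, location)
    ranges.sort(key=lambda r: r[0])
    starts = [r[0] for r in ranges]
    return ranges, starts

def lookup(ranges, starts, address):
    """Return the location covering `address`, or None if there is none."""
    i = bisect.bisect_right(starts, address) - 1
    if i >= 0 and ranges[i][0] <= address < ranges[i][1]:
        return ranges[i][2]
    return None
```

With this layout, memory grows with the number of line-table rows, not with the size of the address space.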

bavovanachte, Feb 17 '22 13:02

@sevaa

I believe this error arises because ELF, at least as specified, only supports 32-bit or 64-bit addresses. Is your compiler emitting other address sizes for the 16-bit CPU? What does `readelf -h` say?
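
One way to see which address size the compiler emitted, without going through the DWARF parser that asserts, is to read the `address_size` byte directly from a compilation-unit header in `.debug_info`. The following is a minimal sketch under stated assumptions: it handles only the 32-bit DWARF format and DWARF versions 2 to 5, and `read_cu_header` is a hypothetical helper, not part of pyelftools.

```python
import struct

def read_cu_header(data, offset=0, little_endian=True):
    """Return (version, address_size) for the CU header at `offset`.

    `data` is the raw content of the .debug_info section.
    """
    end = "<" if little_endian else ">"
    (unit_length,) = struct.unpack_from(end + "I", data, offset)
    if unit_length == 0xFFFFFFFF:
        # This escape value marks the 64-bit DWARF format.
        raise ValueError("64-bit DWARF format is not handled in this sketch")
    (version,) = struct.unpack_from(end + "H", data, offset + 4)
    if version >= 5:
        # DWARF 5: unit_length(4) version(2) unit_type(1) address_size(1)
        address_size = data[offset + 7]
    else:
        # DWARF 2-4: unit_length(4) version(2) debug_abbrev_offset(4)
        #            address_size(1)
        address_size = data[offset + 10]
    return version, address_size
```

The raw bytes can be obtained with `elffile.get_section_by_name('.debug_info').data()`, and the byte order from `elffile.little_endian`. A result of `(4, 2)` would indicate a DWARF 4 unit with 2-byte addresses, which matches the `AssertionError: 2` in the traceback.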

eliben, Apr 01 '24 12:04

Pyelftools very explicitly does not support 16-bit addresses (or addresses with a segment/selector part). There is an assertion right there:

assert address_size == 8 or address_size == 4, str(address_size)

Support can be added, but as always, we'd really like a test binary.
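
Until such support exists, a caller that wants a clearer failure than a bare `AssertionError: 2` could wrap the CU iteration. This is a minimal sketch, assuming the assertion message is the offending address size, as in the traceback above; `iter_cus_checked` is a hypothetical helper name, not a pyelftools function.

```python
def iter_cus_checked(dwarfinfo):
    """Yield CUs from dwarfinfo.iter_CUs(), rewording assertion failures.

    Assumption: an AssertionError raised while parsing a CU carries the
    unsupported address size as its message.
    """
    try:
        yield from dwarfinfo.iter_CUs()
    except AssertionError as exc:
        raise ValueError(
            "DWARF address_size {} is not supported".format(exc)
        ) from exc
```

Note that assertions are stripped when Python runs with `-O`, so this wrapper only helps in a normal (non-optimized) run.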

See if eu-addr2line from elfutils (https://sourceware.org/elfutils/) does better than the one in binutils. The maintainers of binutils are not pulling their weight.

sevaa, Apr 01 '24 14:04

@bavovanachte Is this still an issue?

sevaa, Apr 18 '24 15:04