pyelftools

DW_FORM_indirect not parsed properly

Open sevaa opened this issue 2 years ago • 9 comments

I don't have a binary for this, but I have a crash report that DW_FORM_indirect exists out in the wild.

The encoding for the form is a ULEB128, interpreted as a DWARF form code, followed by the value in that form. It's effectively dynamic typing in DIEs. The current implementation only reads the ULEB128 and doesn't read the actual value. As a result, the current position is off, and subsequent attributes and DIEs (if enumerating) are misread.
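To make the encoding above concrete, here is a minimal standalone sketch of reading a DW_FORM_indirect attribute. It is not pyelftools code: `read_uleb128`, `read_fixed`, `READERS` and `read_attribute` are hypothetical names, only three real forms are covered, and little-endian data is assumed. The numeric form codes are the ones from the DWARF specification.

```python
import io

# A small subset of DWARF form codes, enough for illustration.
DW_FORM_data1 = 0x0B
DW_FORM_udata = 0x0F
DW_FORM_ref2 = 0x12
DW_FORM_indirect = 0x16


def read_uleb128(stream):
    """Decode one unsigned LEB128 integer from a binary stream."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a ULEB128 value")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift   # low 7 bits carry data
        if not byte & 0x80:                # high bit clear: last byte
            return result
        shift += 7


def read_fixed(stream, size):
    """Read a fixed-size unsigned integer (little-endian is assumed here)."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("stream ended inside a fixed-size value")
    return int.from_bytes(data, "little")


# Hypothetical dispatch table: form code -> function that reads its value.
READERS = {
    DW_FORM_data1: lambda s: read_fixed(s, 1),
    DW_FORM_ref2: lambda s: read_fixed(s, 2),
    DW_FORM_udata: read_uleb128,
}


def read_attribute(stream, form):
    """Return (real_form, value, hops), following any chain of indirects."""
    hops = 0
    while form == DW_FORM_indirect:
        form = read_uleb128(stream)        # the real form is stored inline
        hops += 1
    return form, READERS[form](stream), hops
```

Feeding it the bytes 0x12 0x08 0x01 with an initial form of DW_FORM_indirect yields form 0x12 (DW_FORM_ref2), value 264 and one hop, and leaves the stream positioned right after the value. A reader that stops after the ULEB128 would instead leave the two value bytes to be misread as the next attribute.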

EDIT: now that I look at it, there is a branch in _translate_attr_value, written under the assumption that _translate_attr_value is called immediately after reading the raw value. That parser, however, loses the real form of the attribute - not good. The form carries a lot of information, especially in later DWARF versions.

sevaa avatar Jun 21 '22 15:06 sevaa

dwarfdump.py v0.29 raises on a DW_FORM_indirect in the .o file here (after defining "EM_ARM" in arches)... newlib.zip

0xC0DED00D avatar May 20 '23 21:05 0xC0DED00D

Can't reproduce on pyelftools 0.29, as packaged with DWARF Explorer. Parses and shows all DIEs. What is the parsing code on your end?

Also, this is definitely not the same issue as originally reported.

sevaa avatar May 21 '23 15:05 sevaa

I did this:

  1. Download and extract zip of v0.29 (https://github.com/eliben/pyelftools/archive/refs/tags/v0.29.zip)
  2. Change attr at die.py line 125 to repr(attr), and change the dict lookup at dwarfdump.py line 346 to arches.get(self.elffile['e_machine'], self.elffile['e_machine']) to suppress some (I think) unrelated errors:
diff -r pyelftools-0.29-1/elftools/dwarf/die.py pyelftools-0.29-2/elftools/dwarf/die.py
125c125
<             raise DWARFError('%s is not a reference class form attribute' % attr)
---
>             raise DWARFError('%s is not a reference class form attribute' % repr(attr))
diff -r pyelftools-0.29-1/scripts/dwarfdump.py pyelftools-0.29-2/scripts/dwarfdump.py
346c346
<         arch = arches[self.elffile['e_machine']]
---
>         arch = arches.get(self.elffile['e_machine'], self.elffile['e_machine'])
  3. Run dwarfdump.py --debug-info, supplying pyelftools-0.29 in PYTHONPATH:
$ (PYTHONPATH=~/Downloads/pyelftools-0.29 python3 ~/Downloads/pyelftools-0.29/scripts/dwarfdump.py --debug-info gmtime_r.o)



Mine raises while printing the seventh DIE, saying

AttributeValue(name='DW_AT_type', form='DW_FORM_indirect', value=264, raw_value=18, offset=257) is not a reference class form attribute

$ (PYTHONPATH=~/Downloads/pyelftools-0.29 python3 ~/Downloads/pyelftools-0.29/scripts/dwarfdump.py --debug-info gmtime_r.o)
gmtime_r.o:  file format elf32-EM_ARM
.debug_info contents:
0x00000000: Compile Unit: length = 0x00000130, format = DWARF32, version = 0x0003, abbr_offset = 0x0000, addr_size = 0x04 (next unit at 0x00000134)

0x0000000b: DW_TAG_compile_unit [10] *
              DW_AT_name [DW_FORM_string] ("./thirdparty/newlib/gmtime_r.c")
              DW_AT_producer [DW_FORM_string] ("Component: ARM Compiler 5.06 update 6 (build 750) Tool: armcc [4d3637]")
              DW_AT_language [DW_FORM_data2]  (DW_LANG_C99)
              DW_AT_comp_dir [DW_FORM_string] ("\\Mac\Home\dev\github.com\0xC0DED00D\firmware")
              DW_AT_macro_info [DW_FORM_data4]  (0x00000000)
              DW_AT_stmt_list [DW_FORM_data4] (0x00000000)

0x000000aa: DW_TAG_base_type [4]  (0x0000000b)
              DW_AT_byte_size [DW_FORM_data1] (0x08)
              DW_AT_encoding [DW_FORM_data1]  (DW_ATE_complex_float)
              DW_AT_name [DW_FORM_string] ("_Complex long_double")

0x000000c2: DW_TAG_base_type [4]  (0x0000000b)
              DW_AT_byte_size [DW_FORM_data1] (0x08)
              DW_AT_encoding [DW_FORM_data1]  (DW_ATE_complex_float)
              DW_AT_name [DW_FORM_string] ("_Complex double")

0x000000d5: DW_TAG_base_type [4]  (0x0000000b)
              DW_AT_byte_size [DW_FORM_data1] (0x04)
              DW_AT_encoding [DW_FORM_data1]  (DW_ATE_complex_float)
              DW_AT_name [DW_FORM_string] ("_Complex float")

0x000000e7: DW_TAG_unspecified_type [117]  (0x0000000b)
              DW_AT_name [DW_FORM_string] ("void")

0x000000ed: DW_TAG_structure_type [41] * (0x0000000b)
              DW_AT_sibling [DW_FORM_ref_udata] (264)
              DW_AT_name [DW_FORM_string] ("__va_list")
              DW_AT_byte_size [DW_FORM_udata] (4)

0x000000fb: DW_TAG_member [30]  (0x000000ed)
              DW_AT_name [DW_FORM_string] ("__ap")
Traceback (most recent call last):
  File "/Users/jwatts/Downloads/pyelftools-0.29/scripts/dwarfdump.py", line 560, in <module>
    main()
  File "/Users/jwatts/Downloads/pyelftools-0.29/scripts/dwarfdump.py", line 542, in main
    readelf.dump_info()
  File "/Users/jwatts/Downloads/pyelftools-0.29/scripts/dwarfdump.py", line 393, in dump_info
    self._emitline("              %s [%s] (%s)" % (attr_name, attr.form, self.describe_attr_value(die, attr)))
                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwatts/Downloads/pyelftools-0.29/scripts/dwarfdump.py", line 404, in describe_attr_value
    return ATTR_DESCRIPTIONS[attr.name](attr, die)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwatts/Downloads/pyelftools-0.29/scripts/dwarfdump.py", line 290, in _desc_datatype
    return _desc_ref(attr, die, describe_cpp_datatype(die))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwatts/Downloads/pyelftools-0.29/elftools/dwarf/datatype_cpp.py", line 18, in describe_cpp_datatype
    return str(parse_cpp_datatype(var_die))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwatts/Downloads/pyelftools-0.29/elftools/dwarf/datatype_cpp.py", line 35, in parse_cpp_datatype
    type_die = var_die.get_DIE_from_attribute('DW_AT_type')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwatts/Downloads/pyelftools-0.29/elftools/dwarf/die.py", line 125, in get_DIE_from_attribute
    raise DWARFError('%s is not a reference class form attribute' % repr(attr))
elftools.common.exceptions.DWARFError: AttributeValue(name='DW_AT_type', form='DW_FORM_indirect', value=264, raw_value=18, offset=257) is not a reference class form attribute
$ 

0xC0DED00D avatar May 21 '23 22:05 0xC0DED00D

Now that makes sense. The problem is tied to the original issue in the sense that DW_FORM_indirect DIE attributes have, essentially, two forms - the indirect one that it's stored with, and the real one. The library currently has no facility to report both and no cheap way to introduce it - changing the datatype of form will break a great many things downstream, and extending the AttributeValue namedtuple will break a lot in the unit tests.

Also, technically, there can be an arbitrarily long chain of "indirect"s.

Which form code should the library return as the form property? That's a question of design philosophy - are we parsing for fidelity, so as to retain all facets of the source data structure, or are we parsing for utility in practical debugging? The former would call for returning indirect, the latter - the real form. The current implementation returns the former, but that breaks the higher-level portions of the API that rely on the form to guess the semantics. Which brings us back to the issue at hand - indirect is not recognized as a form that can hold a reference.
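A small sketch of why the reported form matters to higher-level code. The AttributeValue fields match the repr in the traceback above; `REFERENCE_FORMS` and `is_reference` are hypothetical stand-ins for a form-class check, not the library's actual implementation, and treating the real form as DW_FORM_ref2 is an assumption based on raw_value=18 (0x12).

```python
from collections import namedtuple

# Same field layout as the AttributeValue repr shown in the traceback.
AttributeValue = namedtuple("AttributeValue", "name form value raw_value offset")

# Hypothetical set of forms treated as "reference class" for this sketch.
REFERENCE_FORMS = frozenset({
    "DW_FORM_ref1", "DW_FORM_ref2", "DW_FORM_ref4",
    "DW_FORM_ref8", "DW_FORM_ref_udata", "DW_FORM_ref_addr",
})


def is_reference(attr):
    """Guess the attribute's semantics from its reported form name."""
    return attr.form in REFERENCE_FORMS


# The attribute from the crash report, reported with the stored form.
as_stored = AttributeValue("DW_AT_type", "DW_FORM_indirect", 264, 18, 257)

# The same attribute if the real form were reported instead
# (DW_FORM_ref2 is an assumption, see the lead-in).
as_real = as_stored._replace(form="DW_FORM_ref2")
```

With the stored form, `is_reference(as_stored)` is False and a caller would refuse to follow the reference; with the real form, `is_reference(as_real)` is True.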

I personally think we should change the API to report the real form, but also report that it came via indirect, extending the attribute namedtuple (and fixing the unit tests as needed). This might break some consumer code, code that somehow came to rely on indirect being reported. We can make it an opt-in behavior. @eliben - thoughts?
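One possible shape for that extension, as a sketch only: the field name `via_indirect` is hypothetical and is not taken from the linked PR. Giving the new field a default keeps five-argument construction working, while code that compares against or unpacks a five-element tuple still breaks, which is the kind of fallout mentioned for the unit tests.

```python
from collections import namedtuple

# Hypothetical extended tuple: the five existing fields plus a count of
# DW_FORM_indirect hops. defaults=(0,) applies to the last field only.
AttributeValue = namedtuple(
    "AttributeValue",
    "name form value raw_value offset via_indirect",
    defaults=(0,),
)

# Existing five-argument construction keeps working.
plain = AttributeValue("DW_AT_byte_size", "DW_FORM_udata", 4, 4, 250)

# An attribute that arrived through one level of indirection; the form
# field carries the real form (DW_FORM_ref2 is assumed for illustration).
indirect = AttributeValue("DW_AT_type", "DW_FORM_ref2", 264, 18, 257, 1)


def unpack_five(attr):
    """Consumer code that assumes exactly five fields."""
    name, form, value, raw_value, offset = attr
    return name, form, value, raw_value, offset
```

Here `plain.via_indirect` is 0 and `indirect.via_indirect` is 1. However, `plain == ("DW_AT_byte_size", "DW_FORM_udata", 4, 4, 250)` is now False because the lengths differ, and `unpack_five(plain)` raises ValueError.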

sevaa avatar May 22 '23 16:05 sevaa

@sevaa the option you propose in the last paragraph sounds reasonable. I'd like to see a concrete PR to judge it in more detail.

eliben avatar May 25 '23 21:05 eliben

I'll see what I can do.

sevaa avatar May 25 '23 21:05 sevaa

https://github.com/eliben/pyelftools/pull/475 should address that. @0xC0DED00D can you please make and share a test binary with DW_FORM_indirect as a test case for it?

sevaa avatar Jun 28 '23 14:06 sevaa

@sevaa it looks like you grabbed the .o I posted above in newlib.zip. Let me know if you need something else (although I don't know how much I can help. I have some .o files left over from a build system I was using a few years ago that no longer exists.)

0xC0DED00D avatar Jun 29 '23 06:06 0xC0DED00D

That file is fine, I think. Short and sweet.

sevaa avatar Jun 29 '23 14:06 sevaa