pyelftools
Is DW_AT_ranges giving correct result?
Not sure if this is a pyelftools question or a general DWARF question.
I ran a slightly modified version of dwarf_range_lists.py from examples and got the following:
Found a compile unit at offset 99406553, length 9832
---------------------------- START TOP DIE -------------------------
DIE DW_TAG_compile_unit, size=35, has_children=True
|DW_AT_comp_dir : AttributeValue(name='DW_AT_comp_dir', form='DW_FORM_strp', value=b'directory/path', raw_value=7867314, offset=99406565)
|DW_AT_name : AttributeValue(name='DW_AT_name', form='DW_FORM_strp', value=b'file.C', raw_value=8080662, offset=99406569)
|DW_AT_producer : AttributeValue(name='DW_AT_producer', form='DW_FORM_strp', value=b'Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 16.0.0.109 Build 20150815\n Opt_report_file Linux_AMD64/file.opt', raw_value=8080780, offset=99406573)
|DW_AT_language : AttributeValue(name='DW_AT_language', form='DW_FORM_data1', value=4, raw_value=4, offset=99406581)
|DW_AT_use_UTF8 : AttributeValue(name='DW_AT_use_UTF8', form='DW_FORM_flag', value=True, raw_value=1, offset=99406582)
|DW_AT_low_pc : AttributeValue(name='DW_AT_low_pc', form='DW_FORM_addr', value=0, raw_value=0, offset=99406583)
|DW_AT_ranges : AttributeValue(name='DW_AT_ranges', form='DW_FORM_data4', value=2131408, raw_value=2131408, offset=99406591)
|DW_AT_stmt_list : AttributeValue(name='DW_AT_stmt_list', form='DW_FORM_data4', value=17619546, raw_value=17619546, offset=99406595)
----------------------------END OF TOP DIE-------------------------
DIE DW_TAG_compile_unit at 99406564. attr DW_AT_ranges.
[RangeEntry(begin_offset=21905584, end_offset=21905792), RangeEntry(begin_offset=21905792, end_offset=21906112), RangeEntry(begin_offset=22503872, end_offset=22505168)]
DIE DW_TAG_inlined_subroutine at 99406846. attr DW_AT_ranges.
[RangeEntry(begin_offset=22504548, end_offset=22504573), RangeEntry(begin_offset=22504581, end_offset=22504609), RangeEntry(begin_offset=22504629, end_offset=22504635), RangeEntry(begin_offset=22504643, end_offset=22504683), RangeEntry(begin_offset=22504732, end_offset=22504820), RangeEntry(begin_offset=22505009, end_offset=22505075)]
DIE DW_TAG_inlined_subroutine at 99406868. attr DW_AT_ranges.
[RangeEntry(begin_offset=22504548, end_offset=22504573), RangeEntry(begin_offset=22504581, end_offset=22504609), RangeEntry(begin_offset=22504629, end_offset=22504635), RangeEntry(begin_offset=22504643, end_offset=22504649), RangeEntry(begin_offset=22504774, end_offset=22504820), RangeEntry(begin_offset=22505009, end_offset=22505075)]
DIE DW_TAG_inlined_subroutine at 99407098. attr DW_AT_ranges.
[RangeEntry(begin_offset=22504376, end_offset=22504438), RangeEntry(begin_offset=22504450, end_offset=22504456), RangeEntry(begin_offset=22504490, end_offset=22504495), RangeEntry(begin_offset=22504836, end_offset=22505009)]
DIE DW_TAG_inlined_subroutine at 99407144. attr DW_AT_ranges.
[RangeEntry(begin_offset=22504408, end_offset=22504421), RangeEntry(begin_offset=22504436, end_offset=22504438), RangeEntry(begin_offset=22504450, end_offset=22504456)]
DIE DW_TAG_inlined_subroutine at 99407342. attr DW_AT_ranges.
[RangeEntry(begin_offset=22504146, end_offset=22504151), RangeEntry(begin_offset=22504154, end_offset=22504165)]
DIE DW_TAG_inlined_subroutine at 99407366. attr DW_AT_ranges.
[RangeEntry(begin_offset=22504146, end_offset=22504151), RangeEntry(begin_offset=22504154, end_offset=22504160)]
DIE DW_TAG_inlined_subroutine at 99407413. attr DW_AT_ranges.
[RangeEntry(begin_offset=22504146, end_offset=22504151), RangeEntry(begin_offset=22504154, end_offset=22504156)]
DIE DW_TAG_inlined_subroutine at 99414511. attr DW_AT_ranges.
[BaseAddressEntry(base_address=21905792), RangeEntry(begin_offset=21905803, end_offset=21905901), RangeEntry(begin_offset=21905943, end_offset=21905986), RangeEntry(begin_offset=21906028, end_offset=21906112)]
I believe that to get a code address I would compute cu_offset + begin_offset for any RangeEntry in a list that has no BaseAddressEntry.
However, if there is a BaseAddressEntry, am I supposed to add BaseAddressEntry address to every RangeEntry begin_offset in the list? I would assume so due to the following line from DWARF docs v3, specifically point 2:
A base address selection entry consists of:
1. The value of the largest representable address offset (for example, 0xffffffff when the size of
an address is 32 bits).
2. An address, which defines the appropriate base address for use in interpreting the beginning
and ending address offsets of subsequent entries of the location list.
However, looking at the values, the base address entry and the subsequent range list entries do not appear to be set up this way. The entries look like absolute addresses within the CU rather than offsets from the base address. Is this intended, or am I misunderstanding something here? If it is intended, what is the point of the BaseAddressEntry?
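For reference, the entry layout quoted above can be sketched as a small decoder over raw .debug_ranges bytes. This is only an illustration of the DWARF v3 encoding (a pair of address-sized values per entry, a pair of zeros marking the end of the list), not the pyelftools implementation. The RangeEntry and BaseAddressEntry tuples are stand-ins that mirror the names in the output above, and the decode_range_list name, the 8-byte address size and the little-endian default are assumptions.

```python
import struct
from collections import namedtuple

# Hypothetical stand-ins mirroring the names printed above;
# they are NOT imported from pyelftools.
RangeEntry = namedtuple("RangeEntry", "begin_offset end_offset")
BaseAddressEntry = namedtuple("BaseAddressEntry", "base_address")


def decode_range_list(data, address_size=8, little_endian=True):
    """Decode one DWARF v3-style range list from raw bytes (sketch)."""
    fmt = ("<" if little_endian else ">") + {4: "II", 8: "QQ"}[address_size]
    # Largest representable address: 0xffffffff for 32-bit addresses.
    max_address = (1 << (8 * address_size)) - 1
    entries = []
    for first, second in struct.iter_unpack(fmt, data):
        if first == 0 and second == 0:
            break  # end-of-list entry
        if first == max_address:
            # Base address selection entry: the second value is the new base.
            entries.append(BaseAddressEntry(second))
        else:
            entries.append(RangeEntry(first, second))
    return entries


# Hand-built bytes for illustration: a base selection entry, one range
# entry, then the end-of-list marker.
raw = struct.pack("<6Q", 0xFFFFFFFFFFFFFFFF, 21905792, 21905803, 21905901, 0, 0)
print(decode_range_list(raw))
```

The decoder only classifies entries; it does not decide what the begin/end values are relative to.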
Depends on what you mean by cu_offset. If you mean the field cu_offset in the CU object of pyelftools, then NO. begin_offset in the ranges, unless there's a BaseAddressEntry, is relative to the start offset of the code in the current CU, as recorded in the top DIE under DW_AT_low_pc (sometimes DW_AT_entry_pc).
The field CU.cu_offset holds the offset of the CU itself within the debug_info section.
So for this case for the first list, it would be low_pc of TOP_DIE (0) + begin_offset (21905584).
What happens if there is a BaseAddressEntry? Would I compute base_address (21905792) + begin_offset (21905803)? That doesn't seem right, since 21905803 - 21905792 = 11, so the range entry already sits just past the base address. The base address looks like it would be the low_pc of whichever CU this is in.
This, I think, is the relevant cite from the spec:
The applicable base address of a range list entry is determined by the closest preceding base address selection entry (see below) in the same range list. If there is no such selection entry, then the applicable base address defaults to the base address of the compilation unit (see Section 3.1).
This means, to me, that if there's a BaseAddressEntry, its provided base address replaces the low_pc of the CU, not adds to it.
I'll try finding a BaseAddressEntry in my binary archive. I don't think I've ever seen it up close.
Looking through my file, it seems like the BaseAddressEntry always has the same value within my CU which supports what you mentioned.
In that case it looks like for me to get the offset for the debug_line section I would do:
if no base address entry in list
    then low_pc of cu + begin/end offset
else if base address entry in list
    then just use begin/end offset
My reading of the spec is that the order is significant. I would write this as:
1. get location list from die
2. get base address attribute from die (may be missing, accept None)
3. parse an object if any remain
4. if base address entry replace base with new value
5. if entry add current base
6. if base is None error
7. goto 3
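The steps above can be sketched in Python roughly as follows. This is a minimal illustration of that reading of the spec, not pyelftools code: RangeEntry and BaseAddressEntry are stand-in tuples named after the printed output, and resolve_ranges and cu_base are hypothetical names.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the names printed earlier in the thread;
# they are NOT imported from pyelftools.
RangeEntry = namedtuple("RangeEntry", "begin_offset end_offset")
BaseAddressEntry = namedtuple("BaseAddressEntry", "base_address")


def resolve_ranges(entries, cu_base=None):
    """Walk a range list in order, applying the closest preceding base."""
    base = cu_base  # e.g. DW_AT_low_pc of the CU; may be None if missing
    resolved = []
    for entry in entries:
        if isinstance(entry, BaseAddressEntry):
            # A base address selection entry replaces the current base.
            base = entry.base_address
        else:
            if base is None:
                raise ValueError("range entry with no applicable base address")
            resolved.append((base + entry.begin_offset, base + entry.end_offset))
    return resolved


# Illustrative values only: the first entry uses the CU base, the second
# uses the base selected by the entry that precedes it.
example = [RangeEntry(1, 2), BaseAddressEntry(100), RangeEntry(3, 4)]
print(resolve_ranges(example, cu_base=10))  # [(11, 12), (103, 104)]
```

Because the list is walked in order, a base address entry only affects the range entries that come after it.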
I'm not sure I'm following your steps @mdmillerii. Correct me if I'm wrong:
get DW_AT_range list
base_address = CU.low_pc
for entry in range list:
    if BaseAddress:
        base_address = entry.base_address
    else if Range Entry:
        actual begin offset for range entry = base_address + entry.begin_offset
If this is the case, then I'm confused as to the values in my example:
[BaseAddressEntry(base_address=21905792), RangeEntry(begin_offset=21905803, end_offset=21905901), RangeEntry(begin_offset=21905943, end_offset=21905986), RangeEntry(begin_offset=21906028, end_offset=21906112)]
The range entry objects don't seem to be offsets from the base address entry here.
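One way to make that observation concrete is to check both interpretations against the compile unit's own ranges from the output above. This is only a sanity check on the printed numbers, assuming the CU ranges are absolute because DW_AT_low_pc is 0 and that list has no BaseAddressEntry. The helper name inside_cu is hypothetical, and the check does not show whether the producer or the parser is responsible.

```python
# Values copied from the output earlier in the thread.
cu_ranges = [(21905584, 21905792), (21905792, 21906112), (22503872, 22505168)]
base_address = 21905792
entries = [(21905803, 21905901), (21905943, 21905986), (21906028, 21906112)]


def inside_cu(begin, end):
    """True if [begin, end) lies within one of the CU's ranges."""
    return any(lo <= begin and end <= hi for lo, hi in cu_ranges)


# Interpretation 1: entries are offsets from the base address entry.
as_offsets = [(base_address + b, base_address + e) for b, e in entries]
# Interpretation 2: entries are already absolute addresses.
as_absolute = entries

print(any(inside_cu(b, e) for b, e in as_offsets))    # False
print(all(inside_cu(b, e) for b, e in as_absolute))   # True
```

With these numbers, only the absolute reading keeps the inlined subroutine inside its compile unit.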
The algorithm you posted is an accurate transcription of my description. I don't have time at present to re-read the spec or compare other sources to see if you have a broken provider.
My original reading of the description in your post was that if a Base Address entry appeared anywhere in the list, you would use it even for the RangeEntry elements that occurred before it.