pyelftools
How to get variable type
From the demo, I can get a global variable's address and size, but I can't find its data type, such as uint8 or int32. How do I get type attributes with this library? Many thanks!
Look at the DW_AT_type attribute. It's a reference to another DIE that describes the datatype. References are usually stored as either a CU-relative offset (DW_FORM_refX) or an absolute offset (DW_FORM_ref_addr).
The type DIE can contain further references, for example if the type is an array or a pointer.
You can use DWARF Explorer to explore that data structure interactively.
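A minimal sketch of following DW_AT_type from a variable down to the end of its type chain. The helper names type_chain and print_variable_type_chains are made up for illustration; the pyelftools calls they rely on are DIE.attributes, DIE.get_DIE_from_attribute(), and the usual ELFFile / iter_CUs() / iter_DIEs() traversal. The pyelftools import sits inside the file-reading function so that type_chain can be tried on its own.

```python
def type_chain(die):
    """Return the DIEs reached by repeatedly following DW_AT_type from `die`.

    The chain ends at the first DIE without DW_AT_type, which is typically a
    base type, a struct/union/enum, or a pointer to void.
    """
    chain = []
    current = die
    while "DW_AT_type" in current.attributes:
        current = current.get_DIE_from_attribute("DW_AT_type")
        chain.append(current)
    return chain


def print_variable_type_chains(path):
    """Print the tag sequence of the type chain of every DW_TAG_variable."""
    from elftools.elf.elffile import ELFFile  # requires pyelftools

    with open(path, "rb") as f:
        elf = ELFFile(f)
        if not elf.has_dwarf_info():
            return
        for cu in elf.get_dwarf_info().iter_CUs():
            for die in cu.iter_DIEs():
                if die.tag != "DW_TAG_variable":
                    continue
                name = die.attributes.get("DW_AT_name")
                label = name.value.decode("utf-8", "replace") if name else "<anonymous>"
                tags = [d.tag for d in type_chain(die)]
                print(label, "->", " -> ".join(tags) or "<no type>")
```

For a declaration like `const int *p;` the chain would be expected to look like DW_TAG_pointer_type, DW_TAG_const_type, DW_TAG_base_type, though the exact shape depends on the compiler.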
Thanks for your kind reply, @eliben. Following what you said, I can get the type of a variable. I have two more questions and need your help:
- How can I get the length of a specific variable, e.g. 4 bytes, 2 bytes, 1 byte?
- How can I find out which struct a variable belongs to? A struct may contain several variables, but currently I can only get the struct name and the variable name separately; I don't know the relationship between the struct and the variable I get.
- How can I get the length of a specific variable, e.g. 4 bytes, 2 bytes, 1 byte?
Putting this upfront: not all variables have statically inferable sizes. VLAs, for example, won't have their length encoded in the DWARF information, since their length changes at runtime.
However, if you want to get the subset of variables that do have known sizes, you can use their type information. Each DW_TAG_*_type may or may not have DW_AT_byte_size or DW_AT_bit_size, which indicate the size of the type in bytes/bits, including padding. If a compound type (i.e. one that isn't a primitive, like int) doesn't have an explicit size, you'll need to walk its members and collect their sizes individually, as well as calculate padding.
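A rough sketch of that size lookup, working on pyelftools DIE objects. It is not a complete solution: it only looks through typedefs and const/volatile qualifiers and multiplies out fixed array bounds, it assumes a lower bound of 0 (as in C), and it does not compute struct padding. The names static_byte_size, _referenced_type and _subrange_length are invented for this example.

```python
# Tags that carry no size of their own and just point at another type.
TRANSPARENT_TAGS = {
    "DW_TAG_typedef", "DW_TAG_const_type", "DW_TAG_volatile_type",
    "DW_TAG_variable", "DW_TAG_member", "DW_TAG_formal_parameter",
}


def _referenced_type(die):
    """Return the DIE that DW_AT_type points at, or None if there is none."""
    if "DW_AT_type" not in die.attributes:
        return None
    return die.get_DIE_from_attribute("DW_AT_type")


def _subrange_length(subrange):
    """Element count of one array dimension, or None if it is not a constant."""
    for name, extra in (("DW_AT_count", 0), ("DW_AT_upper_bound", 1)):
        attr = subrange.attributes.get(name)
        if attr is not None and isinstance(attr.value, int):
            return attr.value + extra  # upper bound is inclusive; lower bound assumed 0
    return None


def static_byte_size(die):
    """Best-effort size in bytes of a variable or type DIE, or None if unknown."""
    size = die.attributes.get("DW_AT_byte_size")
    if size is not None:
        # The size may also be an expression (e.g. for VLAs); treat that as unknown.
        return size.value if isinstance(size.value, int) else None
    if die.tag in TRANSPARENT_TAGS:
        target = _referenced_type(die)
        return static_byte_size(target) if target is not None else None
    if die.tag == "DW_TAG_array_type":
        element = _referenced_type(die)
        element_size = static_byte_size(element) if element is not None else None
        lengths = [_subrange_length(c) for c in die.iter_children()
                   if c.tag == "DW_TAG_subrange_type"]
        if element_size is None or not lengths or None in lengths:
            return None
        total = element_size
        for length in lengths:
            total *= length
        return total
    return None
```

Returning None rather than guessing keeps the VLA case from the paragraph above visible to the caller.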
- How can I find out which struct a variable belongs to? A struct may contain several variables, but currently I can only get the struct name and the variable name separately; I don't know the relationship between the struct and the variable I get.
Variables are represented with DW_TAG_variable and similar nodes (e.g. DW_TAG_formal_parameter, DW_TAG_member), while types are represented with DW_TAG_*_type nodes. Both may have DW_AT_name attributes which contain their respective names. E.g., for the following:
struct foo bar;
bar would be a DW_TAG_variable with DW_AT_name=bar and a DW_AT_type that references foo. foo, in turn, would be a DW_TAG_structure_type with DW_AT_name=foo.
Note that neither variables nor types are required to have names -- the compiler is free to generate anonymous variables during compilation, and anonymous types (e.g. unnamed structs) are part of the C/C++ specification. You should make sure that you handle those cases.
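To make the variable-to-struct relationship concrete, here is a small sketch. It assumes struct fields are DW_TAG_member children of the struct DIE (reachable through pyelftools' DIE.iter_children()) and that DW_AT_data_member_location is a plain integer offset, which is common but not guaranteed. The helper names die_name, struct_of and members_of are hypothetical.

```python
def die_name(die, default="<anonymous>"):
    """DW_AT_name as text, or `default` for anonymous variables and types."""
    attr = die.attributes.get("DW_AT_name")
    if attr is None:
        return default
    value = attr.value
    return value.decode("utf-8", "replace") if isinstance(value, bytes) else str(value)


def struct_of(var_die):
    """Follow DW_AT_type through typedefs and qualifiers to a struct-like DIE.

    Returns None if the variable's type is something else (pointer, int, ...).
    """
    die = var_die
    while "DW_AT_type" in die.attributes:
        die = die.get_DIE_from_attribute("DW_AT_type")
        if die.tag in ("DW_TAG_structure_type", "DW_TAG_union_type", "DW_TAG_class_type"):
            return die
        if die.tag not in ("DW_TAG_typedef", "DW_TAG_const_type", "DW_TAG_volatile_type"):
            return None
    return None


def members_of(struct_die):
    """Yield (member name, byte offset or None) for each field of a struct DIE."""
    for child in struct_die.iter_children():
        if child.tag != "DW_TAG_member":
            continue
        loc = child.attributes.get("DW_AT_data_member_location")
        # Older DWARF versions may encode the offset as an expression instead.
        offset = loc.value if loc is not None and isinstance(loc.value, int) else None
        yield die_name(child), offset
```

Given the DIE for bar from the example above, struct_of(bar_die) would return the DIE for foo, and members_of() on that lists its fields.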
Hope that helps!
Also be aware that a variable declared (e.g. in a .h file) separately from its instantiation (in a .c file) may have the name on the declaration DIE, while the definition DIE only holds a reference to that declaration (DW_AT_specification; also consider DW_AT_abstract_origin).
Yep! You'll need to handle DW_AT_specification when working between definitions and declarations, and DW_AT_abstract_origin when working with variables and parameters that have been inlined into another scope.
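One way to handle both links is a lookup that falls back to the referenced DIE when an attribute is missing on the DIE at hand. The sketch below assumes DIE.get_DIE_from_attribute() can resolve both reference attributes; find_attribute, LINK_ATTRS and the max_hops limit are made up for this example.

```python
# Attributes that point from a definition or inlined instance to the DIE
# that carries the shared information (name, type, ...).
LINK_ATTRS = ("DW_AT_specification", "DW_AT_abstract_origin")


def find_attribute(die, name, max_hops=8):
    """Look up `name` on `die`, then on the DIEs it links to.

    Returns the attribute object, or None if it is not found within
    `max_hops` links (the limit only guards against malformed input).
    """
    current = die
    for _ in range(max_hops):
        if name in current.attributes:
            return current.attributes[name]
        for link in LINK_ATTRS:
            if link in current.attributes:
                current = current.get_DIE_from_attribute(link)
                break
        else:
            return None  # no further link to follow
    return None
```

With this, find_attribute(die, "DW_AT_name") and find_attribute(die, "DW_AT_type") work the same whether the DIE is a declaration, a definition, or an inlined copy.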
@mmclwd, is this still an issue?
Hi everyone, I stumbled upon this while searching for the exact issue title. The suggestions here have been useful, but I still think a code sample would help anyone scrolling through.
import sys
from typing import Optional, Tuple

from elftools.dwarf.die import DIE
# Local helper module (not part of pyelftools)
from monkeyPatch import getDWARFInfoPatched

# DW_AT_encoding values (DW_ATE_*) mapped to readable names
encodingMap = {
    1: 'ADDRESS',
    2: 'BOOLEAN',
    4: 'FLOAT',
    5: 'SIGNED',
    6: 'SIGNED CHAR',
    7: 'UNSIGNED',
    8: 'UNSIGNED CHAR'
}


# TYPE
# TODO should handle also arrays and structs
def findBaseType(die: DIE) -> Optional[Tuple[int, str, str]]:
    """Follow DW_AT_type references until a base type is reached.

    Returns (byte size, encoding, type name), or None if the chain ends
    before reaching a DW_TAG_base_type.
    """
    iterDIE = die
    while iterDIE.tag != "DW_TAG_base_type":
        if 'DW_AT_type' not in iterDIE.attributes:
            return None
        iterDIE = iterDIE.get_DIE_from_attribute("DW_AT_type")
    # Use .value rather than .raw_value: for names stored in .debug_str,
    # raw_value is only the string offset, not the name itself.
    return (
        iterDIE.attributes["DW_AT_byte_size"].value,
        encodingMap.get(iterDIE.attributes["DW_AT_encoding"].value, 'UNKNOWN'),
        iterDIE.attributes["DW_AT_name"].value.decode('utf-8'))


# MAIN
if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("ERROR: no input filename provided")
        sys.exit(1)
    # Read file
    elfFile = getDWARFInfoPatched(sys.argv[1])
    dwarfinfo = elfFile.get_dwarf_info()
    # Parse and build result object
    for cu in dwarfinfo.iter_CUs():
        # New CU
        for die in cu.iter_DIEs():
            # New variable
            if die.tag == 'DW_TAG_variable':
                # TYPE
                tr = findBaseType(die)
                if tr is not None:
                    byteSize, encoding, typeName = tr
                    print(f"{die.get_full_path()}\tByteSize: {byteSize}\tTypeName:{typeName}\t\tEncoding:{encoding}")
I'm open to suggestions, and if the maintainers would appreciate it, I'm happy to submit a PR adding a more nicely crafted example (Python is not my mother tongue, so it may not be that elegant).
The logic of datatype recovery is tricky if done right. Arrays and structs can be arbitrarily nested.
Were we to provide an example, it would be either woefully incorrect (would break down on complicated datatypes), or too complex to be useful as an example.
Yeah, I generally agree, after realizing today that enum is handled differently from a standard variable, even if we are dealing with a basic int after all.
Never thought of an array of structs either, panic ensues.
@DanielTian90 There is a limited piece of datatype recovery in the library now, see describe_cpp_datatype() in elftools.dwarf.datatype_cpp.
@sevaa So how do we get the correct datatype of each and every variable inside a structure or array? I want the datatype of the variables inside the structure too.
You call elftools.dwarf.datatype_cpp.describe_cpp_datatype(), passing a DIE object with either DW_TAG_variable or DW_TAG_member.
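For reference, a short usage sketch of that call. describe_cpp_datatype() is the library function mentioned above; the describable filter and print_datatypes wrapper are hypothetical names added for this example, and the pyelftools imports are kept inside the wrapper so the filter can be tried separately.

```python
DESCRIBABLE_TAGS = {"DW_TAG_variable", "DW_TAG_member"}


def describable(dies):
    """Yield only the variable/member DIEs that reference a type."""
    for die in dies:
        if die.tag in DESCRIBABLE_TAGS and "DW_AT_type" in die.attributes:
            yield die


def print_datatypes(path):
    """Print 'name: C/C++ type description' for every variable and member."""
    from elftools.elf.elffile import ELFFile  # requires pyelftools
    from elftools.dwarf.datatype_cpp import describe_cpp_datatype

    with open(path, "rb") as f:
        elf = ELFFile(f)
        if not elf.has_dwarf_info():
            return
        for cu in elf.get_dwarf_info().iter_CUs():
            for die in describable(cu.iter_DIEs()):
                name = die.attributes.get("DW_AT_name")
                label = name.value.decode("utf-8", "replace") if name else "<anonymous>"
                print(f"{label}: {describe_cpp_datatype(die)}")
```

Because DW_TAG_member DIEs are visited too, fields inside structs are described along with top-level variables.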
@sevaa Thanks for that. What worked for me, though, was following DW_AT_type from one DIE to the next until I reached a DW_TAG_base_type; that way I get the right type for all pointers, enums, and variables, even when they are inside a struct. I have one more question: how would we get the address of each variable? We know the address of the global one, and using the byte size of each member we can add offsets until we reach the desired variable. That can be done manually, but it is tricky. Is there a way to get the address of any variable directly from Python code?
how would we get the address out of each variable
Read up about DW_AT_location. For global/static variables, it will be a single expression with an address.
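As an illustration of that single-expression case, the sketch below decodes a location that consists of just DW_OP_addr followed by the address. It assumes the attribute value is the sequence of expression bytes (which is how pyelftools presents block/exprloc forms) and returns None for anything else; static_address is a made-up helper name. For general expressions, pyelftools also ships DWARFExprParser in elftools.dwarf.dwarf_expr.

```python
DW_OP_ADDR = 0x03  # opcode value defined by the DWARF specification


def static_address(location_attr, address_size=4, little_endian=True):
    """Return the address from a bare `DW_OP_addr <address>` location.

    Returns None for anything else, e.g. stack-relative locals, multi-op
    expressions, or location-list offsets (which show up as plain integers).
    """
    expr = location_attr.value
    if not isinstance(expr, (list, tuple, bytes)):
        return None
    expr = bytes(expr)
    if len(expr) != 1 + address_size or expr[0] != DW_OP_ADDR:
        return None
    return int.from_bytes(expr[1:], "little" if little_endian else "big")
```

In a real traversal the two keyword arguments would come from the file itself, e.g. static_address(die.attributes["DW_AT_location"], cu["address_size"], elffile.little_endian). The address of a struct member is then the variable's address plus the member's DW_AT_data_member_location offset.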
As always, before writing Python, feel free to take a look at what's in DWARF via DWARF Explorer ( https://github.com/sevaa/dwex ).