pyelftools

How to get variable type

Open DanielTian90 opened this issue 4 years ago • 14 comments

From the demo, I can get a global variable's address and size, but I can't find its data type, like uint8 or int32. How do I get type attributes with this lib? Many thanks!

DanielTian90 avatar Mar 30 '20 08:03 DanielTian90

Look at the DW_AT_type attribute. It's a reference to another DIE that describes the datatype. References are usually stored as either a CU-relative offset (DW_FORM_refX) or an absolute offset (DW_FORM_ref_addr).

The type DIE can contain further references, for example if the type is an array or a pointer.

You can use DWARF Explorer to explore that data structure interactively.
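To make the chain of references concrete, here is a minimal sketch that follows DW_AT_type from a variable DIE and records every type DIE it passes through. `type_chain` is a hypothetical helper name, not part of pyelftools; it only relies on the `tag`, `attributes`, and `get_DIE_from_attribute()` members of a pyelftools DIE.

```python
def type_chain(die, max_hops=32):
    """Follow DW_AT_type from `die`; return (tag, name) for each type DIE visited."""
    chain = []
    current = die
    for _ in range(max_hops):  # guard against malformed, cyclic DWARF
        if "DW_AT_type" not in current.attributes:
            break  # end of the chain: a base type, a struct, or `void`
        # Resolves both CU-relative and absolute reference forms.
        current = current.get_DIE_from_attribute("DW_AT_type")
        name_attr = current.attributes.get("DW_AT_name")
        name = name_attr.value.decode("utf-8") if name_attr is not None else None
        chain.append((current.tag, name))
    return chain
```

For a variable declared as `const char *p`, the result would typically be a pointer type, then a const type, then the `char` base type. Array and pointer DIEs usually have no name, so `None` entries are expected.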

sevaa avatar Mar 30 '20 12:03 sevaa

Thanks for your kind reply. @eliben Following what you said, I can get the type of a variable. I have two more questions and need your help:

  1. How can I get the length of a specific variable, e.g. 4 bytes, 2 bytes, 1 byte?
  2. How can I find the struct a variable belongs to? A struct may contain several variables, but currently I can only get the struct name and the variable name separately, and I don't know the relationship between the struct and the variable.

DanielTian90 avatar Mar 31 '20 09:03 DanielTian90

  1. How can I get the length of a specific variable, e.g. 4 bytes, 2 bytes, 1 byte?

Putting this upfront: not all variables have statically inferable sizes. VLAs, for example, won't have their length encoded in the DWARF information, since their length changes at runtime.

However, if you want to get the subset of variables that do have known sizes, you can use their type information. Each DW_TAG_*_type may or may not have DW_AT_byte_size or DW_AT_bit_size, which indicate the size of the type in bytes/bits, including padding. If a compound type (i.e. one that isn't a primitive, like int) doesn't have an explicit size, you'll need to walk its members and collect their sizes individually, as well as calculate padding.
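A rough sketch of that size lookup follows. The helper names (`constant_value`, `type_byte_size`, `variable_byte_size`) are hypothetical, and the code makes simplifying assumptions: it only handles types with an explicit DW_AT_byte_size, typedef/const/volatile wrappers, and arrays with constant bounds and a lower bound of 0 (as in C/C++). Anything else, including VLAs, yields `None`.

```python
# Attribute forms that carry a plain integer constant.
CONSTANT_FORMS = {
    "DW_FORM_data1", "DW_FORM_data2", "DW_FORM_data4", "DW_FORM_data8",
    "DW_FORM_sdata", "DW_FORM_udata", "DW_FORM_implicit_const",
}
# Tags that wrap another type without changing its size.
TRANSPARENT_TAGS = {"DW_TAG_typedef", "DW_TAG_const_type", "DW_TAG_volatile_type"}


def constant_value(die, attr_name):
    """Return the attribute as an int, or None if absent or not a constant."""
    attr = die.attributes.get(attr_name)
    if attr is not None and attr.form in CONSTANT_FORMS:
        return attr.value
    return None


def type_byte_size(type_die):
    """Return the size of a type DIE in bytes, or None if not statically known."""
    size = constant_value(type_die, "DW_AT_byte_size")
    if size is not None:
        return size
    if "DW_AT_type" not in type_die.attributes:
        return None
    inner = type_byte_size(type_die.get_DIE_from_attribute("DW_AT_type"))
    if type_die.tag in TRANSPARENT_TAGS:
        return inner
    if type_die.tag == "DW_TAG_array_type" and inner is not None:
        total, dimensions = inner, 0
        for child in type_die.iter_children():
            if child.tag != "DW_TAG_subrange_type":
                continue
            count = constant_value(child, "DW_AT_count")
            if count is None:
                upper = constant_value(child, "DW_AT_upper_bound")
                if upper is None:
                    return None  # VLA or flexible array: bound not constant
                count = upper + 1  # assumes a lower bound of 0
            total *= count
            dimensions += 1
        return total if dimensions else None
    return None


def variable_byte_size(var_die):
    """Return the size of a variable's type in bytes, or None if unknown."""
    if "DW_AT_type" not in var_die.attributes:
        return None
    return type_byte_size(var_die.get_DIE_from_attribute("DW_AT_type"))
```

Checking the attribute form matters because a non-constant bound may be stored as an expression or as a reference to another DIE, and a reference would otherwise look like an ordinary integer.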

  2. How can I find the struct a variable belongs to? A struct may contain several variables, but currently I can only get the struct name and the variable name separately, and I don't know the relationship between the struct and the variable.

Variables are represented with DW_TAG_variable and similar nodes (e.g. DW_TAG_formal_parameter for parameters, DW_TAG_member for struct fields), while types are represented with DW_TAG_*_type nodes. Both may have DW_AT_name attributes which contain their respective names. E.g., for the following:

struct foo bar;

bar would be a DW_TAG_variable with DW_AT_name=bar and a DW_AT_type that references foo. foo, in turn, would be a DW_TAG_structure_type with DW_AT_name=foo.

Note that neither variables nor types are required to have names -- the compiler is free to generate anonymous variables during compilation, and anonymous types (e.g. unnamed structs) are part of the C/C++ specification. You should make sure that you handle those cases.
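As an illustration of how a variable, its struct type, and the struct's fields hang together, the sketch below resolves a variable to its structure DIE and lists the fields, which are child DIEs of the structure with the tag DW_TAG_member. `die_name` and `struct_members` are hypothetical helper names. The sketch assumes member offsets are stored as plain constants in DW_AT_data_member_location, which is common but not guaranteed, and it falls back to a placeholder for anonymous structs and members.

```python
# Tags that wrap another type without changing what it is.
WRAPPER_TAGS = ("DW_TAG_typedef", "DW_TAG_const_type", "DW_TAG_volatile_type")


def die_name(die, default="<anonymous>"):
    """Return the DIE's DW_AT_name as a string, or `default` if it has none."""
    attr = die.attributes.get("DW_AT_name")
    return attr.value.decode("utf-8") if attr is not None else default


def struct_members(var_die):
    """Return (struct_name, [(member_name, offset), ...]) or None if not a struct."""
    type_die = var_die
    while "DW_AT_type" in type_die.attributes:
        type_die = type_die.get_DIE_from_attribute("DW_AT_type")
        if type_die.tag not in WRAPPER_TAGS:
            break  # stop at the first "real" type
    if type_die is var_die or type_die.tag != "DW_TAG_structure_type":
        return None
    members = []
    for child in type_die.iter_children():
        if child.tag != "DW_TAG_member":
            continue
        loc = child.attributes.get("DW_AT_data_member_location")
        # Offsets encoded as expressions are reported as None here.
        offset = loc.value if loc is not None and isinstance(loc.value, int) else None
        members.append((die_name(child), offset))
    return die_name(type_die), members
```

Going the other way, from a DW_TAG_member DIE to the struct that contains it, is a matter of calling `get_parent()` on the member DIE.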

Hope that helps!

woodruffw avatar Mar 31 '20 14:03 woodruffw

Also be aware that a variable declared (e.g. in a .h file) separately from its definition (in a .c file) may have its name on the declaration DIE only, with the defining DIE carrying a reference back to it (DW_AT_specification; also consider DW_AT_abstract_origin).

mdmillerii avatar Apr 09 '20 17:04 mdmillerii

Yep! You'll need to handle DW_AT_specification when working between definitions and declarations, and DW_AT_abstract_origin when working with variables and parameters that have been inlined into another scope.
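One way to handle both links is to look an attribute up on the DIE itself and, if it is missing, fall back to the DIE it points to. The sketch below does that for any attribute name; `resolve_attribute` is a hypothetical helper, and the hop limit is an arbitrary safety margin rather than anything DWARF prescribes.

```python
# Attributes that point at another DIE holding the "rest" of the description.
LINK_ATTRS = ("DW_AT_specification", "DW_AT_abstract_origin")


def resolve_attribute(die, attr_name, max_hops=8):
    """Find `attr_name` on `die` or on the DIEs it links to; None if not found."""
    current = die
    for _ in range(max_hops):
        if attr_name in current.attributes:
            return current.attributes[attr_name]
        for link in LINK_ATTRS:
            if link in current.attributes:
                current = current.get_DIE_from_attribute(link)
                break
        else:
            return None  # no further link to follow
    return None
```

With this, `resolve_attribute(die, "DW_AT_name")` and `resolve_attribute(die, "DW_AT_type")` would work on a defining DIE that carries only a location and a DW_AT_specification.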

woodruffw avatar Apr 10 '20 16:04 woodruffw

@mmclwd, is this still an issue?

sevaa avatar Jul 17 '20 18:07 sevaa

Hi everyone, I stumbled upon this while searching for the exact issue title. The suggestions here have been useful, but I still think a code sample would help anyone scrolling through here.

import sys
from typing import Optional, Tuple

from elftools.dwarf.die import DIE
from elftools.elf.elffile import ELFFile

# DW_ATE_* base type encodings
encodingMap = {
    1: 'ADDRESS',
    2: 'BOOLEAN',
    4: 'FLOAT',
    5: 'SIGNED',
    6: 'SIGNED CHAR',
    7: 'UNSIGNED',
    8: 'UNSIGNED CHAR'
}


# TYPE
# TODO should handle also arrays and structs
def findBaseType(die: DIE) -> Optional[Tuple[int, str, str]]:
    """Follow DW_AT_type to a base type; return (byte size, encoding, name)."""
    iterDIE = die
    while iterDIE.tag != "DW_TAG_base_type":
        if 'DW_AT_type' not in iterDIE.attributes:
            return None  # chain ended without a base type, e.g. void or a struct
        iterDIE = iterDIE.get_DIE_from_attribute("DW_AT_type")
    encoding = iterDIE.attributes["DW_AT_encoding"].value
    return (
        iterDIE.attributes["DW_AT_byte_size"].value,
        # Unrecognized encodings are reported instead of raising KeyError
        encodingMap.get(encoding, f"UNKNOWN ({encoding})"),
        # .value holds the decoded string bytes; raw_value may be a string table offset
        iterDIE.attributes["DW_AT_name"].value.decode('utf-8'))

# MAIN
if __name__ == "__main__":
    if len(sys.argv) == 1:
        sys.exit("ERROR: no input filename provided")
    # Read file (plain ELFFile; replace with a patched loader if needed)
    with open(sys.argv[1], 'rb') as f:
        elfFile = ELFFile(f)
        dwarfinfo = elfFile.get_dwarf_info()
        # Parse and build result object
        for cu in dwarfinfo.iter_CUs():
            # New CU
            for die in cu.iter_DIEs():
                # New VARIABLE
                if die.tag == 'DW_TAG_variable':
                    # TYPE
                    tr = findBaseType(die)
                    if tr is not None:
                        size, encoding, typeName = tr
                        print(f"{die.get_full_path()}\tByteSize: {size}\tTypeName: {typeName}\t\tEncoding: {encoding}")

I'm open to suggestions, and if the maintainers are interested, I'm happy to submit a PR adding a more nicely crafted example (Python is not my mother tongue, so it may not be that elegant).

olimexsmart avatar Dec 09 '23 16:12 olimexsmart

The logic of datatype recovery is tricky if done right. Arrays and structs can be arbitrarily nested.

Were we to provide an example, it would be either woefully incorrect (would break down on complicated datatypes), or too complex to be useful as an example.

sevaa avatar Dec 12 '23 15:12 sevaa

Yeah, I generally agree, after realizing today that an enum is handled differently from a standard variable, even though we are dealing with a basic int after all.

Never thought of an array of structs either; panic ensues.

olimexsmart avatar Dec 12 '23 21:12 olimexsmart

@DanielTian90 There is a limited piece of datatype recovery in the library now, see describe_cpp_datatype() in elftools.dwarf.datatype_cpp.

sevaa avatar Apr 17 '24 13:04 sevaa

@sevaa So how do we get the correct datatype of every variable inside a structure or array? I want the datatypes of the variables inside the structure too.

yash10052001 avatar May 18 '24 18:05 yash10052001

You call elftools.dwarf.datatype_cpp.describe_cpp_datatype(), passing a DIE object with either DW_TAG_variable or DW_TAG_member.
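A minimal sketch of that call, applied to every variable and struct member in a file, could look like the following. `describe_variables` and `dump_types` are hypothetical helper names; `describe_cpp_datatype`, `ELFFile`, `iter_CUs()`, and `iter_DIEs()` are the library's own. DIEs without a DW_AT_type are skipped here as a simplification.

```python
def describe_variables(dies):
    """Yield (name, type description) for each variable or member DIE in `dies`."""
    # Imported inside the function so this module loads even without pyelftools.
    from elftools.dwarf.datatype_cpp import describe_cpp_datatype

    for die in dies:
        if die.tag not in ("DW_TAG_variable", "DW_TAG_member"):
            continue
        if "DW_AT_type" not in die.attributes:
            continue  # nothing to describe on this DIE
        name_attr = die.attributes.get("DW_AT_name")
        name = name_attr.value.decode("utf-8") if name_attr is not None else "<anonymous>"
        yield name, describe_cpp_datatype(die)


def dump_types(path):
    """Print the described type of every variable and struct member in an ELF file."""
    from elftools.elf.elffile import ELFFile

    with open(path, "rb") as f:
        elf = ELFFile(f)
        if not elf.has_dwarf_info():
            return
        for cu in elf.get_dwarf_info().iter_CUs():
            for name, description in describe_variables(cu.iter_DIEs()):
                print(f"{name}: {description}")
```

Because `iter_DIEs()` walks the whole tree of a compile unit, struct members are visited along with top-level variables, so nested fields get described too.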

sevaa avatar May 19 '24 19:05 sevaa

@sevaa Thanks for that. I had been following DW_AT_type from one DIE to the next until I reached a DW_TAG_base_type, and that was giving me the right type for pointers, enums, and plain variables, even when they sit inside a struct. I have one more question: how do we get the address of each variable? We know the address of the global one, so we could add up the byte sizes of the preceding members until we reach the desired variable, but doing that manually is tricky. Since I already have the datatype and byte size information, is there a way to get the address of any variable directly from Python code?

yash10052001 avatar Jun 02 '24 18:06 yash10052001

how would we get the address out of each variable

Read up about DW_AT_location. For global/static variables, it will be a single expression with an address.

As always, before writing Python, feel free to take a look at what's in DWARF via DWARF Explorer ( https://github.com/sevaa/dwex ).
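For the global/static case, a minimal sketch of reading the address out of DW_AT_location is shown below. `static_address` is a hypothetical helper, and it deliberately handles only the simplest shape: an expression consisting of a single DW_OP_addr opcode followed by the address. Location lists, thread-local variables, and any longer expression return `None`.

```python
DW_OP_addr = 0x03  # opcode value from the DWARF specification


def static_address(die, address_size, little_endian=True):
    """Return the address from a `DW_OP_addr <addr>` location, else None."""
    attr = die.attributes.get("DW_AT_location")
    if attr is None:
        return None  # declaration only, or optimized away
    expr = attr.value
    if not isinstance(expr, (list, tuple, bytes, bytearray)):
        return None  # an integer here is an offset into a location list
    if len(expr) != 1 + address_size or expr[0] != DW_OP_addr:
        return None  # not the single-opcode form this sketch handles
    byteorder = "little" if little_endian else "big"
    return int.from_bytes(bytes(expr[1:]), byteorder)
```

With real pyelftools objects, the extra arguments would typically come from `die.cu["address_size"]` and the `little_endian` attribute of the `ELFFile`. For a field inside a global struct, adding the field's constant DW_AT_data_member_location to the struct's address should give the field's address, assuming the offset is stored as a constant.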

sevaa avatar Jun 03 '24 17:06 sevaa