pyelftools icon indicating copy to clipboard operation
pyelftools copied to clipboard

Bugfix parse_cpp_datatype not stripping volatile tag

Open Henri-J-Norden opened this issue 2 years ago • 9 comments

Henri-J-Norden avatar Sep 27 '23 19:09 Henri-J-Norden

Makes sense. The test corpus was limited.

sevaa avatar Sep 27 '23 19:09 sevaa

I forgot to also add it to cpp_symbols dict for the PR, but I've now noticed that actually places volatile after the variable type... I believe this is technically valid for simple types (I'm using it for C code), but a larger change is probably needed to properly support all cases. From skimming cv (const and volatile) type qualifiers, it seems volatile is also valid wherever const is valid - and there is some special handling of const in TypeDesc.str, which should probably be expanded to deal with either/both. Maybe I should just make an issue instead?

Henri-J-Norden avatar Sep 27 '23 20:09 Henri-J-Norden

I'll have to do some language layering here. Is "pointer to volatile" a thing?

EDIT: it is. int volatile *p is a valid declaration.

sevaa avatar Sep 28 '23 13:09 sevaa

@Henri-J-Norden do you have small reproducers that demonstrate this? The ideal would be a simple C file we could compile and observe this on (using pyelftools)

eliben avatar Oct 16 '23 13:10 eliben

@eliben Here's code with some (not necessarily reasonable, but certainly compileable) examples, and the elf binary compiled with gcc -O0 -g3 using GCC 11.4.0

const int c;
const int* c_p;
int* const p_c;

// const->array->const->int
const int c_2[2];
// array->pointer->const->int
const int* c_p_2[2];
// const->array->const->pointer->int
int* const p_c_2[2];
// const->array->const->pointer->const->int
const int* const c_p_c_2[2];

// volatile->int
volatile int v;
// const->volatile->int
const volatile int cv;
volatile const int vc;
// pointer->const->volatile->int
const volatile int *cv_p;
volatile const int *vc_p;
// volatile->const->pointer->int
int *const volatile p_cv;
int *volatile const p_vc;
// volatile->const->pointer->const->volatile->int
const volatile int *const volatile cv_p_cv;
const volatile int *volatile const cv_p_vc;
volatile const int *const volatile vc_p_cv;
volatile const int *volatile const vc_p_vc;

// volatile->const->pointer->volatile->const->pointer->const->volatile->int
volatile const int *volatile const *volatile const vc_p_vc_p_vc;

// volatile->const->array->const->volatile->int
volatile const int vc_2[2];
// volatile->const->array->volatile->const->pointer->const->volatile->int
volatile const int *const volatile vc_p_cv_2[2];

int main(void)
{
	return 0;
}

volatile.zip

I added a script for debugging the types in https://github.com/Henri-J-Norden/pyelftools/blob/7fd850b21efaf999654e6880d63f8aff1bfa1258/examples/dwarf_var_types.py It seems const int c_2[2] also gets incorrectly parsed as const const int[2]. Here's the full output:

examples\dwarf_var_types.py --test ../../a.out 
Processing file: ../../a.out
  Found a compile unit at offset 0, length 721
    Top DIE with tag=DW_TAG_compile_unit
    name=/mnt/c/debug/volatile.c
      > name=b'c' type='const int'
        DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
          DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'c_p' type='const int *'
        DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
          DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
            DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'p_c' type='int *const'
        DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
          DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
            DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'c_2' type='const const int[2]'
        DIE tag=DW_TAG_const_type type='const const int[2]' modifiers=('const',)
          DIE tag=DW_TAG_array_type type='const int[2]' modifiers=()
            DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'c_p_2' type='const int *[2]'
        DIE tag=DW_TAG_array_type type='const int *[2]' modifiers=()
          DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
            DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'p_c_2' type='const int *const[2]'
        DIE tag=DW_TAG_const_type type='const int *const[2]' modifiers=('const',)
          DIE tag=DW_TAG_array_type type='int *const[2]' modifiers=()
            DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
              DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
                DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'c_p_c_2' type='const const int *const[2]'
        DIE tag=DW_TAG_const_type type='const const int *const[2]' modifiers=('const',)
          DIE tag=DW_TAG_array_type type='const int *const[2]' modifiers=()
            DIE tag=DW_TAG_const_type type='const int *const' modifiers=('const', 'pointer', 'const')
              DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
                DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'v' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'cv' type='const volatile '
        DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
          DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
            DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc' type='const volatile '
        DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
          DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
            DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'cv_p' type='const volatile  *'
        DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
          DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
            DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p' type='const volatile  *'
        DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
          DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
            DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'p_cv' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'p_vc' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
              DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'cv_p_cv' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'cv_p_vc' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p_cv' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p_vc' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p_vc_p_vc' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='volatile  *const' modifiers=('pointer', 'const')
            DIE tag=DW_TAG_pointer_type type='volatile  *' modifiers=('pointer',)
              DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
                  DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
                    DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                      DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                        DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_2' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const const volatile [2]' modifiers=('const',)
            DIE tag=DW_TAG_array_type type='const volatile [2]' modifiers=()
              DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                  DIE tag=DW_TAG_base_type type='int' modifiers=()
      > name=b'vc_p_cv_2' type='volatile '
        DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
          DIE tag=DW_TAG_const_type type='const volatile [2]' modifiers=('const',)
            DIE tag=DW_TAG_array_type type='volatile [2]' modifiers=()
              DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                DIE tag=DW_TAG_const_type type='const volatile  *const' modifiers=('const', 'pointer', 'const')
                  DIE tag=DW_TAG_pointer_type type='const volatile  *' modifiers=('const', 'pointer')
                    DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
                      DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
                        DIE tag=DW_TAG_base_type type='int' modifiers=()

And I guess it's also possible to have C++ references in the mix?

Henri-J-Norden avatar Oct 25 '23 16:10 Henri-J-Norden

I always felt uneasy about datatype recovery. We needed it because it was the cost of the llvm-dwarfdump autotest, but I suspect that the general case of it is too hairy a problem, and our implementation, whatever it is like, will end up mimicking the quirks of llvm-dwarfdump (which may evolve as the tool evolves).

Point is, add your binary to the the dwarfdump autotest directory, run the autotest, and see where it lands.

sevaa avatar Oct 25 '23 19:10 sevaa

I actually only looked at using parse_cpp_datatype in order to figure out the base type and array dimensions - so personally I don't even need full C-style datatype recovery. Maybe it would make sense to separate these functions? In the end, I had to re-implement finding the base type anyway, since the name of the TypeDesc object always includes the modifiers (that I don't care about).

As for the test, it fails at the first, simplest variable:

Test file 'test/testfiles_for_dwarfdump/volatile.elf'
.......................FAIL
....for file test/testfiles_for_dwarfdump/volatile.elf
....for option "--debug-info"
....Output #1 is llvm-dwarfdump, Output #2 is pyelftools
@@ Mismatch on line #29:
>>                dw_at_type [dw_form_ref4]	(cu + 0x0051 => {0x00000051} "volatile int")<<
>>              dw_at_type [dw_form_ref4]	(cu + 0x0051 => {0x00000051} "volatile ")<<

Henri-J-Norden avatar Oct 27 '23 17:10 Henri-J-Norden

If the subset of datatype recovery logic that you need is different from what the parse_cpp_datatype machinery provides, maybe you should keep it in your own logic. I, for one, have an alternative datatype recovery piece (which can be seen at https://github.com/sevaa/dwex/blob/master/dwex/locals.py), which started off as the same codebase as parse_cpp_datatype but then diverged, and I have no ambition to roll it into the library - specifically because datatype recovery is super hairy and hard to get right, and attempting to do so in a public codebase would imply a commitment to correctness that I'm not sure I can uphold.

sevaa avatar Oct 28 '23 15:10 sevaa

What I think we should do for the viability of the API, we should further decouple datatype discovery from C++ declaration string composition. This way, the former can evolve towards correctness (hopefully) while the latter can evolve towards matching what llvm-dwarfdump provides, and users can supply their own type+name generation logic, if needed.

If only there was a corpus of C++ sources aimed at spanning the gamut of datatype options (more generally, language features) out there...

For the short term, I'll try and see what can/should be done about volatile.

sevaa avatar Nov 08 '23 18:11 sevaa