Bugfix parse_cpp_datatype not stripping volatile tag
Makes sense. The test corpus was limited.
I forgot to also add it to cpp_symbols dict for the PR, but I've now noticed that actually places volatile after the variable type... I believe this is technically valid for simple types (I'm using it for C code), but a larger change is probably needed to properly support all cases. From skimming cv (const and volatile) type qualifiers, it seems volatile is also valid wherever const is valid - and there is some special handling of const in TypeDesc.str, which should probably be expanded to deal with either/both. Maybe I should just make an issue instead?
I'll have to do some language layering here. Is "pointer to volatile" a thing?
EDIT: it is. int volatile *p is a valid declaration.
@Henri-J-Norden do you have small reproducers that demonstrate this? The ideal would be a simple C file we could compile and observe this on (using pyelftools)
@eliben Here's code with some (not necessarily reasonable, but certainly compileable) examples, and the elf binary compiled with gcc -O0 -g3 using GCC 11.4.0
const int c;
const int* c_p;
int* const p_c;
// const->array->const->int
const int c_2[2];
// array->pointer->const->int
const int* c_p_2[2];
// const->array->const->pointer->int
int* const p_c_2[2];
// const->array->const->pointer->const->int
const int* const c_p_c_2[2];
// volatile->int
volatile int v;
// const->volatile->int
const volatile int cv;
volatile const int vc;
// pointer->const->volatile->int
const volatile int *cv_p;
volatile const int *vc_p;
// volatile->const->pointer->int
int *const volatile p_cv;
int *volatile const p_vc;
// volatile->const->pointer->const->volatile->int
const volatile int *const volatile cv_p_cv;
const volatile int *volatile const cv_p_vc;
volatile const int *const volatile vc_p_cv;
volatile const int *volatile const vc_p_vc;
// volatile->const->pointer->volatile->const->pointer->const->volatile->int
volatile const int *volatile const *volatile const vc_p_vc_p_vc;
// volatile->const->array->const->volatile->int
volatile const int vc_2[2];
// volatile->const->array->volatile->const->pointer->const->volatile->int
volatile const int *const volatile vc_p_cv_2[2];
int main(void)
{
return 0;
}
I added a script for debugging the types in https://github.com/Henri-J-Norden/pyelftools/blob/7fd850b21efaf999654e6880d63f8aff1bfa1258/examples/dwarf_var_types.py
It seems const int c_2[2] also gets incorrectly parsed as const const int[2]. Here's the full output:
examples\dwarf_var_types.py --test ../../a.out
Processing file: ../../a.out
Found a compile unit at offset 0, length 721
Top DIE with tag=DW_TAG_compile_unit
name=/mnt/c/debug/volatile.c
> name=b'c' type='const int'
DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'c_p' type='const int *'
DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'p_c' type='int *const'
DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'c_2' type='const const int[2]'
DIE tag=DW_TAG_const_type type='const const int[2]' modifiers=('const',)
DIE tag=DW_TAG_array_type type='const int[2]' modifiers=()
DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'c_p_2' type='const int *[2]'
DIE tag=DW_TAG_array_type type='const int *[2]' modifiers=()
DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'p_c_2' type='const int *const[2]'
DIE tag=DW_TAG_const_type type='const int *const[2]' modifiers=('const',)
DIE tag=DW_TAG_array_type type='int *const[2]' modifiers=()
DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'c_p_c_2' type='const const int *const[2]'
DIE tag=DW_TAG_const_type type='const const int *const[2]' modifiers=('const',)
DIE tag=DW_TAG_array_type type='const int *const[2]' modifiers=()
DIE tag=DW_TAG_const_type type='const int *const' modifiers=('const', 'pointer', 'const')
DIE tag=DW_TAG_pointer_type type='const int *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const int' modifiers=('const',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'v' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'cv' type='const volatile '
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'vc' type='const volatile '
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'cv_p' type='const volatile *'
DIE tag=DW_TAG_pointer_type type='const volatile *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'vc_p' type='const volatile *'
DIE tag=DW_TAG_pointer_type type='const volatile *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'p_cv' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'p_vc' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='int *const' modifiers=('pointer', 'const')
DIE tag=DW_TAG_pointer_type type='int *' modifiers=('pointer',)
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'cv_p_cv' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='const volatile *const' modifiers=('const', 'pointer', 'const')
DIE tag=DW_TAG_pointer_type type='const volatile *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'cv_p_vc' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='const volatile *const' modifiers=('const', 'pointer', 'const')
DIE tag=DW_TAG_pointer_type type='const volatile *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'vc_p_cv' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='const volatile *const' modifiers=('const', 'pointer', 'const')
DIE tag=DW_TAG_pointer_type type='const volatile *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'vc_p_vc' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='const volatile *const' modifiers=('const', 'pointer', 'const')
DIE tag=DW_TAG_pointer_type type='const volatile *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'vc_p_vc_p_vc' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='volatile *const' modifiers=('pointer', 'const')
DIE tag=DW_TAG_pointer_type type='volatile *' modifiers=('pointer',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='const volatile *const' modifiers=('const', 'pointer', 'const')
DIE tag=DW_TAG_pointer_type type='const volatile *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'vc_2' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='const const volatile [2]' modifiers=('const',)
DIE tag=DW_TAG_array_type type='const volatile [2]' modifiers=()
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
> name=b'vc_p_cv_2' type='volatile '
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='const volatile [2]' modifiers=('const',)
DIE tag=DW_TAG_array_type type='volatile [2]' modifiers=()
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_const_type type='const volatile *const' modifiers=('const', 'pointer', 'const')
DIE tag=DW_TAG_pointer_type type='const volatile *' modifiers=('const', 'pointer')
DIE tag=DW_TAG_const_type type='const volatile ' modifiers=('const',)
DIE tag=DW_TAG_volatile_type type='volatile ' modifiers=()
DIE tag=DW_TAG_base_type type='int' modifiers=()
And I guess it's also possible to have C++ references in the mix?
I always felt uneasy about datatype recovery. We needed it because it was the cost of the llvm-dwarfdump autotest, but I suspect that the general case of it is too hairy a problem, and our implementation, whatever it is like, will end up mimicking the quirks of llvm-dwarfdump (which may evolve as the tool evolves).
Point is, add your binary to the the dwarfdump autotest directory, run the autotest, and see where it lands.
I actually only looked at using parse_cpp_datatype in order to figure out the base type and array dimensions - so personally I don't even need full C-style datatype recovery. Maybe it would make sense to separate these functions? In the end, I had to re-implement finding the base type anyway, since the name of the TypeDesc object always includes the modifiers (that I don't care about).
As for the test, it fails at the first, simplest variable:
Test file 'test/testfiles_for_dwarfdump/volatile.elf'
.......................FAIL
....for file test/testfiles_for_dwarfdump/volatile.elf
....for option "--debug-info"
....Output #1 is llvm-dwarfdump, Output #2 is pyelftools
@@ Mismatch on line #29:
>> dw_at_type [dw_form_ref4] (cu + 0x0051 => {0x00000051} "volatile int")<<
>> dw_at_type [dw_form_ref4] (cu + 0x0051 => {0x00000051} "volatile ")<<
If the subset of datatype recovery logic that you need is different from what the parse_cpp_datatype machinery provides, maybe you should keep it in your own logic. I, for one, have an alternative datatype recovery piece (which can be seen at https://github.com/sevaa/dwex/blob/master/dwex/locals.py), which started off as the same codebase as parse_cpp_datatype but then diverged, and I have no ambition to roll it into the library - specifically because datatype recovery is super hairy and hard to get right, and attempting to do so in a public codebase would imply a commitment to correctness that I'm not sure I can uphold.
What I think we should do for the viability of the API, we should further decouple datatype discovery from C++ declaration string composition. This way, the former can evolve towards correctness (hopefully) while the latter can evolve towards matching what llvm-dwarfdump provides, and users can supply their own type+name generation logic, if needed.
If only there was a corpus of C++ sources aimed at spanning the gamut of datatype options (more generally, language features) out there...
For the short term, I'll try and see what can/should be done about volatile.