dump_syms

Dump extern "C" functions without parameters in signature

Open luser opened this issue 10 years ago • 7 comments

For compatibility with Breakpad's dump_syms, we should figure out how to dump extern "C" functions without parameters in the function signature. I have an attempt at doing this in the check-linkage branch (https://github.com/luser/dump_syms/commit/71add3e663232fb68567607af93a116c31cfb1db), but it doesn't work properly. The PDB file I'm testing with has multiple records in the globals stream with the same address, and I don't know how to disambiguate them.

luser avatar Sep 26 '14 14:09 luser

Do you have an example of a PDB which has this problem?

jon-turney avatar Oct 31 '14 10:10 jon-turney

Yeah, I've been testing against a simple test program: http://people.mozilla.com/~tmielczarek/TestApp.pdb.gz

luser avatar Oct 31 '14 11:10 luser

That program has, for example:

extern "C" void testC(int arg1, short arg2)

and all the other test* functions are normal C++ linkage. I've been comparing against the output of the stock Breakpad dump_syms to try to match it.

luser avatar Oct 31 '14 11:10 luser

I rebased the check-linkage branch: https://github.com/luser/dump_syms/tree/check-linkage

luser avatar Oct 31 '14 11:10 luser

Hmm, yes, very strange.

$ ./dump_syms tests/TestApp.pdb 2>&1 | egrep "(wmain|test6|test7|testC)"
leaftype 110e, symbol type 2, address 000115a0 (offset 000005a0, segment 0002), name _wmain
leaftype 110e, symbol type 2, address 00011d00 (offset 00000d00, segment 0002), name _wmainCRTStartup
leaftype 110e, symbol type 2, address 000115a0 (offset 000005a0, segment 0002), name ?test6@@YAXMNO@Z
leaftype 110e, symbol type 2, address 00011bc0 (offset 00000bc0, segment 0002), name _wmain
leaftype 110e, symbol type 2, address 00011bc0 (offset 00000bc0, segment 0002), name ?test7@@YAXC_WPA_WPAPAD@Z
leaftype 110e, symbol type 2, address 00011bf0 (offset 00000bf0, segment 0002), name _wmain
leaftype 110e, symbol type 2, address 00011bf0 (offset 00000bf0, segment 0002), name _testC
leaftype 110e, symbol type 2, address 000124c0 (offset 000014c0, segment 0002), name _wmain
FUNC 115a0 25 14 test6(float,double,double)
FUNC 11bc0 25 10 test7(signed char,wchar_t,wchar_t *,char * *)
FUNC 11bf0 49 8 testC(int,short)
FUNC 11d00 f 0 wmainCRTStartup()
FUNC 124c0 46 8 wmain(int,wchar_t * *)

I'm not sure what it means that different symbols are apparently at the same address. Have test6, test7 and testC been inlined into wmain?

Anyhow, it seems that the assumption that symbols can be looked up by address alone is invalid.

I had an attempt at implementing looking them up by name instead, see 3bb706e6076c91777229c22e42aa65bbc38b9666, which seems to produce the right output for this test, and also addresses #11, but it needs to be improved to do the name lookup in a sensible way.

$ ./dump_syms tests/TestApp.pdb | egrep "(wmain|test6|test7|testC)"
FUNC 115a0 25 14 test6(float,double,double)
FUNC 11bc0 25 10 test7(signed char,wchar_t,wchar_t *,char * *)
FUNC 11bf0 49 8 testC
FUNC 11d00 f 0 wmainCRTStartup
FUNC 124c0 46 8 wmain

But now it occurs to me that this isn't right either: the same function name could occur multiple times, mangled with different sets of parameters and also unmangled. So perhaps the lookup needs to be on both offset and function name.
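
A minimal sketch of what a lookup keyed on both address and function name could look like. This is not the code from the commit mentioned above: the record list is copied from the debug output earlier in this comment, and `base_name`, `build_index` and `has_c_linkage` are hypothetical helpers. It assumes MSVC C++ names start with `?` and cdecl C names carry a single leading `_`; stdcall-style names such as `_name@8` are not handled.

```python
from collections import defaultdict

# (address, decorated name) pairs taken from the debug output above.
GLOBALS = [
    (0x115a0, "_wmain"),
    (0x11d00, "_wmainCRTStartup"),
    (0x115a0, "?test6@@YAXMNO@Z"),
    (0x11bc0, "_wmain"),
    (0x11bc0, "?test7@@YAXC_WPA_WPAPAD@Z"),
    (0x11bf0, "_wmain"),
    (0x11bf0, "_testC"),
    (0x124c0, "_wmain"),
]


def base_name(decorated):
    """Crudely recover the plain function name (assumption: cdecl or MSVC C++)."""
    if decorated.startswith("?"):
        # '?test6@@YAXMNO@Z' -> 'test6'
        return decorated[1:].split("@", 1)[0]
    if decorated.startswith("_"):
        # '_testC' -> 'testC'
        return decorated[1:]
    return decorated


def build_index(records):
    """Index the records by (address, plain name) instead of address alone."""
    index = defaultdict(list)
    for address, decorated in records:
        index[(address, base_name(decorated))].append(decorated)
    return index


def has_c_linkage(index, address, func_name):
    """True if every matching record is undecorated, i.e. looks like extern "C"."""
    candidates = index.get((address, func_name), [])
    return bool(candidates) and all(not d.startswith("?") for d in candidates)


index = build_index(GLOBALS)
for address, name in [(0x115a0, "test6"), (0x11bf0, "testC"), (0x124c0, "wmain")]:
    print(hex(address), name, has_c_linkage(index, address, name))
```

With this keying, the stray `_wmain` records at the addresses of `test6`, `test7` and `testC` no longer interfere, because they never match the name being looked up there.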

jon-turney avatar Nov 02 '14 18:11 jon-turney

I don't think they're inlined. If you look at the FUNC records (from either version of dump_syms), we get distinct addresses for testC and wmain (you can see this in your output above). I just can't figure out how those correspond to the entries in the globals stream.

luser avatar Nov 03 '14 19:11 luser

Checking the set of PDBs from the MS symbol server that I have, I didn't find any other examples of this (duplicate symbols in the global symbol table with different addresses).

Looking at the data above, it seems a simple heuristic that would give the expected data is to use the last definition of each symbol (so _wmain = 000124c0 and the other definitions are ignored), but it's hard to know if this is the correct way to interpret things.
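
A rough sketch of that "last definition wins" heuristic, using the records from the debug output earlier in the thread. `last_definition_wins` is a hypothetical helper, not part of dump_syms, and whether this is the right reading of the globals stream is still the open question.

```python
# (address, decorated name) pairs in the order they were printed earlier in the thread.
GLOBALS = [
    (0x115a0, "_wmain"),
    (0x11d00, "_wmainCRTStartup"),
    (0x115a0, "?test6@@YAXMNO@Z"),
    (0x11bc0, "_wmain"),
    (0x11bc0, "?test7@@YAXC_WPA_WPAPAD@Z"),
    (0x11bf0, "_wmain"),
    (0x11bf0, "_testC"),
    (0x124c0, "_wmain"),
]


def last_definition_wins(records):
    """Map each symbol name to the address of its last record in the list."""
    addresses = {}
    for address, name in records:
        # A later record for the same name overwrites the earlier one.
        addresses[name] = address
    return addresses


resolved = last_definition_wins(GLOBALS)
for name, address in sorted(resolved.items(), key=lambda item: item[1]):
    print(f"{address:x} {name}")
```

For this PDB the resolved addresses line up with the FUNC records (test6 at 115a0, test7 at 11bc0, testC at 11bf0, wmainCRTStartup at 11d00, wmain at 124c0), which is consistent with the heuristic but does not prove it.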

jon-turney avatar Nov 09 '14 14:11 jon-turney