symbolic Incorrect CodeId casing for PE files

The CodeId of a PeObject can be used to obtain exe and dll files from symbol servers. Examples: https://renderdoc.org/symbols/renderdoc.dll/61015E74442b000/renderdoc.dl_ https://msdl.microsoft.com/download/symbols/ntoskrnl.exe/7A04BFB21046000/ntoskrnl.exe

Some symbol servers are case-sensitive. In the renderdoc example above, it has to be 61015E74442b000 (with lowercase b) rather than 61015E74442B000, otherwise the server returns a 404. For PE files, the code ID is constructed as follows: 61015E74 is the timestamp, in uppercase hex, and 442b000 is the image size, in lowercase hex.

See also https://randomascii.wordpress.com/2013/03/09/symbols-the-microsoft-way/:

The format for the path to a PE file in a symbol server share is:
“%s\%s\%08X%x\%s” % (serverName, peName, timeStamp, imageSize, peName)
Note that the timeStamp is always printed as eight digits (with leading zeroes as needed) using upper-case for ‘A’ to ‘F’ (important if your symbol server is case sensitive), whereas the image size is printed using as few digits as needed, in lower-case.

PeObject::code_id() currently builds an all-lowercase string: https://github.com/getsentry/symbolic/blob/9d308ef36bcd2c8d80d1d5dbc071ea6b5aa88d2e/symbolic-debuginfo/src/pe.rs#L101

And then CodeId::new calls make_ascii_lowercase(), just to be sure:

https://github.com/getsentry/rust-debugid/blob/9fb71ae2c1bd576efdae9c7856314459048da5d8/src/lib.rs#L300

        string.make_ascii_lowercase();

And then in dump_syms, we call to_uppercase():

https://github.com/mozilla/dump_syms/blob/7fc891c7ff44d7b4a62951f94f22729ec66d0622/src/windows/pdb.rs#L532

            .map(|pe| pe.code_id().unwrap().as_str().to_uppercase());

To make this work properly with case-sensitive symbol servers, it seems like we need to preserve the case everywhere and use {:08X}{:x} in PeObject::code_id().

Sep 28 '21 21:09 mstange

Or, alternatively, we could keep everything the way it is today, and anyone who wants to use CodeIds to interact with symbol servers will need to do some post processing on the CodeId, by uppercasing the first 8 digits and lowercasing the remainder.

Sep 29 '21 03:09 mstange

Interestingly, Breakpad's Windows dump_syms implementation includes this same bug: https://chromium.googlesource.com/breakpad/breakpad/+/bd4a28c08b9d97bc4910f527f3c3ae232fdc5747/src/common/windows/pe_util.cc#232

Dec 01 '21 15:12 luser

Or, alternatively, we could keep everything the way it is today, and anyone who wants to use CodeIds to interact with symbol servers will need to do some post processing on the CodeId, by uppercasing the first 8 digits and lowercasing the remainder.

It appears symbolicator has always been taking this approach: https://github.com/getsentry/symbolicator/blob/2fb9989e376059beeb017d77b100190ce6609c57/crates/symbolicator/src/utils/paths.rs#L86-L92

Feb 15 '22 13:02 flub

To me, this doesn't look like a bug - as long as there's no industry-wide standard that defines that a hexadecimal string representation must be in a given format. Yes, it may be useful to change this to match whatever some symbol servers expect, but it may also break something else. Not sure that's a good tradeoff.

Feb 17 '23 13:02 vaind