Expose PDB70 signature as a UUID? (Disposition: Yes)
I wrote some code to read the PDB70 info out of a PE using goblin, to implement the equivalent of symstore.exe. I found that CodeviewPDB70DebugInfo::signature wasn't really that useful as raw bytes, since it's intended to be a GUID in little-endian byte order. I wound up pulling in byteorder and writing a little function like:
fn sig_to_uuid(sig: &[u8; 16]) -> Result<Uuid, Error> {
let mut rdr = Cursor::new(sig);
Ok(Uuid::from_fields(rdr.read_u32::<LittleEndian>()?,
rdr.read_u16::<LittleEndian>()?,
rdr.read_u16::<LittleEndian>()?,
&sig[8..])?)
}
...but it seems likely that anyone touching this data would need the same thing. Since you're already using scroll here it ought to be trivial to do this. I don't think the uuid crate is a particularly big dependency (and it's no-std by default, you have to enable the use_std feature explicitly).
Oh, I forgot to mention, but we have a little C++ utility in the Firefox codebase that does this exact thing, and it just memmaps the PE file and casts structs around willy-nilly, but it does use GUID direct from the Windows SDK headers:
https://dxr.mozilla.org/mozilla-central/rev/35dfa0882568c06f2d81b6749179815cd4f82886/testing/tools/fileid/win_fileid.cpp#12
I'm not opposed to adding this really, but I do sort of have an unstated policy in goblin to keep the deps as ~low~ small as possible. However, I am definitely a fan of strong typing, and since uuid crate is almost now apart of "std" rust, I suppose it's ok.
I think mach-o also has some uuid's in load commands, so might be a good idea to add it there too?
@willglynn what do you think?
Does anyone know if ELF uses uuid's anywhere? (and whether its standard and not some sparc workstation dec alpha 5003x machine extension?)
@luser if you do a PR it'll probably get merged, just need to make sure the no_std works correctly. @philipc is now the expert on this ;)
@m4b I'm entirely on board :+1: pdb depends on uuid, and matching those PE UUIDs with corresponding PDB UUIDs is the point of that field. uuid everywhere makes that use case simpler.
Your memory is correct, Mach-O also has UUIDs:
$ otool -l `which otool`
/usr/bin/otool:
…
Load command 8
cmd LC_UUID
cmdsize 24
uuid 6B978130-4223-3AF8-9C74-DCA078F07483
…
We should update LC_UUID handling appropriately, with extra attention for BE binaries since I don't know how those work.
@m4b Oh yeah, I know gnu ld emits UUIDs in ELF note sections when --build-id=uuid. No idea how common those are in practice, but I once needed them in the past (hence #49).
Humorously, I already wrote some code in object to handle LC_UUID:
https://github.com/gimli-rs/object/blob/0fb54d82c755e317b96d4b0d57714028d05ad95c/src/macho.rs#L217
I'm not sure ELF Build IDs can be usefully be handled as UUIDs, the actual contents of the note field don't preserve the semantics--it's just bytes.
@luser: Good point. --build-id=uuid is a 128-bit UUID, while --build-id=md5 is a 128-bit MD5, and I don't think those are distinguishable after the fact. Sounds like we can skip ELF UUIDs.
So I started implementing this, and got hung up a bit on the LC_UUID part because UuidCommand derives Pread et. al. so we can't use uuid::Uuid directly there. I fell down the rabbit hole and wrapped it in a newtype and manually implemented all the necessary traits, but I don't love it.
Here's the simple commit to use uuid::Uuid for CodeviewPDB70DebugInfo::signature: https://github.com/luser/goblin/commit/a24a84614b2d84a1bd60af55e544c622cf2393aa
Here's the additional commit with the newtype: https://github.com/luser/goblin/commit/b541e9584322c7e9c8e04b89db6d9be14d77e5c0
(I realize that I intended to switch the code from the first patch to use the newtype after I wrote that, but forgot.)
What do you think is the best way forward here? Serde works around this problem by having this whole complicated "remote derive" system.
The uuid crate does have a serde feature that derives the serde traits, FWIW.
Actually I’m not completely opposed to adding your UUID new type impl as a direct impl in scroll itself. Then all of your problems will disappear. We could then do: bytes.pread::<Uuid> which obviously looks cool, and also showcases the power of scroll ;)
Alternatively can remove the derive impl from UuidCommand and manually implement, but then uuid logic will be sort of duplicated.
UUID is no std so adding to scroll shouldn’t be a problem I think? Only thing is the api design; this would be first external type in scroll, so I’m not sure, do people usually reexport that type or let others depend on it ? But then there can be version issues occasionally ? Also not sure where it would go module wise. Perhaps needs an external types mod like extra or ext? So many questions.
Dunno.
For now I think your current version is fine, except might be worth to do manual implementation because then accessing the uuid won’t be different (eg no newtype accessors).
Anyone watching this have an opinion, as usual feel free to chime in
Maybe we should ask if uuid would take a PR adding an optional dependency on scroll? That'd be consistent with how uuid presently interacts with serde, and it seems like it would avoid some degree of interdependency.
Edit: the initial serde PR didn't face much pushback.
I'm not sure ELF Build IDs can be usefully be handled as UUIDs, the actual contents of the note field don't preserve the semantics--it's just bytes.
For what it's worth we (sentry) go by the logic in breakpad for now and we effectively has this build note with the same hashing algorithm that breakpad uses to come up with something that looks more or less like a uuid (it's not actually forced to have a valid version though so it's effectively random bytes). However for debug purposes that uuid is not good for windows where you also need the name age of a PE file to locate it.
We're already using uuid as a dependency so at least as far as we are concerned, go ahead and add uuid as dependency to goblin :)
So:
- I would prefer if UUID crate would take on scroll as optional dependency
- We can then read out uuid's as a type properly using scroll
We can probably punt on this decision and avoid backwards incompatible issues by exposing a raw bytes api for any uuid-ish element, and then we can add proper uuid type in the future (without breaking anyone depending on the raw bytes for whatever reason)?
So, does anyone want to take the first step and see if uuid crate would be willing to have a scroll impl via a PR? Should be simple w.r.t. implementation
It would also be really useful to have a conversion function which prints out guid/age for PE files in the default microsoft formatting. Something along those lines would work:
let guidage = format!("{}{:X}", uuid.to_simple().to_string().to_uppercase(), age)