How to get unmangled function names?
Hi
I'm trying to get all the functions in a PDB file, along with their lengths and their unmangled names (I believe the term used in PDBs might be "unique names"), for the cargo-bloat tool.
This crate's ProcedureSymbol type does not have unmangled names. From what I've seen reading the LLVM docs on PDB files and using llvm-pdbutil, they're not actually included in symbol records. Is there a recommended or reliable way of getting unmangled names? Right now I first collect all PublicSymbols and then, for each procedure, try to find a matching public symbol. But, at least for rustc/cargo-generated PDBs, this misses a lot of functions that have ProcedureSymbol records but no matching PublicSymbol record.
Is this approach fine, and should I file an issue with Rust to get it to generate better PDBs? Or is there some other way I can already use this crate to get these unmangled names, or something that could be added to this crate?
The code I'm using follows:
```rust
use pdb::FallibleIterator;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dir = std::path::Path::new("D:\\your\\path\\to\\pdb\\folder");
    let file_name = "cargo-bloat";
    let exe_path = dir.join(file_name).with_extension("exe");
    let exe_size = std::fs::metadata(&exe_path)?.len();
    let (_, text_size) = binfarce::pe::parse(&std::fs::read(&exe_path).unwrap())?.symbols()?;
    let pdb_path = dir.join(file_name.replace("-", "_")).with_extension("pdb");
    let file = std::fs::File::open(&pdb_path)?;
    let mut pdb = pdb::PDB::open(file)?;
    let dbi = pdb.debug_information()?;
    let symbol_table = pdb.global_symbols()?;
    let mut total_parsed_size = 0usize;
    let mut demangled_total_parsed_size = 0usize;
    let mut out_symbols = vec![];

    // Collect the PublicSymbols
    let mut public_symbols = vec![];
    let mut symbols = symbol_table.iter();
    while let Ok(Some(symbol)) = symbols.next() {
        match symbol.parse() {
            Ok(pdb::SymbolData::Public(data)) => {
                if data.code || data.function {
                    public_symbols.push((data.offset, data.name.to_string().into_owned()));
                }
                if data.name.to_string().contains("try_small_punycode_decode") {
                    dbg!(&data);
                }
            }
            _ => {}
        }
    }

    let mut modules = dbi.modules()?;
    while let Some(module) = modules.next()? {
        let info = match pdb.module_info(&module)? {
            Some(info) => info,
            None => continue,
        };
        let mut symbols = info.symbols()?;
        while let Some(symbol) = symbols.next()? {
            if let Ok(pdb::SymbolData::Public(data)) = symbol.parse() {
                if data.code || data.function {
                    public_symbols.push((data.offset, data.name.to_string().into_owned()));
                }
                if data.name.to_string().contains("try_small_punycode_decode") {
                    dbg!(&data);
                }
            }
        }
    }

    let cmp_offsets = |a: &pdb::PdbInternalSectionOffset, b: &pdb::PdbInternalSectionOffset| {
        a.section.cmp(&b.section).then(a.offset.cmp(&b.offset))
    };
    public_symbols.sort_unstable_by(|a, b| cmp_offsets(&a.0, &b.0));

    // Now find the Procedure symbols in all modules
    // and if possible the matching PublicSymbol record with the mangled name
    let mut handle_proc = |proc: pdb::ProcedureSymbol| {
        let mangled_symbol = public_symbols
            .binary_search_by(|probe| {
                let low = cmp_offsets(&probe.0, &proc.offset);
                let high = cmp_offsets(&probe.0, &(proc.offset + proc.len));
                use std::cmp::Ordering::*;
                match (low, high) {
                    // Less than the low bound -> less
                    (Less, _) => Less,
                    // More than the high bound -> greater
                    (_, Greater) => Greater,
                    _ => Equal,
                }
            })
            .ok()
            .map(|x| &public_symbols[x]);
        // Uncomment to verify binary search isn't screwing up anything
        /*
        let mangled_symbol = public_symbols
            .iter()
            .filter(|probe| probe.0 >= proc.offset && probe.0 <= (proc.offset + proc.len))
            .take(1)
            .next();
        */
        let demangled_name = proc.name.to_string().into_owned();
        out_symbols.push((proc.len as usize, demangled_name, mangled_symbol));
        total_parsed_size += proc.len as usize;
        if mangled_symbol.is_some() {
            demangled_total_parsed_size += proc.len as usize;
        }
    };

    let mut symbols = symbol_table.iter();
    while let Ok(Some(symbol)) = symbols.next() {
        if let Ok(pdb::SymbolData::Procedure(proc)) = symbol.parse() {
            handle_proc(proc);
        }
    }

    let mut modules = dbi.modules()?;
    while let Some(module) = modules.next()? {
        let info = match pdb.module_info(&module)? {
            Some(info) => info,
            None => continue,
        };
        let mut symbols = info.symbols()?;
        while let Some(symbol) = symbols.next()? {
            if let Ok(pdb::SymbolData::Procedure(proc)) = symbol.parse() {
                handle_proc(proc);
            }
        }
    }

    println!(
        "exe size:{}\ntext size:{}\nsize of fns found: {}\nratio:{}\nsize of fns with mangles found: {}\nratio:{}",
        exe_size,
        text_size,
        total_parsed_size,
        total_parsed_size as f32 / text_size as f32,
        demangled_total_parsed_size,
        demangled_total_parsed_size as f32 / text_size as f32
    );
    Ok(())
}
```
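The matching step in the code above (find a public symbol whose address falls inside a procedure's range) can be tried in isolation, without a PDB at hand. The following is a minimal std-only sketch, not the crate's API: `Addr` and `find_public` are hypothetical names, the `(section, offset)` tuple stands in for `pdb::PdbInternalSectionOffset`, and it assumes a half-open range `[start, start + len)` rather than the inclusive upper bound used above.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for `pdb::PdbInternalSectionOffset`: (section, offset).
/// Tuples compare lexicographically, so section is compared first, then offset.
type Addr = (u16, u32);

/// Returns the name of a public symbol whose address lies in
/// `[start, start + len)`, if any. `publics` must be sorted by address.
fn find_public<'a>(publics: &'a [(Addr, String)], start: Addr, len: u32) -> Option<&'a str> {
    // Exclusive end of the procedure, in the same section as `start`.
    let end = (start.0, start.1.saturating_add(len));
    publics
        .binary_search_by(|(addr, _)| {
            if *addr < start {
                // Probe lies before the procedure.
                Ordering::Less
            } else if *addr >= end {
                // Probe lies at or past the end of the procedure.
                Ordering::Greater
            } else {
                // Probe lies inside the procedure's range.
                Ordering::Equal
            }
        })
        .ok()
        .map(|i| publics[i].1.as_str())
}
```

With publics at `(1, 0x10)`, `(1, 0x40)` and `(2, 0x00)`, a call such as `find_public(&publics, (1, 0x40), 0x20)` returns the second entry, while a procedure at `(1, 0x20)` of length `0x10` matches nothing. If several publics fall inside one procedure, the binary search returns an arbitrary one of them.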
By "unmangled" you mean "not mangled", I presume? In PDB the situation is a bit odd, as inline and non-inline symbols are quite different. You can look at what symbolic does. For inlines we're resolving this around here: https://github.com/getsentry/symbolic/blob/c03080a1d75bf66bcbee6b2a9c9df84266d7a581/symbolic-debuginfo/src/pdb.rs#L1052-L1057
For actually mangled names, we demangle on the fly later as these are known to massively blow up in size: https://github.com/getsentry/symbolic/blob/50a4d2eff93a4b529bd5120c47924dcbc8a4275c/symbolic-demangle/src/lib.rs#L163-L178 (uses msvc_demangler).
- To demangle "decorated" global symbols, use `msvc_demangler`.
- To emit function arguments for procedures, use `pdb_addr2line::TypeFormatter::format_function`.
- To emit namespaces and function arguments for inlines, use `pdb_addr2line::TypeFormatter::format_id`.
- To get function names for code addresses, use `pdb_addr2line::Context::find_frames`.
Yes, I meant "not mangled". I presume msvc_demangler is only useful when working with C/C++ symbols generated by MSVC (or a compatible compiler like clang-cl)? I was working with a rustc-generated PDB when I opened this, in which case I don't think the symbols in question use that mangling scheme.
My problem was that I did not know whether a given PDB was using the V0 or the legacy Rust mangling scheme, so I couldn't reliably demangle. I was hoping the PDB itself contained the undecorated/not-mangled names, but that doesn't seem to be the case. So I just assumed the V0 Rust mangling scheme and demangled the names that way, which should work fine the vast majority of the time.
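For reference, the two Rust schemes can usually be told apart per symbol from the name alone: V0 names start with `_R`, legacy names start with `_ZN` and end in a `17h<16 hex digits>E` hash, and MSVC-decorated names start with `?`. A minimal std-only sketch of that check follows; `Mangling` and `guess_mangling` are hypothetical names, and this is a heuristic classifier rather than a demangler.

```rust
/// Hypothetical classification of a symbol name by its mangling scheme.
#[derive(Debug, PartialEq, Eq)]
enum Mangling {
    RustV0,
    RustLegacy,
    Msvc,
    Unknown,
}

/// True if `name` ends in the legacy Rust hash suffix `17h<16 hex digits>E`.
fn has_legacy_hash(name: &str) -> bool {
    let b = name.as_bytes();
    let n = b.len();
    // "17h" (3 bytes) + 16 hex digits + "E" (1 byte) = 20 bytes.
    n >= 20
        && b[n - 1] == b'E'
        && &b[n - 20..n - 17] == b"17h"
        && b[n - 17..n - 1].iter().all(|c| c.is_ascii_hexdigit())
}

/// Guesses the mangling scheme from the prefix (and, for legacy, the suffix).
fn guess_mangling(name: &str) -> Mangling {
    if name.starts_with('?') {
        Mangling::Msvc
    } else if name.starts_with("_R") || name.starts_with("__R") {
        Mangling::RustV0
    } else if (name.starts_with("_ZN") || name.starts_with("__ZN")) && has_legacy_hash(name) {
        Mangling::RustLegacy
    } else {
        // Includes plain names and Itanium C++ names without a Rust hash.
        Mangling::Unknown
    }
}
```

For example, `guess_mangling("_ZN4core3fmt5write17h0123456789abcdefE")` yields `Mangling::RustLegacy`, and a plain `main` yields `Mangling::Unknown`. As far as I know, the `rustc-demangle` crate's `demangle` function already tries both Rust schemes, so a check like this is mainly useful for reporting which scheme a binary uses.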
Closing this, thanks for the help.