goblin
Reading the ELF of the current executable without copying it into memory a second time.
Hi,
This is more of a question than a bug.
I have a somewhat unusual use-case. I want to inspect the symbol table of the currently running executable.
The goblin example does this:
let buffer = fs::read(path)?;
which I think loads the entire binary into memory. If the binary is the one currently running, that means most of its contents end up in memory a second time.
Is there a more efficient way? If I were to mmap() the binary into memory, would most kernels be smart enough to share the pages with those already loaded?
Thanks
Well, you don't have to use fs::read; goblin::Object::parse takes a &[u8], so if you're on Linux, for example, you could probably mmap the binary from /proc/ or whatever, and it might share the pages. That would need testing. What is your use case exactly? Reading it twice is probably of minor importance for most people.
If you need access to specific segments of an already memory-mapped binary, there are also unsafe goblin APIs for reading things the way a dynamic linker, or what have you, might.
Let me know if that answers your question, or you have something else in mind :)
E.g., you could use this to zero-copy read the symbols if you can get a pointer to the symbol table + its length: https://github.com//m4b/goblin/blob/d215c61565bd31859bdb9e395302f49af03aab99/src/elf/sym.rs#L239
E.g., you could use this to zero-copy read the symbols if you can get a pointer to the symbol table
That's doable as long as I can make the .symtab section loadable. By default it isn't. I think I'd have to use objcopy to generate a modified binary.
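To illustrate why .symtab is normally not reachable through the loaded image: a section is only mapped at runtime when its header carries the SHF_ALLOC flag. The following is a minimal std-only sketch, not goblin's API, that walks the section headers of a 64-bit little-endian ELF and reports that flag for each symbol-table section. The helper names `le` and `symbol_sections` are made up for this example, and `main` uses fs::read only for brevity; any &[u8] would do.

```rust
use std::{env, fs};

const SHT_SYMTAB: u64 = 2; // full symbol table (.symtab)
const SHT_DYNSYM: u64 = 11; // dynamic symbol table (.dynsym)
const SHF_ALLOC: u64 = 0x2; // section occupies memory at runtime

/// Reads a little-endian unsigned integer of `len` bytes (len <= 8) at `off`.
fn le(d: &[u8], off: usize, len: usize) -> Option<u64> {
    let bytes = d.get(off..off.checked_add(len)?)?;
    Some(bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | b as u64))
}

/// Returns (sh_type, has SHF_ALLOC) for every symbol-table section.
/// Assumption: only 64-bit little-endian ELF is handled; anything else is None.
fn symbol_sections(d: &[u8]) -> Option<Vec<(u64, bool)>> {
    if d.len() < 64 || !d.starts_with(b"\x7fELF") || d[4] != 2 || d[5] != 1 {
        return None;
    }
    let shoff = le(d, 40, 8)? as usize; // e_shoff
    let shentsize = le(d, 58, 2)? as usize; // e_shentsize
    let shnum = le(d, 60, 2)? as usize; // e_shnum
    let mut found = Vec::new();
    for i in 0..shnum {
        let base = shoff.checked_add(i.checked_mul(shentsize)?)?;
        let sh_type = le(d, base.checked_add(4)?, 4)?;
        let sh_flags = le(d, base.checked_add(8)?, 8)?;
        if sh_type == SHT_SYMTAB || sh_type == SHT_DYNSYM {
            found.push((sh_type, sh_flags & SHF_ALLOC != 0));
        }
    }
    Some(found)
}

fn main() {
    // Demonstration only: inspect the running executable's own file.
    let data = env::current_exe().and_then(fs::read).unwrap_or_default();
    match symbol_sections(&data) {
        Some(sections) => {
            for (sh_type, alloc) in sections {
                println!("section type {}: loaded at runtime = {}", sh_type, alloc);
            }
        }
        None => println!("not a 64-bit little-endian ELF"),
    }
}
```

On a typical unstripped Linux binary one would expect .dynsym to be flagged as allocated and .symtab not, which is why the zero-copy route needs either a modified binary or a separate mapping of the file.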
you could probably memmap the binary in /proc/ or whatever, and it might share the page?
That's what I was hoping.
What is your usecase exactly?
My use case is pretty niche. We are writing a tracing just-in-time compiler for Rust programs and need to find the addresses of functions at runtime so that we can emit call instructions to them.
I could just use dwarf...
You would still need to load the file somehow if you're using dwarf.
If I were to mmap() the binary into memory, would most kernels be smart enough to share the pages with those already loaded?
Probably, and I don't see any better options.
On Linux, you could read /proc/self/maps to get the addresses where your binary is already mapped and unsafe-cast those addresses to [u8] slices of the length of each mapping. On Windows, there should also be a way to get a process's memory map.
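A rough sketch of the /proc/self/maps approach, using only the standard library and assuming Linux. The `Mapping` struct and `parse_maps_line` function are hypothetical names for this example. Note that only the file's loadable segments are mapped, so this view does not include non-loaded sections such as .symtab.

```rust
use std::{env, fs, slice};

/// One parsed line of /proc/<pid>/maps.
#[derive(Debug, PartialEq)]
struct Mapping {
    start: u64,
    end: u64,
    perms: String,
    offset: u64,
    path: Option<String>,
}

/// Parses a line of the form "start-end perms offset dev inode [pathname]".
fn parse_maps_line(line: &str) -> Option<Mapping> {
    let mut fields = line.split_whitespace();
    let (start, end) = fields.next()?.split_once('-')?;
    let perms = fields.next()?.to_string();
    let offset = u64::from_str_radix(fields.next()?, 16).ok()?;
    let _dev = fields.next()?;
    let _inode = fields.next()?;
    // The pathname is optional and may contain spaces. Simplification:
    // runs of whitespace inside the path collapse to a single space.
    let rest: Vec<&str> = fields.collect();
    let path = if rest.is_empty() { None } else { Some(rest.join(" ")) };
    Some(Mapping {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        perms,
        offset,
        path,
    })
}

fn main() {
    let exe = env::current_exe().ok().and_then(|p| p.to_str().map(String::from));
    let maps = match fs::read_to_string("/proc/self/maps") {
        Ok(m) => m,
        Err(e) => {
            eprintln!("cannot read /proc/self/maps (not Linux?): {}", e);
            return;
        }
    };
    for m in maps.lines().filter_map(parse_maps_line) {
        if m.path.is_none() || m.path != exe {
            continue;
        }
        println!("{:#x}-{:#x} {} file offset {:#x}", m.start, m.end, m.perms, m.offset);
        if m.offset == 0 && m.perms.starts_with('r') {
            // SAFETY: the kernel reports this range as mapped and readable in
            // our own process; we assume it stays mapped while we look at it.
            let bytes = unsafe {
                slice::from_raw_parts(m.start as usize as *const u8, (m.end - m.start) as usize)
            };
            println!("  begins with ELF magic: {}", bytes.starts_with(b"\x7fELF"));
        }
    }
}
```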
The Mach-O parser seems to be missing the equivalent from_raw methods. For the loaded binary I'm trying to inspect, I'm only given a pointer to the Mach-O header and a file path, but getting the size from the file at that path is a nasty method, as the file could change or be deleted in the meantime. Being able to load a header or a complete object from a pointer would be a much better solution in this case.
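For illustration, here is a std-only sketch of what reading from a pointer could look like for a 64-bit Mach-O image. This is not goblin's API: `MachHeader64` and `header_and_commands` are names invented for this example. It assumes a native-endian 64-bit header and relies on the header's own sizeofcmds field to bound the load commands, so no file size is needed.

```rust
use std::{mem, ptr, slice};

const MH_MAGIC_64: u32 = 0xfeed_facf;

/// Layout of the 64-bit Mach-O header (32 bytes).
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct MachHeader64 {
    magic: u32,
    cputype: i32,
    cpusubtype: i32,
    filetype: u32,
    ncmds: u32,
    sizeofcmds: u32,
    flags: u32,
    reserved: u32,
}

/// Reads the header at `base` and returns it with the load-command bytes
/// that follow it, sized by the header's `sizeofcmds` field.
///
/// # Safety
/// `base` must point to at least 32 readable bytes, and if the magic matches,
/// the following `sizeofcmds` bytes must also be readable for lifetime `'a`.
unsafe fn header_and_commands<'a>(base: *const u8) -> Option<(MachHeader64, &'a [u8])> {
    // read_unaligned avoids assuming the pointer is 4-byte aligned.
    let header = unsafe { ptr::read_unaligned(base as *const MachHeader64) };
    if header.magic != MH_MAGIC_64 {
        return None;
    }
    let cmds = unsafe {
        slice::from_raw_parts(
            base.add(mem::size_of::<MachHeader64>()),
            header.sizeofcmds as usize,
        )
    };
    Some((header, cmds))
}

fn main() {
    // Synthetic image for demonstration: a header followed by 16 command bytes.
    let mut image = vec![0u8; 32 + 16];
    image[0..4].copy_from_slice(&MH_MAGIC_64.to_ne_bytes());
    image[16..20].copy_from_slice(&2u32.to_ne_bytes()); // ncmds
    image[20..24].copy_from_slice(&16u32.to_ne_bytes()); // sizeofcmds
    match unsafe { header_and_commands(image.as_ptr()) } {
        Some((h, cmds)) => println!("ncmds = {}, command bytes = {}", h.ncmds, cmds.len()),
        None => println!("not a 64-bit Mach-O header"),
    }
}
```

Walking the load commands in that slice would then give the segment addresses and sizes, still without touching the file on disk.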