
Reading the ELF of the current executable without copying it into memory a second time.

Open vext01 opened this issue 5 years ago • 7 comments

Hi,

This is more of a question than a bug.

I have a somewhat unusual use-case. I want to inspect the symbol table of the currently running executable.

The goblin example does this:

            let buffer = fs::read(path)?;

which I think reads the entire binary into memory. When the binary is the same one that is currently running, that means most of its contents end up in memory a second time.

Is there a more efficient way? If I were to mmap() the binary into memory, would most kernels be smart enough to share the pages with those already loaded?

Thanks

vext01 avatar Apr 01 '20 15:04 vext01

Well, you don't have to fs::read; goblin::Object::parse takes a &[u8], so e.g. if you're on Linux you could probably memmap the binary in /proc/ or whatever, and it might share the page? Would need testing. What is your use case exactly? Reading it twice is probably of minor importance for most people.

If you need access to specific segments of an already memory-mapped binary, there are also unsafe goblin APIs for reading them and doing the kind of things a dynamic linker, or what have you, might do.

Let me know if that answers your question, or you have something else in mind :)

m4b avatar Apr 02 '20 06:04 m4b

E.g., you could use this to zero-copy read the symbols if you can get a pointer to the symbol table + its len: https://github.com//m4b/goblin/blob/d215c61565bd31859bdb9e395302f49af03aab99/src/elf/sym.rs#L239

m4b avatar Apr 02 '20 06:04 m4b
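
To illustrate the zero-copy idea without depending on goblin's exact signatures, here is a minimal sketch that uses only the standard library. `Elf64Sym` mirrors the standard ELF64 symbol entry layout; `syms_from_raw` and `is_function` are hypothetical helpers written for this example, not goblin functions, and the in-memory `table` merely stands in for a symbol table you already hold a pointer to.

```rust
use std::mem::size_of;
use std::slice;

/// Layout of one ELF64 symbol table entry (24 bytes).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
#[allow(dead_code)] // not every field is read in this sketch
struct Elf64Sym {
    st_name: u32,  // offset of the name in the string table
    st_info: u8,   // binding (high 4 bits) and type (low 4 bits)
    st_other: u8,
    st_shndx: u16,
    st_value: u64, // symbol address (or offset, see note below)
    st_size: u64,
}

/// Hypothetical helper: view `count` entries at `ptr` as a slice, without copying.
///
/// Safety: `ptr` must be non-null, aligned for `Elf64Sym`, and point to `count`
/// valid entries that stay mapped and unmodified for the lifetime `'a`.
unsafe fn syms_from_raw<'a>(ptr: *const Elf64Sym, count: usize) -> &'a [Elf64Sym] {
    unsafe { slice::from_raw_parts(ptr, count) }
}

/// STT_FUNC is type 2 in the low nibble of `st_info`.
fn is_function(sym: &Elf64Sym) -> bool {
    sym.st_info & 0xf == 2
}

fn main() {
    assert_eq!(size_of::<Elf64Sym>(), 24);

    // Stand-in for an already-mapped symbol table: one object, one function.
    let table = [
        Elf64Sym { st_name: 1, st_info: 0x11, st_other: 0, st_shndx: 3, st_value: 0x4000, st_size: 8 },
        Elf64Sym { st_name: 9, st_info: 0x12, st_other: 0, st_shndx: 1, st_value: 0x1130, st_size: 42 },
    ];

    // The slice borrows the existing memory; nothing is copied.
    let syms = unsafe { syms_from_raw(table.as_ptr(), table.len()) };
    for sym in syms.iter().filter(|s| is_function(s)) {
        println!(
            "function at {:#x}, {} bytes, name offset {}",
            sym.st_value, sym.st_size, sym.st_name
        );
    }
}
```

Note that for a position-independent executable, `st_value` is relative to the address the binary was loaded at, so the load base would have to be added before the value can be used as a call target.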

> E.g., you could use this to zero-copy read the symbols if you can get a pointer to the symbol table

That's doable as long as I can make the .symtab section loadable. By default it isn't. I think I'd have to use objcopy to generate a modified binary.

> you could probably memmap the binary in /proc/ or whatever, and it might share the page?

That's what I was hoping.

> What is your use case exactly?

My use case is pretty niche. We are writing a tracing just-in-time compiler for Rust programs and need to find the addresses of functions at runtime so that we can emit call instructions to them.

vext01 avatar Apr 02 '20 12:04 vext01

I could just use dwarf...

vext01 avatar Apr 02 '20 13:04 vext01

You would still need to load the file somehow if you're using dwarf.

> If I were to mmap() the binary into memory, would most kernels be smart enough to share the pages with those already loaded?

Probably, and I don't see any better options.

philipc avatar Apr 03 '20 01:04 philipc

On Linux, you could read /proc/self/maps to get the addresses where your binary is already mapped and unsafe-cast those addresses to [u8] slices spanning the length of each mapping. On Windows, there should also be a way to get a process's memory map.

csarn avatar May 05 '20 12:05 csarn
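
A rough sketch of the /proc/self/maps approach, using only the standard library, might look like the following. The `Mapping` struct and `parse_maps_line` are names invented for this example; the parser assumes the usual Linux line layout (`start-end perms offset dev inode path`), and the cast at the end assumes the mapping stays in place while the slice is in use.

```rust
use std::fs;

/// One parsed line of /proc/<pid>/maps (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Mapping {
    start: usize,
    end: usize,
    perms: String,
    offset: u64,
    path: Option<String>,
}

/// Parse a single maps line; returns None if the line is malformed.
fn parse_maps_line(line: &str) -> Option<Mapping> {
    // The first five fields are single-space separated; the path is the padded rest.
    let mut fields = line.splitn(6, ' ');
    let range = fields.next()?;
    let perms = fields.next()?;
    let offset = fields.next()?;
    let _dev = fields.next()?;
    let _inode = fields.next()?;
    let path = fields
        .next()
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(String::from);

    let (start, end) = range.split_once('-')?;
    Some(Mapping {
        start: usize::from_str_radix(start, 16).ok()?,
        end: usize::from_str_radix(end, 16).ok()?,
        perms: perms.to_string(),
        offset: u64::from_str_radix(offset, 16).ok()?,
        path,
    })
}

fn main() {
    let exe = std::env::current_exe()
        .ok()
        .map(|p| p.to_string_lossy().into_owned());
    let maps = match fs::read_to_string("/proc/self/maps") {
        Ok(text) => text,
        Err(err) => {
            eprintln!("/proc/self/maps is not available here: {}", err);
            return;
        }
    };

    for m in maps.lines().filter_map(parse_maps_line) {
        if m.path.is_some() && m.path == exe {
            println!("{:#x}-{:#x} {} file offset {:#x}", m.start, m.end, m.perms, m.offset);
            if m.offset == 0 && m.perms.starts_with('r') {
                // SAFETY (assumed): the mapping is readable and is not unmapped
                // or changed while `bytes` is alive.
                let bytes: &[u8] =
                    unsafe { std::slice::from_raw_parts(m.start as *const u8, m.end - m.start) };
                println!("  starts with ELF magic: {}", bytes.starts_with(b"\x7fELF"));
            }
        }
    }
}
```

Only loadable segments show up this way, so a section that is not part of any loaded segment (such as .symtab by default, as noted earlier in the thread) would not be reachable through these slices.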

The Mach-O parser seems to be missing the equivalent from_raw methods. For the loaded binary I'm trying to inspect, I'm only given a pointer to the Mach-O header and a file path, but reading the size from the path is a nasty method, as the file could change or be deleted in the meantime. Being able to load a header or complete object from a pointer would be a much better solution in this case.

melvyn2 avatar Jun 28 '23 20:06 melvyn2