mold
No PLT mode?
Currently, mold always creates a PLT entry if a call instruction refers to an external library function. Doing this is considered best practice for the following two reasons.
- It consolidates the locations to relocate in one place. If we don't do this, the dynamic loader has to relocate every call instruction in the entire program that calls an external library function.
- It avoids relocating text segments. Relocating text segments is considered a bad idea because it breaks physical page sharing between processes running the same program. As long as they don't write to a text segment, that text segment can be shared between them, but if they rewrite it in an arbitrary way, the page cannot be shared anymore.
So, the current design is the result of a tradeoff, and that's not the only way to make an external function call possible.
What if we do not create PLT entries at all and let the dynamic loader directly relocate text segments? The result is the opposite of the above analysis, that is:
- The dynamic loader has to relocate a lot of places, which would slow down process startup, and
- it breaks text segment sharing, but
- it eliminates the cost of PLT call.
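The tradeoff above can be put into rough numbers with a toy model. This is only a sketch: `relocation_cost`, the call-site list, and the 4 KiB page size are assumptions made for illustration, and it ignores lazy binding and everything else a real dynamic loader does.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def relocation_cost(call_sites):
    """Compare relocation work for PLT and no-PLT linking.

    call_sites is a list of (text_offset, callee_name) pairs, one per
    call instruction that targets an external library function.
    """
    callees = {name for _, name in call_sites}
    pages = {offset // PAGE_SIZE for offset, _ in call_sites}
    return {
        # With PLT: one GOT slot to fill per distinct external function.
        "plt_relocs": len(callees),
        # Without PLT: every call instruction has to be patched.
        "no_plt_relocs": len(call_sites),
        # Without PLT: each patched text page becomes private to the process.
        "text_pages_dirtied": len(pages),
    }


# Hypothetical program with five external calls to three functions.
sites = [
    (0x1000, "malloc"),
    (0x1040, "free"),
    (0x2010, "malloc"),
    (0x5000, "memcpy"),
    (0x5008, "malloc"),
]
cost = relocation_cost(sites)
print(cost)
```

In this model the no-PLT count grows with the number of call sites, while the PLT count grows only with the number of distinct functions called.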
This might be a good tradeoff for a long-running, server-type program, which usually runs as only one process per machine. Adding a "no PLT" mode to the linker and linking such programs in that mode might improve the resulting program's performance.
Similar to https://github.com/rui314/mold/issues/465.
Do you mean writing absolute addresses into executable code? That works fine on 32 bits but is not well suited to 64 bits, because instruction immediate values are usually 32 bits on 64-bit CPUs, so larger code and more complicated relocation types will be needed.
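The 32-bit limit mentioned here can be checked with a little arithmetic. The sketch below assumes the x86-64 `call rel32` encoding (5 bytes, displacement relative to the next instruction). The function name and the example addresses are hypothetical.

```python
CALL_REL32_SIZE = 5  # opcode E8 plus a 4-byte signed displacement


def fits_call_rel32(call_site, target):
    """Return True if a direct call at call_site can reach target."""
    disp = target - (call_site + CALL_REL32_SIZE)
    return -(2**31) <= disp < 2**31


# Hypothetical addresses: a call site in the executable, a nearby function,
# and a function in a shared library mapped far away.
near_ok = fits_call_rel32(0x401000, 0x402000)
far_ok = fits_call_rel32(0x401000, 0x7FFFF7A52000)
print(near_ok, far_ok)
```

When the target is more than 2 GiB away from the call site, the displacement does not fit, so patching the call instruction in place is not enough and a longer sequence is needed.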
It's for both 32 and 64 bits. I think we can still use the GOT to access global variables, and to call a function, we can change the compiler so that it emits code like this:
movabs $func_abs_address, %r11 # with R_X86_64_64 reloc
call *%r11
r11 is a scratch register that PLT stubs are already allowed to clobber, so we don't have to save the value in r11 around this code sequence. I'm not sure if the CPU is smart enough to combine the two instructions so that it is effectively a direct call, though.
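To get a feel for the code-size cost of that sequence, here is a sketch that assembles it by hand. The byte values are the standard x86-64 encodings of the two instructions. The function name and target address are hypothetical, and in a real object file the 8-byte immediate would be left for the R_X86_64_64 relocation to fill in.

```python
import struct


def encode_movabs_call_r11(target):
    """Encode `movabs $target, %r11; call *%r11` as machine code bytes."""
    # REX.W+B prefix (0x49), opcode B8+3 (0xBB), then a 64-bit immediate.
    movabs = b"\x49\xbb" + struct.pack("<Q", target)
    # REX.B prefix (0x41), opcode FF /2 with ModRM 0xD3 selecting r11.
    call = b"\x41\xff\xd3"
    return movabs + call


seq = encode_movabs_call_r11(0x7FFFF7A52000)  # hypothetical function address
print(len(seq), seq.hex())
```

The sequence takes 13 bytes per call site, compared with 5 bytes for a `call rel32` through the PLT, and the relocated field starts at offset 2 of the sequence.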