patchelf icon indicating copy to clipboard operation
patchelf copied to clipboard

Implement an alternative layout engine #468

Open brenoguim opened this issue 2 years ago • 0 comments

I'm marking this as a draft PR because the main goal is to provide basis to think and discuss. The change is too agressive and so it's wiser to discuss before continuing to invest time in this approach. In this description I'll talk about what is being done here, the pros and cons, known issues (and known bugs it fixes) and then I'll go into the details of the patch.

This patch create a separate elf-independent class that is responsible to decide where to place sections after resizing. The goal is to create a class that is easily testable, meaning that we can manually write inputs to it and verify the output. I did not get into the part of adding these tests that would be the highlight of the patch because I wanted to share what I have so far.

Because this class is elf-independent, you can expect to find the following steps in the connection to Patchelf:

  1. Generate the elf-independent layout: elf2layout function
  2. Tell the LayoutEngine which sections must be resized: LayoutEngine::resize
  3. Trigger the LayoutEngine to calculate the new layout: LayoutEngine::updateFileLayout
  4. Read the layout and apply it to Patchelf current datastructures (like phdrs and shdrs): layout2elf

The LayoutEngine methods will always return false should any step fail. That allow us to fallback to the current code if the new code can't properly layout something.

Because of this, this patch introduces two switches: tryLayoutEngine which will trigger the new engine and fallback if it fails, and onlyLayoutEngine which will error out if the new engine fails. The regression tests are passing with onlyLayoutEngine.

The scary part is that all tests are passing even though some things are not implemented:

  1. normalizeNoteSegments is not called, so some note segments could get out of sync
  2. Synchronization of .note.gnu.property and .MIPS.abiflags from writeReplacedSections are not called
  3. There is no difference between executable or shared library. The new segments are always added in the end.
  4. binutilsQuirkPadding is not added

Now on the flip side:

  1. The new engine introduces new segments much less frequently because it first attempts to find whatever hole we have in the address space
  2. If it can't find a hole, it introduces one LOAD segment with the same access rights (RWE) as the segment from which the section was removed.

I changed the CI to invoke tests with both current layout engines and the new one


If you made it until here, I'll describe a bit of the idea behind the LayoutEngine. The interface exposes the following classes and methods:

Section: This is anything occupies space in the file. So, differently from ELF, the header, the program header table, and the section header table are also considered sections. Aside from the obvious fields name, type, access and align, it also has a pinned field to indicate that this section should not be moved in the virtual address space.

Segment: This is a load segment. The engine deals only with segments that make up the address space.

Layout: A group of Sections and Segments.

LayoutEngine: This uses the PImpl idiom to hide all implementation details from the user and expose truly only the needed methods: constructor, resize, updateFileLayout, layout and getVirtualAddress.

The best way to see how these objects are used is to look at the methods that create the layout from elf and the ones that read the layout to update the elf structures.


The LayoutEngine implementation is based on the following idea:

  1. First it reads the input Sections and Segments to build a datastructure that represents the virtual address space.
  2. Then the transformations are made to this virtual address space structure
  3. And finally, when updateFileLayout is called, the code iterates on the virtual address space structure and update Section and Segment fields.

The virtual address space structure is just a vector of VSegment which can be thought of a segment that was mapped into virtual memory, and each VSegment contains a vector of VSections which are the sections that were loaded by the VSegment.

The whole idea is to focus on the virtual address space because that is what imposes constraints, while file layout is just a byproduct.

brenoguim avatar Mar 06 '23 23:03 brenoguim