
[WIP] Revamp linker loading (do not load all object files during linking)

Open gkaracha opened this issue 4 years ago • 4 comments

Closes https://github.com/tweag/asterius/issues/665. This PR is a WIP.

Roadmap for addressing the first bullet of https://github.com/tweag/asterius/issues/665:

  • [X] Define new type for dependencyMaps, and serialization/deserialization logic for it.
  • [ ] Add an index to AsteriusCachedModule, after the dependencyMap field, mapping each EntitySymbol to both the file the entity originates from and its offset within that file. Initially the offset alone would suffice, but once modules are concatenated, a module contains entities from different files, so both the file and the offset are needed to pinpoint an entity's location.
  • [ ] Implement retrieval of entities without reading any other part of the file.
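A minimal Haskell sketch of the index described in the roadmap above, assuming each entity is stored as a contiguous byte range. `EntitySymbol`, `Location`, `EntityIndex`, `concatIndices`, and `readEntityBytes` are hypothetical stand-ins, not Asterius's actual types. The stored length is an extra assumption, because the roadmap only mentions the offset. The idea is that a file path plus an offset lets a single entity be read with one seek instead of decoding the whole object file.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map
import System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek), hSeek, withBinaryFile)

-- Hypothetical stand-in for Asterius's EntitySymbol.
type EntitySymbol = String

-- Where an entity's serialized bytes live. The file path is needed
-- because, after module concatenation, entities come from different files.
-- The length field is an assumption; the roadmap only mentions the offset.
data Location = Location
  { locFile :: FilePath,
    locOffset :: Integer,
    locLength :: Int
  }
  deriving (Eq, Show)

type EntityIndex = Map.Map EntitySymbol Location

-- Concatenating modules keeps each entity's original file and offset
-- (left-biased if the same symbol appears more than once).
concatIndices :: [EntityIndex] -> EntityIndex
concatIndices = Map.unions

-- Read one entity's bytes by seeking straight to it, without
-- reading any other part of the file.
readEntityBytes :: EntityIndex -> EntitySymbol -> IO (Maybe BS.ByteString)
readEntityBytes idx sym =
  case Map.lookup sym idx of
    Nothing -> pure Nothing
    Just (Location path off len) ->
      withBinaryFile path ReadMode $ \h -> do
        hSeek h AbsoluteSeek off
        Just <$> BS.hGet h len
```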

Roadmap for addressing the second bullet of https://github.com/tweag/asterius/issues/665:

  • [ ] TBD

gkaracha • May 28 '20 11:05

In #665, the "dependency map" is just an abstract concept; it doesn't have to be a concrete data type. Adding a separate data type here (which is just another wrapper around SymbolMap) is a bit of overkill IMO.

TerrorJack • May 28 '20 12:05

You might be right, but I'm not sure yet how much we'll end up passing it around. If it's a lot, I'd say it deserves its own type. Let's see how it goes, and if it ends up being just noise I'll remove it before we merge this :slightly_smiling_face:

gkaracha • May 28 '20 12:05

@TerrorJack lest we forget:

Since we need (at least parts of) the FFIMarshalState to build the set of root symbols (see https://github.com/tweag/asterius/blob/master/asterius/src/Asterius/Passes/GCSections.hs#L43), having some parts of the FFIMarshalState in AsteriusCachedModule looks unavoidable.
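For context, the root set matters because gcSections conceptually keeps everything reachable from those roots through the dependency map. Below is a minimal sketch of that traversal, assuming a dependency map from each symbol to the set of symbols it references. `Symbol`, `DependencyMap`, and `reachable` are hypothetical stand-ins, not the code in GCSections.hs. The roots passed in would, per the comment above, include symbols derived from the FFIMarshalState.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins for Asterius's symbol and dependency-map types.
type Symbol = String
type DependencyMap = Map.Map Symbol (Set.Set Symbol)

-- Worklist traversal: every symbol transitively referenced from the roots.
-- Symbols missing from the map are treated as having no dependencies,
-- and the `seen` set makes cycles terminate.
reachable :: DependencyMap -> Set.Set Symbol -> Set.Set Symbol
reachable deps roots = go Set.empty (Set.toList roots)
  where
    go seen [] = seen
    go seen (s : rest)
      | s `Set.member` seen = go seen rest
      | otherwise =
          let succs = Map.findWithDefault Set.empty s deps
           in go (Set.insert s seen) (Set.toList succs ++ rest)
```

Only the dependency map is needed for this step, which is why it helps to be able to load it without loading the module bodies.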

gkaracha • May 29 '20 12:05

After discussing this with @TerrorJack, the current plan is to follow a different approach than the one originally discussed in https://github.com/tweag/asterius/issues/665. We avoid the hassle of adding an index inside the object files and instead:

  • Make object files contain (a) the dependencyMap and (b) the actual AsteriusModule. Before gcSections we only deserialize the dependency map, and after gcSections we deserialize the AsteriusModule. This means that both should be length-prefixed, so that either one can be skipped without being decoded.
  • After gcSections has determined which symbols we need to keep, we still read every object and archive file we need, but keep only the required parts on the fly. This probably means some extra work for the garbage collector, but it still avoids keeping all the data in memory at once.
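A minimal sketch of the object-file layout described in the two points above, using the `binary` package. Each part is written with an 8-byte big-endian length prefix, so the dependency map can be decoded without touching the module, and the module can be reached by skipping the first chunk. `DependencyMap`, `ModuleBody`, and all helper names are hypothetical placeholders for the real Asterius types, and the prefix width is an assumption. `readKeptEntities` illustrates the "keep only what gcSections retained" step with `Map.restrictKeys`.

```haskell
import Data.Binary (Binary, decode, encode)
import Data.Binary.Get (Get, getLazyByteString, getWord64be, runGet, skip)
import Data.Binary.Put (Put, putLazyByteString, putWord64be, runPut)
import qualified Data.ByteString.Lazy as LBS
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical placeholders for the real Asterius types.
type Symbol = String
type DependencyMap = Map.Map Symbol (Set.Set Symbol)
type ModuleBody = Map.Map Symbol String

-- Write a value preceded by its encoded length (8 bytes, big-endian).
putChunk :: Binary a => a -> Put
putChunk x = do
  let bytes = encode x
  putWord64be (fromIntegral (LBS.length bytes))
  putLazyByteString bytes

-- Read a length-prefixed value.
getChunk :: Binary a => Get a
getChunk = do
  n <- getWord64be
  decode <$> getLazyByteString (fromIntegral n)

-- Skip a length-prefixed value without decoding it.
skipChunk :: Get ()
skipChunk = do
  n <- getWord64be
  skip (fromIntegral n)

-- Object file layout: [len][dependency map][len][module].
writeObject :: DependencyMap -> ModuleBody -> LBS.ByteString
writeObject deps body = runPut (putChunk deps >> putChunk body)

-- Before gcSections: decode only the dependency map.
readDependencyMap :: LBS.ByteString -> DependencyMap
readDependencyMap = runGet getChunk

-- After gcSections: skip the dependency map, decode the module,
-- and keep only the symbols that gcSections decided to retain.
readKeptEntities :: Set.Set Symbol -> LBS.ByteString -> ModuleBody
readKeptEntities keep bytes =
  Map.restrictKeys (runGet (skipChunk >> getChunk) bytes) keep
```

In this sketch the whole module chunk is still decoded before filtering, which matches the trade-off noted above: unneeded entities become garbage early rather than never being read.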

gkaracha • Jun 12 '20 09:06