asterius
[WIP] Revamp linker loading (do not load all object files during linking)
Closes https://github.com/tweag/asterius/issues/665. This PR is a WIP.
Roadmap for addressing the first bullet of https://github.com/tweag/asterius/issues/665:
- [X] Define a new type for `dependencyMap`s, and serialization/deserialization logic for it.
- [ ] Add an index in `AsteriusCachedModule` after the `dependencyMap` field, as a map from `EntitySymbol` to both the file that the entity originates from and the offset in the file. Though initially the offset is enough, when we concatenate modules we end up having entities from different files, so we need both the file and the offset in the file to pinpoint the location of the entity.
- [ ] Implement retrieval of entities without reading any other part of the file.
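As a rough illustration of the index described in the second item, here is a minimal sketch using `Data.Map.Strict`. The names `EntityLocation`, `EntityIndex`, `indexForFile`, `mergeIndexes`, and `locate` are hypothetical, and `EntitySymbol` is a stand-in newtype rather than the real asterius type; the point is only that concatenating modules unions their indexes, which is why each entry has to carry the file as well as the offset.

```haskell
import Data.Int (Int64)
import qualified Data.Map.Strict as Map

-- Stand-in for the real EntitySymbol type (assumption for this sketch).
newtype EntitySymbol = EntitySymbol String
  deriving (Eq, Ord, Show)

-- Hypothetical record: where an entity's serialized bytes live.
data EntityLocation = EntityLocation
  { entityFile :: FilePath, -- object file the entity originates from
    entityOffset :: Int64 -- byte offset of the entity inside that file
  }
  deriving (Eq, Show)

type EntityIndex = Map.Map EntitySymbol EntityLocation

-- Build the index for a single object file from (symbol, offset) pairs.
indexForFile :: FilePath -> [(EntitySymbol, Int64)] -> EntityIndex
indexForFile path offsets =
  Map.fromList [(sym, EntityLocation path off) | (sym, off) <- offsets]

-- Concatenating modules unions their indexes. Map.union is left-biased,
-- so on a duplicate symbol the entry from the first index wins.
mergeIndexes :: EntityIndex -> EntityIndex -> EntityIndex
mergeIndexes = Map.union

-- Look up where to seek in order to read a single entity.
locate :: EntitySymbol -> EntityIndex -> Maybe EntityLocation
locate = Map.lookup
```

With an offset alone, an entry in a merged index would be ambiguous, since two files can both have an entity at the same offset.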
Roadmap for addressing the second bullet of https://github.com/tweag/asterius/issues/665:
- [ ] TBD
In #665, the "dependency map" is just an abstract concept; it doesn't have to be a concrete data type. Adding a separate data type here (which is just another wrapper of `SymbolMap`) is a bit of overkill IMO.
You might be right, but I cannot be sure about how much we'll end up passing it around. If it is a lot, I'd say it deserves its own type. Let's see how it goes and if it ends up being just noise I'll remove it before we merge this :slightly_smiling_face:
@TerrorJack lest we forget:
Since we need (at least parts of) the `FFIMarshalState` to build the set of root symbols (see https://github.com/tweag/asterius/blob/master/asterius/src/Asterius/Passes/GCSections.hs#L43), having some parts of the `FFIMarshalState` in `AsteriusCachedModule` looks unavoidable.
After discussing this with @TerrorJack, the current plan is to follow a different approach than the one originally discussed in https://github.com/tweag/asterius/issues/665. We avoid the hassle of adding an index inside the object files and instead:
- Make object files contain (a) the `dependencyMap` and (b) the actual `AsteriusModule`. Before `gcSections` we only deserialize the dependency map, and after `gcSections` we deserialize the asterius module. This means that both should be length-prefixed so that we can skip them.
- After `gcSections` has determined which symbols we need to keep, we do indeed read all object and archive files that we need to read, but keep only the parts that we need, on the fly. Probably this means some extra work for the garbage collector, but it still avoids keeping all the data in memory.
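To make the length-prefixing idea concrete, here is a minimal sketch using `Data.Binary.Get` and `Data.Binary.Put` from the `binary` package. The layout (a little-endian 64-bit length followed by the payload, once per section) and the names `putSection`, `getSection`, `skipSection`, `encodeObject`, `readDependencyMap`, and `readModule` are assumptions for illustration, not the format this PR ends up using. The payloads are treated as opaque bytes standing in for the serialized `dependencyMap` and `AsteriusModule`.

```haskell
import Data.Binary.Get (Get, getLazyByteString, getWord64le, runGet, skip)
import Data.Binary.Put (Put, putLazyByteString, putWord64le, runPut)
import qualified Data.ByteString.Lazy as BL

-- Assumed layout: [len][dependency map bytes][len][module bytes].
-- Write one section as a 64-bit little-endian length plus the payload.
putSection :: BL.ByteString -> Put
putSection payload = do
  putWord64le (fromIntegral (BL.length payload))
  putLazyByteString payload

-- Serialize an object file from its two already-encoded sections.
encodeObject :: BL.ByteString -> BL.ByteString -> BL.ByteString
encodeObject depMapBytes moduleBytes =
  runPut (putSection depMapBytes >> putSection moduleBytes)

-- Read one section's payload.
getSection :: Get BL.ByteString
getSection = do
  n <- getWord64le
  getLazyByteString (fromIntegral n)

-- Jump over one section without decoding its payload.
skipSection :: Get ()
skipSection = do
  n <- getWord64le
  skip (fromIntegral n)

-- Before gcSections: only the first section is decoded.
readDependencyMap :: BL.ByteString -> BL.ByteString
readDependencyMap = runGet getSection

-- After gcSections: skip the dependency map, decode the module.
readModule :: BL.ByteString -> BL.ByteString
readModule = runGet (skipSection >> getSection)
```

Because the input is a lazy `ByteString`, `readDependencyMap` only needs to force the chunks that cover the first section, so the module bytes need not be read into memory at that stage.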