Implement External STG
Is your feature request related to a problem? Please describe.

This issue proposes the "external STG" feature. More specifically:
- The object format should contain enough information to reconstruct the `StgSyn` AST at link-time.
- The reconstructed `StgSyn` AST can be compiled to Cmm and then WebAssembly. The resulting linked program works with the current runtime.
As long as the current Cmm-based codegen/linker/runtime doesn't break, implementing external STG and enabling ourselves to start compilation from there is definitely great since:
- A lot of RTS knowledge wired into Cmm is no longer there (e.g. closure representation), so we can experiment with Wasm's GC proposal
- Link-time optimization at the STG level is likely to yield smaller code than link-time optimization at the Cmm level.
- This can benefit all GHC backends targeting managed runtimes, and can be upstreamed once proven to work.
Describe the solution you'd like

There are some possible ways to implement this:
1. Copy the whole set of `StgSyn` datatype definitions and serialize the copies. At compile-time, marshal the upstream version to the custom version; at link-time, reconstruct the upstream version on the fly. This is the approach in `ghc-grin`.
2. Seek to implement `Binary` instances for the `StgSyn` datatypes. Some of those contain fields with types like `Id`, `Type`, etc. which are hard to serialize since they depend on certain GHC context.
3. Don't do external STG; do external Core instead, and reuse the `Iface` logic to serialize all `Core` unfoldings.
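
To make the first approach concrete, here is a minimal sketch of the "mirror and marshal" idea on a toy AST. Everything in it is hypothetical: `Expr` is only a stand-in for an upstream datatype (it is not GHC's `StgSyn`), `XExpr` is the hand-written mirror, and derived `Show`/`Read` stand in for a real binary format.

```haskell
import Text.Read (readMaybe)

-- Toy stand-in for an upstream AST (hypothetical; not GHC's StgSyn).
data Expr
  = Var String
  | Lit Integer
  | App Expr [Expr]
  deriving (Eq, Show)

-- Hand-written mirror of the same shape. Because we own this type, we can
-- give it whatever serialization instances we like (derived Show/Read here).
data XExpr
  = XVar String
  | XLit Integer
  | XApp XExpr [XExpr]
  deriving (Eq, Show, Read)

-- Compile-time direction: upstream AST -> mirror.
toMirror :: Expr -> XExpr
toMirror (Var v)    = XVar v
toMirror (Lit n)    = XLit n
toMirror (App f xs) = XApp (toMirror f) (map toMirror xs)

-- Link-time direction: mirror -> upstream AST.
fromMirror :: XExpr -> Expr
fromMirror (XVar v)    = Var v
fromMirror (XLit n)    = Lit n
fromMirror (XApp f xs) = App (fromMirror f) (map fromMirror xs)

-- A String stands in for the object-file payload.
encode :: Expr -> String
encode = show . toMirror

-- Returns Nothing when the payload does not parse as a mirror AST.
decode :: String -> Maybe Expr
decode = fmap fromMirror . readMaybe
```

In this toy the conversions are total because `Expr` carries no context-dependent fields. The difficulty described in the second approach appears as soon as a field such as an `Id` or a `Type` cannot be rebuilt from the payload alone.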
For now, option 3 (external Core) looks the most promising. clash-compiler uses a similar approach IIRC: clash-prelude is compiled with -fexpose-all-unfoldings, and clash-ghc later reconstructs CoreSyn from the ModIface.
After some trial and error, the most likely roadmap to an MVP of this seems to be:
- Use the `BinIface` mechanism to serialize a part of `CgGuts`. We need to serialize `[TyCon]` and `CoreProgram` here. Other fields are trivial to deal with.
- It should be trivial to convert stuff to `IfaceTyCon`, `IfaceExpr`, and serialize them.
- We can use `initIfaceLoad`/`initIfaceCheck` to set up the type checker session required to reconstruct Core AST from Iface AST. (`clash-ghc` seems to use `initIfaceCheck`)
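
The reason a session is needed on the way back can be illustrated with a toy model. This is only a sketch and does not use the GHC API: `Var`, `Expr`, `IfaceExpr`, `Env`, `toIface` and `fromIface` below are hypothetical names. The interface form keeps only names, so reconstruction has to look each name up in an environment, which plays the role of the type checker session here.

```haskell
-- Toy in-memory AST: variables are resolved entities carrying a unique.
data Var = Var { varName :: String, varUnique :: Int }
  deriving (Eq, Show)

data Expr
  = EVar Var
  | ELam Var Expr
  | EApp Expr Expr
  deriving (Eq, Show)

-- Toy interface form: only names survive, so it is easy to serialize.
data IfaceExpr
  = IVar String
  | ILam String IfaceExpr
  | IApp IfaceExpr IfaceExpr
  deriving (Eq, Show)

-- Forgetful direction: needs no context at all.
toIface :: Expr -> IfaceExpr
toIface (EVar v)   = IVar (varName v)
toIface (ELam v b) = ILam (varName v) (toIface b)
toIface (EApp f x) = IApp (toIface f) (toIface x)

-- Names in scope; a stand-in for the state a type checker session provides.
type Env = [(String, Var)]

-- Reconstruction direction: fails on names the environment cannot resolve.
fromIface :: Env -> IfaceExpr -> Either String Expr
fromIface env (IVar n) =
  maybe (Left ("unbound name: " ++ n)) (Right . EVar) (lookup n env)
fromIface env (ILam n b) =
  -- Placeholder unique; a real implementation would use a unique supply.
  let v = Var n (length env)
  in ELam v <$> fromIface ((n, v) : env) b
fromIface env (IApp f x) = EApp <$> fromIface env f <*> fromIface env x
```

Note that the reconstructed tree is only equal to the original up to uniques: converting it back with `toIface` gives the same interface form, but the `Var` values themselves may differ.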
The logic can be implemented as an experimental branch of asterius. We'll set up a "roundtrip" compilation pipeline that doesn't alter codegen/linker logic:
- `ghc-toolkit` will first obtain the `CgGuts` in the regular pipeline, and perform a round of serialization/deserialization. For each compilation unit, a new dummy `HscEnv` will be set up and the original one in the pipeline will be discarded.
- The deserialized `CgGuts` will then go through the rest of the STG/Cmm/Wasm pipeline.
- Turn on linting for all ASTs for some extra safety.
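
The shape of this roundtrip pipeline can be sketched generically. The names `roundtrip` and `compileVia` are hypothetical, and the `Eq` comparison is only a toy stand-in for linting, since a real `CgGuts` cannot simply be compared for equality.

```haskell
import Text.Read (readMaybe)

-- Serialize, deserialize, and hand back the deserialized copy.
-- The equality check is a toy substitute for linting the result.
roundtrip :: Eq a => (a -> String) -> (String -> Maybe a) -> a -> Either String a
roundtrip enc dec x =
  case dec (enc x) of
    Nothing -> Left "deserialization failed"
    Just y
      | y == x    -> Right y
      | otherwise -> Left "roundtrip changed the value"

-- Only the deserialized copy is passed on to the rest of the pipeline,
-- so any information lost in the codec shows up downstream.
compileVia
  :: Eq a
  => (a -> String) -> (String -> Maybe a) -> (a -> b) -> a -> Either String b
compileVia enc dec backend x = backend <$> roundtrip enc dec x

-- Example: a list of Ints stands in for the program, `sum` for the backend.
demo :: Either String Int
demo = compileVia show readMaybe sum [1, 2, 3]
```

Feeding only the deserialized value to the backend mirrors the idea of discarding the original `HscEnv`: nothing from the first half of the pipeline can leak through by accident.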
This will be a good way to validate that the serialization logic works, without being too intrusive in the current asterius codebase. Once we're sure it works, the next step would be:
- Use the "external Core" stuff as the object format
- Perform gc-sections at the Core level
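
As a rough illustration of what gc-sections at the Core level amounts to, the sketch below keeps only the top-level bindings reachable from a set of roots. It is a toy model under stated assumptions: a binding is reduced to its name plus the top-level names it mentions, and `Binding`, `reachable` and `gcSections` are hypothetical names, not part of asterius or GHC.

```haskell
import Data.Maybe (fromMaybe)

-- A top-level binding: its name and the top-level names it refers to.
type Binding = (String, [String])

-- All names reachable from the roots, in depth-first discovery order.
-- Names without a binding in the program are treated as having no dependencies.
reachable :: [Binding] -> [String] -> [String]
reachable prog = go []
  where
    go seen [] = reverse seen
    go seen (n : rest)
      | n `elem` seen = go seen rest
      | otherwise     = go (n : seen) (deps n ++ rest)
    deps n = fromMaybe [] (lookup n prog)

-- Drop every binding that is not reachable from the roots,
-- preserving the original order of the survivors.
gcSections :: [Binding] -> [String] -> [Binding]
gcSections prog roots = [b | b@(n, _) <- prog, n `elem` live]
  where
    live = reachable prog roots
```

The list-based `elem` and `lookup` keep the sketch dependency-free but are quadratic; a real implementation would presumably use sets and maps keyed on the binder.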