Implement External STG

Open TerrorJack opened this issue 5 years ago • 1 comments

Is your feature request related to a problem? Please describe.

This issue proposes the "external STG" feature. More specifically:

  • The object format should contain enough information to reconstruct the StgSyn AST at link-time.
  • The reconstructed StgSyn AST can be compiled to Cmm and then WebAssembly. The resulting linked program works with the current runtime.

As long as the current Cmm-based codegen/linker/runtime doesn't break, implementing external STG and being able to start compilation from there is worthwhile because:

  • A lot of the RTS knowledge wired into Cmm (e.g. closure representation) is not yet present at the STG level, so we can experiment with Wasm's GC proposal.
  • Link-time optimization at the STG level likely yields smaller code than link-time optimization at the Cmm level.
  • This can benefit all GHC backends targeting managed runtimes, and can be upstreamed once proven to work.

Describe the solution you'd like

There are a few possible ways to implement this:

  1. Copy the whole set of StgSyn datatype definitions and serialize the copies. At compile-time, marshal the upstream version to the custom version; at link-time, reconstruct the upstream version on the fly. This is the approach in ghc-grin.
  2. Implement Binary instances for the StgSyn datatypes directly. Some of those contain fields with types like Id, Type, etc., which are hard to serialize since they depend on GHC session context.
  3. Don't do external STG; do external Core instead, and reuse the Iface logic to serialize all Core unfoldings.

For now, option 3 looks the most promising. clash-compiler uses a similar approach iirc; clash-prelude is compiled with -fexpose-all-unfoldings, and later clash-ghc reconstructs CoreSyn from the ModIface.
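To make the "marshal to a serializable mirror, then reconstruct" shape of option 1 concrete, here is a minimal sketch. Everything in it is a toy assumption: Var, Expr and XExpr are hypothetical stand-ins rather than GHC's Id or StgSyn types, and derived Show/Read stands in for a real binary format. Option 3 has the same overall shape, with the Iface types playing the role of the mirror.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for an upstream identifier. The unique is tied to a
-- compiler session, which is what makes the real thing hard to serialize.
data Var = Var
  { varName   :: String
  , varUnique :: Int
  }
  deriving (Eq, Show)

-- Hypothetical stand-in for the upstream AST (far smaller than StgSyn).
data Expr
  = EVar Var
  | ELit Integer
  | EApp Expr Expr
  deriving (Eq, Show)

-- Mirror type: same shape, but built only from plain data.
data XExpr
  = XVar String
  | XLit Integer
  | XApp XExpr XExpr
  deriving (Eq, Show, Read)

-- Compile-time: upstream -> mirror, dropping session-specific fields.
marshal :: Expr -> XExpr
marshal (EVar v)   = XVar (varName v)
marshal (ELit n)   = XLit n
marshal (EApp f x) = XApp (marshal f) (marshal x)

-- Link-time: mirror -> upstream, resolving names in the new session.
-- Fails with Nothing if a name cannot be resolved.
reconstruct :: (String -> Maybe Var) -> XExpr -> Maybe Expr
reconstruct env (XVar n)   = EVar <$> env n
reconstruct _   (XLit n)   = Just (ELit n)
reconstruct env (XApp f x) = EApp <$> reconstruct env f <*> reconstruct env x

-- Toy serializer pair; a real object format would use a binary encoding.
encode :: XExpr -> String
encode = show

decode :: String -> Maybe XExpr
decode = readMaybe
```

The point of the sketch is that reconstruct needs an environment: the serialized form only carries names, so the link-time side has to re-resolve them, which is the part the Iface machinery already handles for Core.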

TerrorJack, Apr 13 '20 07:04

After some trial and error, the most likely roadmap to an MVP of this seems to be:

  • Use the BinIface mechanism to serialize a part of CgGuts. We need to serialize [TyCon] and CoreProgram here. Other fields are trivial to deal with.
  • It should be straightforward to convert these to IfaceTyCon and IfaceExpr, and then serialize them.
  • We can use initIfaceLoad/initIfaceCheck to set up the type checker session required to reconstruct Core AST from Iface AST. (clash-ghc seems to use initIfaceCheck)

The logic can be implemented as an experimental branch of asterius. We'll set up a "roundtrip" compilation pipeline that doesn't alter codegen/linker logic:

  • ghc-toolkit will first obtain the CgGuts in the regular pipeline, and perform a round of serialization/deserialization. For each compilation unit, a new dummy HscEnv will be set up and the original one in the pipeline will be discarded.
  • The deserialized CgGuts will then go through the rest of STG/Cmm/Wasm pipeline.
  • Turn on linting for all ASTs for some extra safety.
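The roundtrip stage described above can be sketched as a function that lints, serializes, deserializes, lints again, and hands only the deserialized value to the rest of the pipeline. This is a toy under stated assumptions: Guts is a hypothetical stand-in for the serializable part of CgGuts, lint is a made-up check rather than GHC's Core/STG lint, and derived Show/Read replaces BinIface.

```haskell
import Data.List (nub)
import Text.Read (readMaybe)

-- Hypothetical stand-in for the part of CgGuts that gets serialized.
data Guts = Guts
  { gutsModule   :: String
  , gutsBindings :: [(String, Integer)]
  }
  deriving (Eq, Show, Read)

-- Toy lint pass: rejects empty module names and duplicate binders.
lint :: Guts -> Either String Guts
lint g
  | null (gutsModule g)             = Left "lint: empty module name"
  | length (nub names) /= length names = Left "lint: duplicate binder"
  | otherwise                       = Right g
  where
    names = map fst (gutsBindings g)

-- Serialize, deserialize, and return only the deserialized value.
-- The original is deliberately not used after encoding.
roundtrip :: Guts -> Either String Guts
roundtrip original = do
  _ <- lint original
  let encoded = show original
  restored <- maybe (Left "roundtrip: decode failed") Right (readMaybe encoded)
  lint restored

-- Run the rest of the pipeline (passed in as a function) on the
-- roundtripped value, so the backend itself is left unchanged.
compileVia :: (Guts -> r) -> Guts -> Either String r
compileVia backend g = backend <$> roundtrip g
```

Because compileVia wraps an arbitrary backend function, the existing STG/Cmm/Wasm stages would not need to know the roundtrip happened; any behavioural difference then points at the serialization logic.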

This will be a good way to validate that the serialization logic works, without being too intrusive in the current asterius codebase. Once we're sure it works, the next step would be:

  • Use the serialized "external Core" as the object format
  • Perform gc-sections (dropping unreachable top-level bindings) at the Core level
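As a rough illustration of what gc-sections at the Core level could mean, the sketch below keeps only the top-level bindings reachable from a set of root names. The representation is an assumption made for the example: each binding is reduced to the set of top-level names it references, which a real implementation would compute from the free variables of the Core expression.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Name = String

-- Each top-level binding mapped to the top-level names it references.
type DepGraph = Map.Map Name (Set.Set Name)

-- All names reachable from the roots, found by a worklist traversal.
-- Names missing from the graph (e.g. external symbols) count as leaves.
reachable :: DepGraph -> [Name] -> Set.Set Name
reachable graph = go Set.empty
  where
    go seen [] = seen
    go seen (n : rest)
      | n `Set.member` seen = go seen rest
      | otherwise =
          let deps = Map.findWithDefault Set.empty n graph
           in go (Set.insert n seen) (Set.toList deps ++ rest)

-- Drop every binding that is not reachable from the roots.
gcSections :: DepGraph -> [Name] -> DepGraph
gcSections graph roots = Map.restrictKeys graph (reachable graph roots)
```

For example, with bindings main, helper and unused, where main references helper and unused references helper, calling gcSections with main as the only root keeps main and helper and drops unused.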

TerrorJack, May 17 '20 12:05