
Migrate build system to Bazel

Open • mboes opened this issue 4 years ago • 1 comment

Is your feature request related to a problem? Please describe.

There are multiple issues with building Asterius today:

  • It depends on custom infrastructure, like the presence of Docker images with particular names in particular registries. This makes the project hard to fork.
  • It depends on binary blobs uploaded to a particular place on GitHub, which contain binary distributions of a patched version of GHC. These blobs are hard to reproduce, which again makes the project hard to fork.
  • CI takes 1h of real-time and 2h of compute time, which is huge. Few opportunities for caching.
  • The build system is very hard to hack on. There are several packages with complicated custom Setup.hs build scripts, with hooks that call utilities to generate .cmm code, string concatenation to generate .hs modules at configure time, and an inscrutable boot process that happens in multiple stages (run a script, then invoke GHC via the API, then call another script). The process involves writing object files in immutable folders like the data directory of Cabal installation directories for one package, creating hidden directories in the data directory of another, etc. It's very difficult to have a global picture of what happens because the logic is split between Stack configuration files, Haskell scripts, Setup scripts and shell scripts, which themselves call other utilities implemented in Haskell.
  • There is bad hygiene as to where generated files go. Sometimes installation directories are written to after installation. Sometimes source files are generated and become "data" of one package, consumed by another package.
  • Many utilities use generated Paths_* modules to locate other utilities or data files. This makes distributed caching nigh impossible because it hardcodes absolute paths in the binaries.
  • Likewise, the use of Cabal package installation directories breaks distributed caching, because the location of these directories is hardcoded.
  • Dependencies between stages of the build are not fully specified: there is no single command that I can call that will do the boot, build and tests and do so correctly even in the face of incremental rebuilds. The CI configuration defines many build steps.
  • Asterius has code in Haskell, JavaScript, C, Cmm and more. The build system is incorrect, in the sense that changes to some files might not trigger a rebuild, so incremental builds might lead to undefined results.
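
To make the caching point about absolute paths concrete, here is a minimal Python sketch. It is not Asterius code: `render_paths_module`, the module name `Paths_example`, and both install prefixes are hypothetical stand-ins. It assumes a cache that identifies artifacts by the hash of their content, and shows that embedding the install prefix gives each machine a different artifact, while a prefix-independent path gives both machines the same one.

```python
import hashlib
import json


def render_paths_module(prefix: str, relocatable: bool) -> str:
    """Render a tiny stand-in for a generated Paths_* module (hypothetical layout)."""
    # Non-relocatable: bake the absolute install prefix into the source.
    # Relocatable: embed only a path relative to wherever the binary ends up.
    data_dir = "share/example" if relocatable else prefix + "/share/example"
    return (
        "module Paths_example (getDataDir) where\n"
        "getDataDir :: IO FilePath\n"
        "getDataDir = return " + json.dumps(data_dir) + "\n"
    )


def digest(text: str) -> str:
    """Content hash, standing in for the key a content-addressed cache would use."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two machines with different (made-up) installation prefixes.
prefix_a = "/home/alice/.cabal"
prefix_b = "/builds/ci-42/.cabal"

absolute_a = digest(render_paths_module(prefix_a, relocatable=False))
absolute_b = digest(render_paths_module(prefix_b, relocatable=False))
relocatable_a = digest(render_paths_module(prefix_a, relocatable=True))
relocatable_b = digest(render_paths_module(prefix_b, relocatable=True))

print("absolute paths share a cache entry:   ", absolute_a == absolute_b)
print("relocatable paths share a cache entry:", relocatable_a == relocatable_b)
```

With absolute paths the two digests differ, so nothing built on one machine can be reused on the other. With the relocatable variant they match.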

Describe the solution you'd like

Rewrite the build system so that

  • we don't have to shoehorn complicated build steps into Setup.hs build hooks,
  • incremental rebuilds are guaranteed to yield the same results as full rebuilds,
  • the code and the build system are engineered to avoid all absolute paths in binaries or data files, so as to make distributed caching viable, and thereby enable a strategy to reduce CI times significantly.

We should use Bazel for this. Bazel has good support for building Haskell code, has a better design for extensibility than Cabal does, allows the addition of arbitrary custom build rules using a simple Python-like language, forces a good separation between specification of the build targets and implementation of the build rules, provides strong guarantees for build system correctness because all build actions are sandboxed, supports distributed caching, and has good integration with Nixpkgs.
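
As a rough illustration of why declared inputs plus caching give correct incremental builds, here is a minimal Python sketch of a content-addressed action cache. It is a toy model in the spirit of what Bazel does, not Bazel's actual implementation. The names `action_key`, `ActionCache` and `fake_boot`, the command line, and the file names are all hypothetical. The assumption is that a build step is rerun only when its command line or the content of one of its declared inputs changes.

```python
import hashlib
from typing import Callable, Dict, List

Inputs = Dict[str, bytes]


def action_key(command: List[str], inputs: Inputs) -> str:
    """Digest of the command line plus the name and content of every declared input."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode("utf-8") + b"\0")
    for name in sorted(inputs):  # sort so the key does not depend on dict order
        h.update(name.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(inputs[name]).digest())
    return h.hexdigest()


class ActionCache:
    """Maps action keys to outputs; executes an action only on a cache miss."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}
        self.executions = 0

    def run(self, command: List[str], inputs: Inputs,
            execute: Callable[[Inputs], bytes]) -> bytes:
        key = action_key(command, inputs)
        if key not in self.store:
            self.executions += 1
            self.store[key] = execute(inputs)
        return self.store[key]


def fake_boot(inputs: Inputs) -> bytes:
    # Stand-in for an expensive step such as compiling runtime sources.
    return b"|".join(inputs[name] for name in sorted(inputs))


cache = ActionCache()
boot_cmd = ["boot", "--target=wasm"]  # hypothetical command line
rts = {"rts/Apply.cmm": b"v1", "rts/rts.mjs": b"v1"}  # hypothetical inputs

cache.run(boot_cmd, rts, fake_boot)  # first build: executes
cache.run(boot_cmd, rts, fake_boot)  # nothing changed: cache hit
cache.run(boot_cmd, {**rts, "rts/rts.mjs": b"v2"}, fake_boot)  # input changed: executes

print(cache.executions)  # 2
```

In this model a file that is not a declared input cannot affect the key, which is why sandboxing matters: it prevents an action from reading anything it did not declare.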

Describe alternatives you've considered

We could continue to have a Stack-based build, and solve some of the points by leveraging the integration with Nix. For example, using Nix to build the custom GHC toolchain, and then telling Stack about that would obviate the need for hard-to-reproduce binary blobs distributed via GitHub. But this wouldn't resolve any of the other points.

We could use Nix only and not Bazel. But Nix still relies on .cabal files and Cabal to build Haskell code. We would still have to cram a lot of logic into brittle Setup.hs scripts. And there would be no hope of making Asterius buildable on Windows.

mboes, Apr 27 '20 06:04

I agree that a Bazel-based build system could address some of the points listed above.

I'm skeptical that it will help the CI story, though. It may cache the build results before booting, but booting is a CI bottleneck that needs to be performed for every single build. I'm not sure whether Bazel is smart enough (or can be configured) to recognize which changes should trigger a reboot.

TerrorJack, Apr 27 '20 07:04