regalloc2 icon indicating copy to clipboard operation
regalloc2 copied to clipboard

Support for cold blocks

Open Amanieu opened this issue 3 years ago • 2 comments

It would be useful to be able to mark some blocks as "cold" which means that they are rarely taken cold paths. The register allocator should prefer placing spills and moves in cold blocks if possible.

It turns out that very little needs to be done if we take advantage of the block ordering by requiring all cold blocks to be after normal blocks in terms of block index and instruction indices. This has the following consequences:

  • Due to the sorting of blockparams_out, bundle merging will attempt to merge all branch parameters coming from normal blocks before attempting to merge ones coming from cold blocks.
  • If a requirement conflict exists within a bundle, the bundle is split at the first conflict point. Since ranges are iterated in order, this will place the split in a cold block if there are no conflicts within normal blocks.
  • If register allocation requires splitting a bundle, the first conflict point is used as the split point. Same as above.

My only concern is that the block order will no longer be in RPO which is the ordering recommended by the documentation. While regalloc2 will still function properly, I am less sure of the impact it may have on the heuristics.

Note that there are no requirements related to the ordering of blocks, and there is no requirement that the control flow be reducible. Some heuristics used by the allocator will perform better if the code is reducible and ordered in reverse postorder (RPO), however: in particular, (1) this interacts better with the contiguous-range-of-instruction-indices live range representation that we use, and (2) the "approximate loop depth" metric will actually be exact if both these conditions are met.

Amanieu avatar Dec 26 '21 02:12 Amanieu

FYI there is already an issue on the wasmtime side: https://github.com/bytecodealliance/wasmtime/issues/2747

bjorn3 avatar Dec 26 '21 07:12 bjorn3

@Amanieu I think that placing cold blocks at the end of the function in the linear block order should basically just work, as you say.

The two issues in the design doc could potentially have an impact on compile time (first issue -- because we'll have longer, discontiguous live ranges) and code quality (second issue -- because an out-of-lined block from an inner loop, sunk to the end, could cause the approximate metric to treat the entire remainder of the function as a hot inner loop body).

For the second, I think a simple-enough answer is to stop the approximate-loop-depth scan before any code blocks (and treat them as zero depth). That would also have a side-effect of making the spill cost low in the cold paths, which is what we want.

So I can imagine adding a method to the Function trait something like fn first_cold_block(&self) -> Option<Block> and then use that to end this scan early, then that should be it. Does that seem reasonable to you?

cfallin avatar Jan 18 '22 19:01 cfallin