riscv-perf-model

Implement micro-op fusion in the decode stage.

Open zxc12523 opened this issue 1 year ago • 5 comments

In the decode stage, we might find several pairs of uops that can be merged into one instruction to increase performance. Since this optimization is common in modern high-performance CPUs, we can add this feature for users to model the performance gain.

zxc12523 • Nov 07 '23 18:11

Oh, absolutely!

The challenge here is -- can you build a small fusion framework in Olympia that allows a user of the model to experiment with configurable combinations? In other words -- do not hard-code the pairings in the simulator, set up a framework that is runtime programmable via YAML or JSON to identify pairings. That'd be really cool and very powerful.

klingaard • Nov 07 '23 19:11

@klingaard Is there any support for this in mavis? I saw a morph instruction function.

danbone • Nov 08 '23 12:11

@klingaard Maybe we can add that configuration to small_core.yaml?

zxc12523 • Nov 09 '23 06:11

> Is there any support for this in mavis? I saw a morph instruction function.

Yes, and you're correct, it's related to the morph function call. I'm not a Mavis expert (@dbmurrell is the original author), but if you look at https://github.com/sparcians/mavis/blob/4f3fef891f9ddc5c371c27500d02596f21ea6fc8/test/main.cpp#L446 you can see an example of how you can morph an existing instruction into a fused one. I think the process is:

  1. Identify a pairing (within a decode group or across decode groups [that's tricky])
  2. Morph the first instruction into the fused "new" operation
  3. No-op the second (force it to go directly to the ROB)
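
The three steps above can be sketched roughly as follows. This is a toy model only, not the Mavis morph API or Olympia code: `ToyInst`, `PairCheck`, `fuseDecodeGroup`, and the `is_nop` flag are hypothetical names, and the pairing test is supplied by the caller. It only looks at neighbours within a single decode group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoded instruction (not a Mavis or Olympia type).
struct ToyInst {
    std::string mnemonic;
    std::uint32_t rd;
    std::uint32_t rs1;
    std::uint32_t rs2;
    bool is_nop;  // assumed meaning: skip execution, go directly to the ROB
};

// Caller-supplied test that decides whether two neighbours may be fused.
using PairCheck = std::function<bool(const ToyInst&, const ToyInst&)>;

// Fuses adjacent pairs inside one decode group; returns how many pairs were fused.
inline std::size_t fuseDecodeGroup(std::vector<ToyInst>& group,
                                   const PairCheck& can_fuse) {
    assert(can_fuse && "a pairing predicate is required");
    std::size_t fused = 0;
    for (std::size_t i = 0; i + 1 < group.size(); ++i) {
        // Step 1: identify a pairing between neighbours in this group.
        if (!can_fuse(group[i], group[i + 1])) {
            continue;
        }
        // Step 2: "morph" the first instruction into the fused operation.
        group[i].mnemonic += "+" + group[i + 1].mnemonic;
        // Step 3: no-op the second instruction.
        group[i + 1].is_nop = true;
        ++fused;
        ++i;  // the second instruction is consumed; do not pair it again
    }
    return fused;
}
```

Passing the predicate in as a callback keeps the pairing rules out of the loop itself, so the same walker can serve whatever rule source is chosen later.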

> Maybe we can add that configuration to small_core.yaml?

I think that's reasonable, but you might run into limitations with YAML to properly identify pairings. Dunno until there's a design in place for how you want to do it. Suggestion: Might want to specify a different language (an XML derivative with a DOM) and reference that:

top.cpu.core0.extension.core_extensions:
    decode_fusions: "fusion_pairs.xml"

My suggestion for this entire effort: move this to a discussion and create a design document. Start with a use case, specifically, which pairs will you initially be fusing? For those pairs, what are the constraints?

For example, the first instruction must be an add followed by a branch AND the add's RD field must be the same as the branch's RS2 field... etc.
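
As a minimal sketch, that example constraint could be written as a predicate like the one below. `DecodedFields`, `isBranch`, and `addBranchConstraint` are hypothetical names for illustration, not Olympia or Mavis types; the branch list is the set of RV32I/RV64I conditional branches.

```cpp
#include <cstdint>
#include <set>
#include <string>

// Hypothetical view of the decoded fields the constraint needs.
struct DecodedFields {
    std::string mnemonic;
    std::uint32_t rd;
    std::uint32_t rs1;
    std::uint32_t rs2;
};

// True for the base-ISA conditional branch mnemonics.
inline bool isBranch(const std::string& mnemonic) {
    static const std::set<std::string> branches = {
        "beq", "bne", "blt", "bge", "bltu", "bgeu"};
    return branches.count(mnemonic) != 0;
}

// The constraint from the example: an add followed by a branch,
// AND the add's RD equals the branch's RS2.
inline bool addBranchConstraint(const DecodedFields& first,
                                const DecodedFields& second) {
    return first.mnemonic == "add" &&
           isBranch(second.mnemonic) &&
           first.rd == second.rs2;
}
```

A real rule set would likely need further conditions (the "etc." above), which is part of what a design document would have to pin down.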

From there, you can determine the "language" you want to build to specify the pairings, and how a generic fuser will convert that into runtime code...
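
To make the "language to runtime code" idea concrete, here is one possible sketch under loud assumptions: the one-line rule format (`"<first> <second> <first_field> <second_field>"`) is invented for illustration and is not FSL or any existing Olympia syntax, and `Fields`, `fieldOf`, and `compileRule` are hypothetical names. A rule string read at runtime is turned into a callable predicate.

```cpp
#include <cstdint>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical view of a decoded instruction.
struct Fields {
    std::string mnemonic;
    std::uint32_t rd;
    std::uint32_t rs1;
    std::uint32_t rs2;
};

using PairPredicate = std::function<bool(const Fields&, const Fields&)>;

// Looks up a register field by name; throws on an unknown name.
inline std::uint32_t fieldOf(const Fields& f, const std::string& name) {
    if (name == "rd") return f.rd;
    if (name == "rs1") return f.rs1;
    if (name == "rs2") return f.rs2;
    throw std::invalid_argument("unknown field: " + name);
}

// Compiles a rule in the invented format
//   "<first mnemonic> <second mnemonic> <field of first> <field of second>"
// e.g. "add beq rd rs2", into a predicate over an instruction pair.
inline PairPredicate compileRule(const std::string& spec) {
    std::istringstream in(spec);
    std::string first, second, lhs, rhs;
    if (!(in >> first >> second >> lhs >> rhs)) {
        throw std::invalid_argument("malformed rule: " + spec);
    }
    // Validate field names once, at load time, rather than on every match.
    const Fields probe{"", 0, 0, 0};
    (void)fieldOf(probe, lhs);
    (void)fieldOf(probe, rhs);
    return [first, second, lhs, rhs](const Fields& a, const Fields& b) {
        return a.mnemonic == first && b.mnemonic == second &&
               fieldOf(a, lhs) == fieldOf(b, rhs);
    };
}
```

For instance, `compileRule("add beq rd rs2")` yields a predicate that accepts an `add` followed by a `beq` whose RS2 matches the add's RD. A production design would need a richer grammar than a single equality, which is the point of settling the use cases first.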

klingaard • Nov 09 '23 15:11

So @jeffnye-gh has been looking at this. Discussion: https://github.com/riscv-software-src/riscv-perf-model/discussions/121 as well as first PR: #135

klingaard • Feb 08 '24 20:02

I believe this can be closed now. Support for fusion is available through the FSL API and FusionDecoder.cpp.

jeffnye-gh • Oct 21 '24 06:10