riscv-perf-model
Implement micro op fusion in decode stage.
In the decode stage, we might find several pairs of uops that can be merged into one instruction to increase performance. Since this optimization is common in modern high-performance CPUs, we can add this feature for users to model the performance gain.
Oh, absolutely!
The challenge here is -- can you build a small fusion framework in Olympia that allows a user of the model to experiment with configurable combinations? In other words -- do not hard-code the pairings in the simulator, set up a framework that is runtime programmable via YAML or JSON to identify pairings. That'd be really cool and very powerful.
@klingaard Is there any support for this in mavis? I saw a morph instruction function.
@klingaard maybe we can add this configuration to small_core.yaml?
Is there any support for this in mavis? I saw a morph instruction function.
Yes, and you're correct, it's related to the morph function call. I'm not a Mavis expert (@dbmurrell is the original author), but if you look at https://github.com/sparcians/mavis/blob/4f3fef891f9ddc5c371c27500d02596f21ea6fc8/test/main.cpp#L446 you can see an example of how you can morph an existing instruction into a fused one. I think the process is:
- Identify a pairing (within a decode group or across [that's tricky])
- Morph the first instruction into the fused "new" operation
- No-op the second (force it to go directly to the ROB)
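The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Uop`, `isFusablePair`, and `fuseDecodeGroup` are hypothetical names, not Olympia or Mavis types, and the hard-coded add/bne pairing stands in for whatever rule source the real framework would use. It only handles pairs that are adjacent within a single decode group.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified uop. Not Olympia's or Mavis's instruction class.
struct Uop {
    std::string mnemonic;
    bool is_nop;  // true: skip execution and go directly to the ROB

    explicit Uop(std::string m) : mnemonic(std::move(m)), is_nop(false) {}
};

// Hypothetical pairing test, hard-coded here purely for illustration.
inline bool isFusablePair(const Uop& first, const Uop& second) {
    return first.mnemonic == "add" && second.mnemonic == "bne";
}

// Scan one decode group for adjacent fusable pairs.
// Returns the number of pairs that were fused.
inline std::size_t fuseDecodeGroup(std::vector<Uop>& group) {
    std::size_t fused = 0;
    for (std::size_t i = 0; i + 1 < group.size(); ++i) {
        Uop& first = group[i];
        Uop& second = group[i + 1];
        // Step 1: identify a pairing within the decode group.
        if (!isFusablePair(first, second)) {
            continue;
        }
        // Step 2: morph the first instruction into the fused operation
        // (modeled here as a simple rename).
        first.mnemonic = first.mnemonic + "_" + second.mnemonic;
        // Step 3: no-op the second so it goes directly to the ROB.
        second.is_nop = true;
        ++fused;
        ++i;  // the second uop is consumed; do not reuse it as a "first"
    }
    // Fused pairs never overlap, so at most half the group can be fused.
    assert(fused <= group.size() / 2);
    return fused;
}
```

In a real model the morph step would presumably go through Mavis rather than a string rename, and pairings that span decode groups would need extra state.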
maybe we can add this configuration to small_core.yaml
I think that's reasonable, but you might run into limitations with YAML to properly identify pairings. Dunno until there's a design in place for how you want to do it. Suggestion: Might want to specify a different language (an XML derivative with a DOM) and reference that:
top.cpu.core0.extension.core_extensions:
  decode_fusions: "fusion_pairs.xml"
My suggestion for this entire effort: move this to a discussion and create a design document. Start with a use case, specifically, which pairs will you initially be fusing? For those pairs, what are the constraints?
For example, the first instruction must be an add followed by a branch AND the add's RD field must be the same as the branch's RS2 field... etc.
From there, you can determine the "language" you want to build to specify the pairings -- and how a generic fuser will convert that into runtime code...
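To make the add/branch example concrete, here is a minimal sketch of what a data-driven rule might look like once it has been loaded from a config file. Everything here is an assumption for illustration: `Inst`, `Field`, `FieldEq`, `FusionRule`, and `matches` are hypothetical names, not part of Olympia, Mavis, or the FSL API, and the file parsing itself is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical decoded-instruction view; register fields are plain indices.
struct Inst {
    std::string mnemonic;
    int rd;
    int rs1;
    int rs2;
};

enum class Field { RD, RS1, RS2 };

// One constraint: a field of the first instruction must equal
// a field of the second instruction.
struct FieldEq {
    Field first_field;
    Field second_field;
};

// A pairing rule expressed as data, so it could be populated at runtime
// from YAML/JSON/XML instead of being hard-coded in the simulator.
struct FusionRule {
    std::string first_mnemonic;
    std::string second_mnemonic;
    std::vector<FieldEq> constraints;
};

inline int readField(const Inst& inst, Field f) {
    switch (f) {
        case Field::RD:  return inst.rd;
        case Field::RS1: return inst.rs1;
        case Field::RS2: return inst.rs2;
    }
    assert(false && "unknown field");
    return -1;
}

// Generic matcher: true if (first, second) satisfies every part of the rule.
inline bool matches(const FusionRule& rule, const Inst& first, const Inst& second) {
    if (first.mnemonic != rule.first_mnemonic) return false;
    if (second.mnemonic != rule.second_mnemonic) return false;
    for (const FieldEq& c : rule.constraints) {
        if (readField(first, c.first_field) != readField(second, c.second_field)) {
            return false;
        }
    }
    return true;
}
```

The rule "add followed by a branch, with the add's RD equal to the branch's RS2" then becomes a `FusionRule` with mnemonics `"add"` and, say, `"bne"`, plus a single `FieldEq{Field::RD, Field::RS2}` constraint. Keeping the rule as plain data is what lets the matcher stay generic; richer constraints (immediates, opcode classes, longer sequences) would need a richer rule format.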
So @jeffnye-gh has been looking at this. Discussion: https://github.com/riscv-software-src/riscv-perf-model/discussions/121 as well as first PR: #135
I believe this can be closed now. Support for fusion is available through the FSL API and FusionDecoder.cpp