Design document of the Branch Prediction Unit

Open dragon540 opened this issue 1 year ago • 4 comments

This is the design document for the branch prediction unit (BPU) to be added to the Olympia simulator. It aims to give an overview of the micro-architectural and implementation details of the BPU.

The document will be updated further as development progresses. Any suggestions are appreciated.

dragon540 avatar Dec 02 '24 16:12 dragon540

This is similar to Shobhit's other PR. Not expected to compile.

We asked that interns post their code, in whatever state, as draft PRs at the end of the internship so as not to lose any progress.

jeffnye-gh avatar Jan 14 '25 03:01 jeffnye-gh

Shobhit, can you pull the latest master into your branches? CI should pass now.

knute-mips avatar Mar 10 '25 16:03 knute-mips

@klingaard @arupc this document is still a work in progress; however, some details are much clearer than before. Can you take a look and check whether the high-level design of the unit (interaction of the BPU with Fetch and FTQ, organization of its constituent predictors, etc.) looks okay?

dragon540 avatar Mar 16 '25 22:03 dragon540

Can do!

klingaard avatar Mar 16 '25 23:03 klingaard

@dragon540 I know some of the code development is already being done in #243, but I am providing some feedback specific to the design document.

Overall, I think the document is well done and covers many of the components one would like to see in a higher-performance BPU and front end.

At a high level, my understanding from the design doc is that the BranchPredIF.hpp interface is intended to be used for the BPU. PredictionRequest, PredictionOutput, and UpdateInput correspond to the "prediction input", "prediction output", and "update input" types from the interface.

I don't believe the BranchPredIF.hpp interface is currently used by the individual branch predictors themselves, though (e.g. PHT, BTB, TAGE, etc.). I believe the intent behind the interface is that individual predictors can and should use the full interface as well. Specifically, the inputs and outputs of the individual predictors should be templatized so that they can be redefined as needed. The existing predictors show where this would be useful: the PHT takes a hash of the PC and global history as input, whereas the other predictors take the PC. By using predictor-specific PredictionRequest, PredictionOutput, and UpdateInput types, a generic interface can be used with any individual predictor and defined to suit that predictor (e.g. PHT -> hash of PC & global hist, BTB -> PC). The types can then default to what the design doc currently defines. This would make for a less "hardened" approach to interfacing with the individual predictors, as well as with the overall BPU.
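
To make the templatization idea concrete, here is a minimal sketch of a generic interface with predictor-specific types. Every name in it (`PredictorIF`, `PhtRequest`, `PhtOutput`, `PhtUpdate`, `hashPcAndHistory`, `SimplePht`) is hypothetical and chosen for illustration; it is not the actual contents of BranchPredIF.hpp, and the hash is only a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical generic interface: each predictor supplies its own request,
// output, and update types. Not the actual contents of BranchPredIF.hpp.
template <typename RequestT, typename OutputT, typename UpdateT>
class PredictorIF
{
public:
    virtual ~PredictorIF() = default;
    virtual OutputT getPrediction(const RequestT & req) const = 0;
    virtual void updatePredictor(const UpdateT & upd) = 0;
};

// PHT-specific types: the request carries a hash of PC and global history.
// A BTB would instead define a request type that carries the PC itself.
struct PhtRequest { uint32_t index_hash; };
struct PhtOutput  { bool taken; };
struct PhtUpdate  { uint32_t index_hash; bool actually_taken; };

// Placeholder hash (assumption): the real hash is a design choice.
inline uint32_t hashPcAndHistory(uint64_t pc, uint64_t global_history)
{
    return static_cast<uint32_t>((pc >> 2) ^ global_history);
}

// Minimal PHT of 2-bit saturating counters, initialized to weakly not-taken.
class SimplePht : public PredictorIF<PhtRequest, PhtOutput, PhtUpdate>
{
public:
    explicit SimplePht(std::size_t num_entries) : counters_(num_entries, 1)
    {
        assert(num_entries > 0);
    }

    PhtOutput getPrediction(const PhtRequest & req) const override
    {
        // Counter values 2 and 3 predict taken.
        return {counters_[req.index_hash % counters_.size()] >= 2};
    }

    void updatePredictor(const PhtUpdate & upd) override
    {
        uint8_t & ctr = counters_[upd.index_hash % counters_.size()];
        if (upd.actually_taken) {
            if (ctr < 3) { ++ctr; }
        } else {
            if (ctr > 0) { --ctr; }
        }
    }

private:
    std::vector<uint8_t> counters_;
};
```

With this shape, the caller computes `hashPcAndHistory(pc, ghr)` once and passes it in a `PhtRequest`, while a BTB built on the same `PredictorIF` template would take a PC-based request type.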

In a similar vein, it would be helpful to have a way to enable or disable individual predictors via the command line or configuration. That is, with a modular branch-predictor approach, a configuration file could specify the collection of predictors implemented in the BPU (e.g. a BOOM uarch configuration). One could then enable, disable, or even add modular predictors selected via configuration. A simple example with the existing design: evaluating TAGE with and without the Statistical Corrector predictor. This would likely mean having classes for the individual predictors and then parent classes coupling them into "BasePredictor" and "TAGE_SC_L".

  • I realize this raises the question of how the design functions, and what assumptions can be made, if someone were to disable all individual predictors, but I think we can prevent that and assume there will be one "base" (simple) and possibly one "TAGE_SC_L" (complex) predictor.
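
A rough sketch of configuration-driven predictor selection might look like the following. `BpuConfig`, `knownPredictors`, `resolvePredictors`, and the predictor name strings are all hypothetical; how Olympia would actually expose this (parameters, config files) is left open. The sketch only shows the validation step, including rejecting a configuration that disables every predictor.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical configuration: names of the predictors to instantiate,
// as they might be read from a configuration file or the command line.
struct BpuConfig
{
    std::vector<std::string> enabled_predictors;
};

// Predictor names this sketch recognizes (assumption, for illustration).
inline const std::set<std::string> & knownPredictors()
{
    static const std::set<std::string> names{"btb", "pht", "tage", "sc", "loop"};
    return names;
}

// Validate the requested collection. Unknown names are rejected, and so is
// an empty collection, so the BPU always has at least one predictor.
inline std::set<std::string> resolvePredictors(const BpuConfig & cfg)
{
    std::set<std::string> active;
    for (const auto & name : cfg.enabled_predictors) {
        if (knownPredictors().count(name) == 0) {
            throw std::invalid_argument("unknown predictor: " + name);
        }
        active.insert(name);
    }
    if (active.empty()) {
        throw std::invalid_argument("at least one predictor must be enabled");
    }
    assert(!active.empty());
    return active;
}
```

Evaluating TAGE with and without the Statistical Corrector then amounts to two configurations that differ only in whether `"sc"` appears in `enabled_predictors`.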

oms-vmicro avatar May 14 '25 21:05 oms-vmicro

From today's call (2025.06.19), regarding changes to the API to support staged predictions.

The short of it is: I think we want to expose signals in the API that allow a previous prediction to be overridden. This has some implications for what sits behind the BP-API.

Recent open-source designs have the concept of staged prediction: the uBTB/LP/TAGE/SC/ITTAGE all deliver predictions, some with longer latency. Not all of these will exist, and not all latency values will be different.

In terms of the visible methods in the BP-API, I think you want to add a method that signals an override to the front-end logic and passes a struct with the override info: BP request queue index, basic block address, new prediction, and maybe an extension for debug/metadata info.

I am assuming that the decision to signal an override is made behind the BP-API. That is a topic for discussion, but it seems like a way to give developers the greatest freedom/generality to explore choices while keeping the API smaller.

You could imagine testing the effectiveness of a tournament selector versus simply always choosing the latest prediction that differs from an earlier one; these differences would not need to be exposed through the API.

One other thing I did not mention in the call: I believe the BP-API will want to support multiple prediction requests, updates, and returns. If you read the trade press, 2 or 4 predictions at a time will become the bar. I believe this just means that request/result data is grouped. It is not a significant change on the surface, but it occurred to me while writing.
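
As a rough sketch of the override idea (C++17, all names hypothetical): the struct carries the fields listed above, and the decision policy lives behind the API. The policy shown is the simple "a later stage overrides whenever it disagrees with the earlier one" variant; a tournament selector could replace `checkOverride` without changing `PredictionOverride`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical prediction payload: direction plus target address.
struct Prediction
{
    bool taken;
    uint64_t target;
};

// Hypothetical override message passed to the front-end logic.
struct PredictionOverride
{
    uint32_t request_queue_idx;  // BP request queue index being overridden
    uint64_t basic_block_addr;   // address of the affected basic block
    Prediction new_prediction;   // prediction from the later stage
    std::string debug_info;      // optional debug/metadata extension
};

// One possible policy, kept behind the API: return an override only when
// the later stage's prediction differs from the earlier stage's.
inline std::optional<PredictionOverride>
checkOverride(uint32_t req_idx, uint64_t bb_addr,
              const Prediction & early, const Prediction & late,
              const std::string & stage_name)
{
    assert(!stage_name.empty());
    if (late.taken == early.taken && late.target == early.target) {
        return std::nullopt;  // stages agree, nothing to signal
    }
    return PredictionOverride{req_idx, bb_addr, late,
                              "override from " + stage_name};
}
```

The front end would only ever see a `PredictionOverride` (or nothing), so the selection logic can change freely without touching the visible API.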
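
Grouping could be as simple as the sketch below, where requests and results travel as vectors. `Request`, `Result`, the group aliases, and `predictGroup` are hypothetical names, and the per-request predictor is passed in as a callable purely to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical single request and result types.
struct Request { uint64_t pc; };
struct Result  { uint64_t pc; bool taken; };

// A group is the set of requests (or results) handled together,
// e.g. 2 or 4 predictions at a time.
using RequestGroup = std::vector<Request>;
using ResultGroup  = std::vector<Result>;

// Apply a per-request predictor to every request in the group and return
// the results in the same order.
template <typename PredictFn>
ResultGroup predictGroup(const RequestGroup & reqs, PredictFn predict_one)
{
    ResultGroup results;
    results.reserve(reqs.size());
    for (const auto & req : reqs) {
        const bool taken = predict_one(req);
        results.push_back({req.pc, taken});
    }
    assert(results.size() == reqs.size());
    return results;
}
```

Updates could be grouped the same way, with one update entry per resolved branch.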

jeffnye-gh avatar May 19 '25 18:05 jeffnye-gh

~I can put this in a PR but it's one file~,

I added support for 2T/nT (multiple predictions) and staged prediction results to Arup's BP interface class. This is for discussion.

I put this in a draft PR, which seemed easier. https://github.com/riscv-software-src/riscv-perf-model/pull/259

jeffnye-gh avatar May 26 '25 21:05 jeffnye-gh

@dragon540 there is a lot of great feedback from @jeffnye-gh and @oms-vmicro in this PR. Can you resolve the conversations that you have addressed and comment on or fix the remaining issues? When you're ready, please convert this draft PR and we can get it merged in. Thanks.

klingaard avatar Jul 13 '25 15:07 klingaard