coreblocks

Order of announcing executed instructions

Open xThaid opened this issue 1 month ago • 2 comments

While working on #699 I found that increasing the size of the instruction buffer causes a performance loss on a benchmark.

Here's what happened. The larger instruction buffer led to higher utilization of the backend (in particular the ROB). As a result, it is more likely that two instructions will be ready to announce at the same time, with only one of them being marked as done in the ROB in that cycle. However, we currently collect finished instructions from FUs in no specific order, and this is why we get the slowdown in crc32.

Specifically, at some point two instructions are ready to be collected, and the newer one (with the higher ROB id) is selected and marked as done in the ROB. In the next cycle we therefore cannot retire the older instruction (it is not marked as done yet), and we lose one cycle. This happens in every iteration of the program loop and adds up to many wasted cycles. With lower backend utilization, the instruction causing the problem wouldn't be ready to execute yet, and we would retire the oldest instruction as soon as possible.
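
To make the lost cycle concrete, here is a toy model in plain Python. It is only a sketch and not the coreblocks pipeline: it assumes one announcement and one retirement per cycle, assumes that retirement only sees "done" marks written in earlier cycles, and uses small ids without wraparound. The function name `cycles_to_retire` and the `pick` parameter are made up for this illustration.

```python
from collections import deque


def cycles_to_retire(ready_ids, pick):
    """Count cycles needed to retire all instructions in a toy in-order ROB.

    ready_ids: ROB ids whose results are all ready to announce at cycle 0.
    pick: policy choosing which ready id to announce in a cycle (e.g. min or max).
    """
    rob = deque(sorted(ready_ids))  # head = oldest (no wraparound in this toy)
    pending = set(ready_ids)        # finished in FUs, not announced yet
    done = set()                    # marked as done in the ROB
    cycle = 0
    while rob:
        # Retirement only sees "done" marks from earlier cycles.
        if rob[0] in done:
            rob.popleft()
        # At most one instruction is announced per cycle.
        if pending:
            chosen = pick(pending)
            pending.remove(chosen)
            done.add(chosen)
        cycle += 1
    return cycle


oldest_first = cycles_to_retire([0, 1], min)
newest_first = cycles_to_retire([0, 1], max)
print(oldest_first, newest_first)
```

In this model, announcing the newer instruction first delays retirement of the ROB head by one cycle, which matches the effect described above.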

Choosing the order of the instructions we want to announce doesn't seem to be a trivial task -- sounds like a scheduling problem and we don't have full information about future instructions. I think we should simply choose the oldest instruction. By doing that we are releasing the resources as soon as possible and making space for further instructions.

The only small issue is that currently we cannot reliably tell which ROB id is older, because a ROB id is a circular pointer. Either we add one bit to every ROB id, or we use a heuristic like a < b iff (b - a) % rob_size < rob_size / 2 (which works under the assumption that instructions whose ROB ids differ by more than rob_size / 2 will never be ready to announce at the same time).
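
A minimal sketch of both options in plain Python (not Amaranth, and not existing coreblocks code). The function names are hypothetical. For the extra-bit variant, the sketch assumes ids count modulo 2 * rob_size, so the top bit flips on every wrap, and that both ids are currently live in the ROB. The heuristic variant additionally treats equal ids as "not older".

```python
def is_older_heuristic(a: int, b: int, rob_size: int) -> bool:
    """Half-window heuristic: `a` is older than `b` if `b` is less than
    rob_size / 2 entries ahead of `a` on the circle.

    Only valid under the assumption that compared ids are never more than
    rob_size / 2 apart.
    """
    distance = (b - a) % rob_size
    return 0 < distance < rob_size // 2


def is_older_wrap_bit(a: int, b: int, rob_size: int) -> bool:
    """Extra-bit variant: ids are in range(2 * rob_size).

    Assumes both ids are live in the ROB at the same time.
    """
    a_wrap, a_idx = divmod(a, rob_size)
    b_wrap, b_idx = divmod(b, rob_size)
    if a_wrap == b_wrap:
        # Same lap around the buffer: the smaller index is older.
        return a_idx < b_idx
    # Different laps: the one with the larger index was allocated a lap earlier.
    return a_idx > b_idx


print(is_older_heuristic(6, 1, rob_size=8))
print(is_older_wrap_bit(6, 9, rob_size=8))
```

The extra bit costs one more bit per ROB id everywhere it is carried, but needs no assumption about how far apart the compared ids are, beyond both being live.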

What do you think about this problem?

xThaid avatar May 10 '24 16:05 xThaid

You can reliably tell which ROB id is older if you know the start and end indices of its circular buffer. A nonexclusive method to get them, get_indices, already exists in the ROB.
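
As a rough sketch of that idea in plain Python: given the start index (the oldest entry), the age of an id is its distance from the start around the circle, and the oldest ready instruction is the one with the smallest distance. The helper names `rob_age` and `pick_oldest` are hypothetical, and `start_idx` is passed in directly here rather than assuming anything about what get_indices returns.

```python
def rob_age(rob_id: int, start_idx: int, rob_size: int) -> int:
    """Distance of `rob_id` from the oldest entry; 0 means it is the ROB head."""
    return (rob_id - start_idx) % rob_size


def pick_oldest(ready_ids, start_idx: int, rob_size: int) -> int:
    """Choose the oldest ROB id among those ready to announce."""
    return min(ready_ids, key=lambda i: rob_age(i, start_idx, rob_size))


# ROB of size 8 whose oldest entry is at index 5; ids 6, 7 and 1 are ready.
print(pick_oldest([1, 6, 7], start_idx=5, rob_size=8))
```

Unlike the half-window heuristic, this comparison needs no assumption about how far apart the ids are, as long as all compared ids are live entries.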

Kristopher38 avatar May 11 '24 09:05 Kristopher38

Two thoughts:

  • This might be partially related to the RS select order - a bad order of selecting instructions from RS can impact performance.
  • Announcement needs to be reworked later for superscalarity support. Superscalarity might fix this performance issue in some cases.

All in all, I think this shouldn't be our priority now.

tilk avatar May 13 '24 09:05 tilk