sunflower-embedded-system-emulator icon indicating copy to clipboard operation
sunflower-embedded-system-emulator copied to clipboard

Decided against using a hash function/table for a decode cache in the riscv pipeline.

Open SamuelmsWong-zz opened this issue 6 years ago • 0 comments

Context: The Hitachi SuperH implementation in sunflower has a 'decode cache' that decodes all possible instructions and stores them in an array at the start of node creation. These are then referenced during pipeline simulation. This speeds up the simulator because it does not need to go through the switch-case checks in decode-riscv.c, but instead merely references an array. This decode cache array literally uses the instruction code as an index, which means the decode cache is an array of size 65536 (for 16-bit superH instructions).

Problem: The same is likely unreasonable to do for the riscv's 32-bit instruction set - a decode cache would have to be a 4294967296 long array.

Options:

  • Sunflower's simulation speed is less important now than is was when it was first built, so one "simple" option could be to not use a decode cache at all and decode each instruction as they come.
  • The option of using a hash function is also possible. There are 99 unique instructions in the standard rv32I set and FD extensions. There should exist a mapping from a 32-bit string to a 7-bit string (up to 128 instrs) which could then be used to index into a decode cache. The cost of this hash function should be an acceptable improvement upon the switch-case statements of the "simple" method.
  • Other solutions have not been thought of.

Discussion: The simulator goes through roughly 10-20 switch-case evaluations per decode, so the hash function should do better than this in terms of if statements and bit operations. This is quite difficult given the different formats of riscv instructions and the possibility of collisions.

One should note two things: 1) the superHdecode cache is 16 bits large, so the riscv decode cache does not have to be limited to 7-bits, and 2) not all bits in a riscv instruction contribute to identifying the instruction. In fact it happens that there are only 16 bits contributing to the uniqueness of a RV32IFD instruction (mini-opcode [bits 2-6; bits 0-1 are always 11], funct3 [bits12-14], bit 20, and funct7 [bits 25-31]). These could theoretically be squashed into a single 16-bit number for the riscv decode cache, but the amount of bit masking, shifting and 'or'ing required would hardly be an improvement (4 bit-groups * 3 operations each = 12 bit operations).

Therefore I have decided not to use any hash tables and to simply decode each instruction as they come.

SamuelmsWong-zz avatar Aug 01 '19 17:08 SamuelmsWong-zz