CFU-Playground Instruction after CFU instruction is getting wrong result

Create proj/hps_accel/src/main.c with the following contents:

#include <stdio.h>

#include "base.h"
#include "riscv.h"

__attribute__((optimize("align-functions=32"))) int check_post_cfu_instruction(
    int input) {
  register int output;
  int tmp;
  asm volatile(
      // Some padding. The alignment of the CFU instruction seems to matter.
      "nop \n\t"
      "nop \n\t"
      "nop \n\t"
      "nop \n\t"
      "nop \n\t"
      "nop \n\t"

      // Set up an input to our CFU instruction.
      "li a6, 0; \n\t"

      // These four instructions seem to need to be stores.
      "sw	%[input],%[tmp] \n\t"
      "sw	%[input],%[tmp] \n\t"
      "sw	%[input],%[tmp] \n\t"
      "sw	%[input],%[tmp] \n\t"

      // This can be any instruction.
      "nop \n\t"

      // Our CFU instruction.
      ".word ((CUSTOM0) | (regnum_a7 << 7) | (regnum_a6 << 15) | (regnum_a6 << "
      "20) | ((0) << 12) | ((5) << 25)) \n\t"

      // The value stored into %[output] is wrong.
      "mv	%[output],%[input] \n\t"

      // Changing the following nop to repeat the line above, or swapping it
      // with the line above "fixes" the bug.
      "nop \n\t"

      "nop \n\t"
      "nop \n\t"
      : [output] "=r"(output), [tmp] "=m"(tmp)
      : [input] "r"(input)
      : "a6", "a7"

  );
  return output;
}

int main(void) {
  init_runtime();

  int v = check_post_cfu_instruction(42);

  // This should print the value 42, but when the bug is occurring, it prints 0.
  printf("Value: %d\n", v);

  return (0);
}

Then run:

(source environment; cd proj/hps_accel/; make PLATFORM=sim load)

Expected output:

....
Value: 42

Actual output:

....
Value: 0

Swapping the mv instruction with the nop that follows it, adjusting the alignment +/- 8 or changing some of the store instructions to nops makes the problem go away.

I'm guessing this is probably a bug in VexRiscv's CFU interface - possibly some bad interaction with instruction pipelining.

May 12 '22 00:05 davidlattimore

Thanks @davidlattimore , I will try to reproduce it, and then try it with the latest VexRiscv in case it's a bug that's been fixed recently.

May 12 '22 04:05 tcal-x

Using the above code with proj/example_cfu, the problem doesn't reproduce. However it's possible to change example_cfu to get the problem to reproduce.

It seems that it depends on the behavior of the done signal from both the CFU instruction being called and instruction 0. Applying the following patch makes the problem reproduce with example_cfu:

diff --git a/proj/example_cfu/cfu.py b/proj/example_cfu/cfu.py
index b0a5209..7741f13 100755
--- a/proj/example_cfu/cfu.py
+++ b/proj/example_cfu/cfu.py
@@ -98,7 +98,7 @@ class SumBytesInstruction(InstructionBase):
             self.in1.word_select(2, 8) +
             self.in1.word_select(3, 8)
         )
-        m.d.comb += self.done.eq(1)
+        m.d.sync += self.done.eq(self.start)
 
 
 class SumBytesInstructionTest(InstructionTestBase):

I've also determined that the value that gets put into the register by the post-CFU instruction is the value returned by CFU instruction 0.

So if I do:

call CFU instruction 1
mv a6, a2

... then the value stored into a6, should be the value from a2, but instead is the result of CFU instruction 0, even though we ran CFU instruction 1.

May 12 '22 04:05 davidlattimore

@davidlattimore -- CFU instructions 0 and 1 meaning the first and second executed, or with functions 0x0 and 0x1?

I guess we're missing the response timing from the CFU side -- do you know what it is for the original case? Is rsp_valid asserted immediately when cmd_valid is asserted, or with one cycle latency, or with multiple cycle latency?

May 12 '22 04:05 tcal-x

Yep, I mean functions 0x0 and 0x1. After a little more experimentation it seems as though the instruction invoked needs to be a multiple cycle instruction in order for the bug to manifest. I need to do more experimentation tomorrow, but it's starting to feel like the CPU is invoking two custom functions - the one that was requested and then function 0. I still need to check if the result of the original instruction is going into the output register.

May 12 '22 10:05 davidlattimore

I captured a trace and confirmed that the CPU is indeed calling the requested CFU function, then when the function returns, it's immediately calling CFU function 0. In the following screenshot cmd_valid goes high for function 1, then we respond (rsp_valid goes high). In the next cycle after we respond, cmd_valid goes high again, but at that point current_function_id is 0.

cfu_glitch

The code running on the CPU only requested CFU function 1.

May 12 '22 23:05 davidlattimore

Trace files: sim.fst.gz sim.fst.hier.gz

I uploaded the code to produce the trace here.

The command to produce the trace was:

(source environment; cd proj/example_cfu; make PLATFORM=sim ENABLE_TRACE_ARG=--trace-fst load)

May 13 '22 03:05 davidlattimore

@dolu1990, would you have time to take a look? ^^^^^ It looks like a spurious cmd sent to the CFU.

May 13 '22 04:05 tcal-x

Hooo right, nice catch ^^

I would say, changing :

https://github.com/SpinalHDL/VexRiscv/blob/4fff62d3feeb05edcea46e66b25de0ecf3f274fa/src/main/scala/vexriscv/plugin/CfuPlugin.scala#L196 into val fired = RegInit(False) setWhen(bus.cmd.fire) clearWhen(!arbitration.isStuck)

would fix it. @davidlattimore Do you have the flow in place to test it out ? I guess you can directly change it into the generated netlist to try it easily, modifying the always block driving execute_CfuPlugin_fired

(i realy have nothing in place to test things) :

/media/data/open/tmp/CFU-Playground/scripts/pyrun /media/data/open/tmp/CFU-Playground/proj/example_cfu/cfu_gen.py 
Traceback (most recent call last):
  File "/media/data/open/tmp/CFU-Playground/proj/example_cfu/cfu_gen.py", line 16, in <module>
    from amaranth import *
ModuleNotFoundError: No module named 'amaranth'
../proj.mk:241: recipe for target 'generate_cfu' failed
make: *** [generate_cfu] Error 1

@tcal-x I forgot that change : https://github.com/SpinalHDL/VexRiscv/commit/c242744d02b052a7883aa618fe6ce9dca5f7fa68

So with it, the CfuPlugin can be safely used with statefull CFU, in what ever config VexRiscv is.

May 13 '22 15:05 Dolu1990

ahhh got amaranth up, but got some issue with meson package, my python setup is completly broken ^^

May 13 '22 15:05 Dolu1990

Thanks @Dolu1990 ! I can help @davidlattimore test it at our end, or perhaps I can reproduce it myself.

May 13 '22 16:05 tcal-x

Thanks @Dolu1990! I'll test the change. I'll probably need @tcal-x 's help to build the right version of VexRiscv

May 15 '22 00:05 davidlattimore

I still need to do more testing, but the commit "CfuPlugin now only fork when the rest of the pipeline is hazard free" does appear to fix the problem. I've so far only tested in Verilator.

May 16 '22 05:05 davidlattimore

I pushed the VexRiscv fix in both dev and master branches. and also in the litex python data repo.

So, if you update litex, all should be good. let's me know if not ^^

May 16 '22 08:05 Dolu1990

Thanks @Dolu1990! I won't be able to generate new VexRiscv verilogs for @davidlattimore to test until later this week.

May 16 '22 11:05 tcal-x

ahhh other verilog than the one of hte python data repo ?

May 16 '22 12:05 Dolu1990

Yes, we generate VexRiscv Verilog in our own repo (but then check in the verilog) since we keep tuning the configuration.

Part of a summer project will be to dynamically generate the VexRiscv Verilog each build.

May 16 '22 12:05 tcal-x

@Dolu1990 --

@tcal-x I forgot that change : https://github.com/SpinalHDL/VexRiscv/commit/c242744d02b052a7883aa618fe6ce9dca5f7fa68

So with it, the CfuPlugin can be safely used with statefull CFU, in what ever config VexRiscv is.

I'm confused -- will this commit fix by itself the issue that we are seeing? or do we also need the change to line 196? Thanks, Tim

May 17 '22 10:05 tcal-x

As far as I can tell, that commit by itself solves the issue. At least my repro code no longer reproduces the problem with just that commit added.

May 17 '22 23:05 davidlattimore

Hooo good ^^

So the last patch i did is more a sanity one i guess XD

May 18 '22 11:05 Dolu1990

CFU-Playground CFU-Playground copied to clipboard

Instruction after CFU instruction is getting wrong result

CFU-Playground
CFU-Playground copied to clipboard