CFU-Playground
CFU-Playground copied to clipboard
Instruction after CFU instruction is getting wrong result
Create proj/hps_accel/src/main.c
with the following contents:
#include <stdio.h>
#include "base.h"
#include "riscv.h"
__attribute__((optimize("align-functions=32"))) int check_post_cfu_instruction(
int input) {
register int output;
int tmp;
asm volatile(
// Some padding. The alignment of the CFU instruction seems to matter.
"nop \n\t"
"nop \n\t"
"nop \n\t"
"nop \n\t"
"nop \n\t"
"nop \n\t"
// Set up an input to our CFU instruction.
"li a6, 0; \n\t"
// These four instructions seem to need to be stores.
"sw %[input],%[tmp] \n\t"
"sw %[input],%[tmp] \n\t"
"sw %[input],%[tmp] \n\t"
"sw %[input],%[tmp] \n\t"
// This can be any instruction.
"nop \n\t"
// Our CFU instruction.
".word ((CUSTOM0) | (regnum_a7 << 7) | (regnum_a6 << 15) | (regnum_a6 << "
"20) | ((0) << 12) | ((5) << 25)) \n\t"
// The value stored into %[output] is wrong.
"mv %[output],%[input] \n\t"
// Changing the following nop to repeat the line above, or swapping it
// with the line above "fixes" the bug.
"nop \n\t"
"nop \n\t"
"nop \n\t"
: [output] "=r"(output), [tmp] "=m"(tmp)
: [input] "r"(input)
: "a6", "a7"
);
return output;
}
int main(void) {
init_runtime();
int v = check_post_cfu_instruction(42);
// This should print the value 42, but when the bug is occurring, it prints 0.
printf("Value: %d\n", v);
return (0);
}
Then run:
(source environment; cd proj/hps_accel/; make PLATFORM=sim load)
Expected output:
....
Value: 42
Actual output:
....
Value: 0
Swapping the mv
instruction with the nop
that follows it, adjusting the alignment +/- 8 or changing some of the store instructions to nop
s makes the problem go away.
I'm guessing this is probably a bug in VexRiscv's CFU interface - possibly some bad interaction with instruction pipelining.
Thanks @davidlattimore , I will try to reproduce it, and then try it with the latest VexRiscv in case it's a bug that's been fixed recently.
Using the above code with proj/example_cfu
, the problem doesn't reproduce. However it's possible to change example_cfu to get the problem to reproduce.
It seems that it depends on the behavior of the done
signal from both the CFU instruction being called and instruction 0. Applying the following patch makes the problem reproduce with example_cfu:
diff --git a/proj/example_cfu/cfu.py b/proj/example_cfu/cfu.py
index b0a5209..7741f13 100755
--- a/proj/example_cfu/cfu.py
+++ b/proj/example_cfu/cfu.py
@@ -98,7 +98,7 @@ class SumBytesInstruction(InstructionBase):
self.in1.word_select(2, 8) +
self.in1.word_select(3, 8)
)
- m.d.comb += self.done.eq(1)
+ m.d.sync += self.done.eq(self.start)
class SumBytesInstructionTest(InstructionTestBase):
I've also determined that the value that gets put into the register by the post-CFU instruction is the value returned by CFU instruction 0.
So if I do:
call CFU instruction 1
mv a6, a2
... then the value stored into a6, should be the value from a2, but instead is the result of CFU instruction 0, even though we ran CFU instruction 1.
@davidlattimore -- CFU instructions 0 and 1 meaning the first and second executed, or with functions 0x0 and 0x1?
I guess we're missing the response timing from the CFU side -- do you know what it is for the original case? Is rsp_valid
asserted immediately when cmd_valid
is asserted, or with one cycle latency, or with multiple cycle latency?
Yep, I mean functions 0x0 and 0x1. After a little more experimentation it seems as though the instruction invoked needs to be a multiple cycle instruction in order for the bug to manifest. I need to do more experimentation tomorrow, but it's starting to feel like the CPU is invoking two custom functions - the one that was requested and then function 0. I still need to check if the result of the original instruction is going into the output register.
I captured a trace and confirmed that the CPU is indeed calling the requested CFU function, then when the function returns, it's immediately calling CFU function 0. In the following screenshot cmd_valid
goes high for function 1, then we respond (rsp_valid
goes high). In the next cycle after we respond, cmd_valid
goes high again, but at that point current_function_id
is 0.
The code running on the CPU only requested CFU function 1.
Trace files: sim.fst.gz sim.fst.hier.gz
I uploaded the code to produce the trace here.
The command to produce the trace was:
(source environment; cd proj/example_cfu; make PLATFORM=sim ENABLE_TRACE_ARG=--trace-fst load)
@dolu1990, would you have time to take a look? ^^^^^ It looks like a spurious cmd sent to the CFU.
Hooo right, nice catch ^^
I would say, changing :
https://github.com/SpinalHDL/VexRiscv/blob/4fff62d3feeb05edcea46e66b25de0ecf3f274fa/src/main/scala/vexriscv/plugin/CfuPlugin.scala#L196 into val fired = RegInit(False) setWhen(bus.cmd.fire) clearWhen(!arbitration.isStuck)
would fix it. @davidlattimore Do you have the flow in place to test it out ? I guess you can directly change it into the generated netlist to try it easily, modifying the always block driving execute_CfuPlugin_fired
(i realy have nothing in place to test things) :
/media/data/open/tmp/CFU-Playground/scripts/pyrun /media/data/open/tmp/CFU-Playground/proj/example_cfu/cfu_gen.py
Traceback (most recent call last):
File "/media/data/open/tmp/CFU-Playground/proj/example_cfu/cfu_gen.py", line 16, in <module>
from amaranth import *
ModuleNotFoundError: No module named 'amaranth'
../proj.mk:241: recipe for target 'generate_cfu' failed
make: *** [generate_cfu] Error 1
@tcal-x I forgot that change : https://github.com/SpinalHDL/VexRiscv/commit/c242744d02b052a7883aa618fe6ce9dca5f7fa68
So with it, the CfuPlugin can be safely used with statefull CFU, in what ever config VexRiscv is.
ahhh got amaranth up, but got some issue with meson package, my python setup is completly broken ^^
Thanks @Dolu1990 ! I can help @davidlattimore test it at our end, or perhaps I can reproduce it myself.
Thanks @Dolu1990! I'll test the change. I'll probably need @tcal-x 's help to build the right version of VexRiscv
I still need to do more testing, but the commit "CfuPlugin now only fork when the rest of the pipeline is hazard free" does appear to fix the problem. I've so far only tested in Verilator.
I pushed the VexRiscv fix in both dev and master branches. and also in the litex python data repo.
So, if you update litex, all should be good. let's me know if not ^^
Thanks @Dolu1990! I won't be able to generate new VexRiscv verilogs for @davidlattimore to test until later this week.
ahhh other verilog than the one of hte python data repo ?
Yes, we generate VexRiscv Verilog in our own repo (but then check in the verilog) since we keep tuning the configuration.
Part of a summer project will be to dynamically generate the VexRiscv Verilog each build.
@Dolu1990 --
@tcal-x I forgot that change : https://github.com/SpinalHDL/VexRiscv/commit/c242744d02b052a7883aa618fe6ce9dca5f7fa68
So with it, the CfuPlugin can be safely used with statefull CFU, in what ever config VexRiscv is.
I'm confused -- will this commit fix by itself the issue that we are seeing? or do we also need the change to line 196? Thanks, Tim
As far as I can tell, that commit by itself solves the issue. At least my repro code no longer reproduces the problem with just that commit added.
Hooo good ^^
So the last patch i did is more a sanity one i guess XD