
Is it possible to return RoCC result in one clock cycle?

Open yjwen opened this issue 6 years ago • 4 comments

Hi,

First of all, thanks for the great project, which enables fast evaluation of various RISC-V based SoC architectures.

I have a question regarding RoCC and would much appreciate any comments.

I was trying to implement a RoCC module consisting of purely combinational logic. For example, I'd like the RoCC module to treat each 64-bit operand as a pair of 32-bit operands and return the pair of products, so that when rs1 = {32'd1, 32'd5} and rs2 = {32'd3, 32'd7}, the expected result is rd = {32'd3, 32'd35}.

I have implemented that RoCC module, driving its ports as follows:

  io.cmd.ready := Bool(true) // Cmd is always resolved immediately

  io.resp.bits.rd := io.cmd.bits.inst.rd
  io.resp.bits.data := blabla // Do 32-bit multiplication on operands


  io.resp.valid := io.cmd.fire() // Resp is valid in the same cycle the cmd fires
  io.busy := Bool(false)
  io.interrupt := Bool(false)

  // No memory request.
  io.mem.req.valid := Bool(false)

The RoCC module was appended to a Rocket core of DefaultConfig.

Since it is purely combinational logic, I naively expected the RoCC module to return the result within one clock cycle. However, Verilator showed that the instruction took 5 cycles to finish. The log is below.

C0: 70490 [1] pc=[00800017be] W[r 0=0000000100000007][1] R[r20=0000000100000007] R[r21=0000000300000005] inst=[055a7b0b] custom0.rd.rs1.rs2 (args unknown)
C0: 70491 [0] pc=[00800017be] W[r 0=0000000100000007][0] R[r20=0000000100000007] R[r21=0000000300000005] inst=[055a7b0b] custom0.rd.rs1.rs2 (args unknown)
C0: 70492 [0] pc=[00800017be] W[r22=0000000300000023][1] R[r20=0000000100000007] R[r21=0000000300000005] inst=[055a7b0b] custom0.rd.rs1.rs2 (args unknown)
C0: 70493 [0] pc=[00800017be] W[r 0=0000000100000007][0] R[r20=0000000100000007] R[r21=0000000300000005] inst=[055a7b0b] custom0.rd.rs1.rs2 (args unknown)
C0: 70494 [0] pc=[00800017be] W[r 0=0000000100000007][0] R[r20=0000000100000007] R[r21=0000000300000005] inst=[055a7b0b] custom0.rd.rs1.rs2 (args unknown)

The instruction took r20 and r21 as operands and wrote the result to r22. The result was correct, but it was available only in the 3rd clock cycle (the log entry for cycle 70492, where r22 is written). Even after r22 had received the result, the instruction occupied 2 more clock cycles.

So I am wondering whether it is possible to create a RoCC module that returns its result within one clock cycle, like a normal instruction. If so, how?

Regards, Yujie

yjwen avatar May 02 '18 11:05 yjwen

OK. I found that custom instructions can actually finish within one cycle, as long as the result is not used immediately by the following instructions. I guess the RoCC result is accepted within one cycle, but routing the RoCC return value to the destination register (rd) is a process spanning multiple cycles.

yjwen avatar May 08 '18 01:05 yjwen

@yjwen Thanks for your post. I'm new to RoCC and am looking for an accelerator like the one you have already built. First I tried this repo and successfully executed all the tests. That repo was created with the riscv-tools repo as the baseline. Now I want to use Project-template. Could you please let me know the steps you followed to create the accelerator and how you tested it? I would appreciate your support; I have been stuck here for a long time. Thank you. Regards, Riaz

riazcseiu avatar Jan 15 '19 02:01 riazcseiu

Hi @yjwen, I am wondering whether you have found a way to get a RoCC response in one clock cycle?

zhejianguk avatar Apr 13 '22 14:04 zhejianguk

@zhejianguk I haven't touched rocket-chip for a long time. From memory, the rocket-chip CPU is pipelined: you can have the RoCC module return a value within one cycle, but it takes multiple cycles to route the RoCC module's result to the destination register. If the result is used by the immediately following instruction, that instruction will be stalled for multiple cycles.

yjwen avatar Apr 16 '22 06:04 yjwen
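
The stall behaviour described in the thread can be illustrated with a toy model of an in-order, single-issue pipeline, in which an instruction issues only once all its source registers have been written back. This is a sketch under stated assumptions, not Rocket's actual implementation: `ScoreboardToy`, `Instr`, and `issueCycles` are hypothetical names, and the write-back latency of 3 cycles used for the custom instruction is a placeholder, not a measured value.

```scala
object ScoreboardToy {
  // Hypothetical instruction record: optional destination register, source
  // registers, and the number of cycles after issue until rd is readable.
  final case class Instr(name: String, rd: Option[Int], srcs: Seq[Int], latency: Int)

  /** Returns the cycle at which each instruction issues, in program order. */
  def issueCycles(program: Seq[Instr]): Seq[Int] = {
    // Register number -> first cycle in which its value can be read.
    val readyAt = scala.collection.mutable.Map.empty[Int, Int]
    var cycle = 0 // earliest cycle the next instruction could issue
    program.map { instr =>
      // Stall until every source register has been written back.
      val issue = (cycle +: instr.srcs.map(r => readyAt.getOrElse(r, 0))).max
      instr.rd.foreach(r => readyAt(r) = issue + instr.latency)
      cycle = issue + 1
      issue
    }
  }

  // Assumed latencies: 3 cycles for the custom instruction, 1 for an ALU op.
  val custom = Instr("custom0 r22, r20, r21", Some(22), Seq(20, 21), latency = 3)
  val user   = Instr("add r5, r22, r6", Some(5), Seq(22, 6), latency = 1)
  val filler = Instr("add r7, r8, r9", Some(7), Seq(8, 9), latency = 1)

  def main(args: Array[String]): Unit = {
    println(issueCycles(Seq(custom, user)))         // dependent instruction right after
    println(issueCycles(Seq(custom, filler, user))) // independent work in between
  }
}
```

In this model the dependent `add` issues at cycle 3 in both programs, but the second program fits an independent instruction into the gap, which matches the observation that the custom instruction looks like a single-cycle operation as long as its result is not needed immediately.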