pspin
How is core_region's clock controlled and gated?
I've recently been reading your paper "A RISC-V in-network accelerator for flexible high-performance low-power packet processing" along with the source code, and I found some mismatches between the paper and the code that I find quite confusing.
I'm reading the source code at tag `v0.6.1` and have made no changes to the source files. According to `git diff` against branch `master`, there are no significant changes to the hardware design in `hw/`, so I think it's okay to consider `v0.6.1` "up to date".
There are connections in `hw/deps/pulp_cluster/rtl/pulp_cluster.sv` that I believe play the role of clock-gating the `core_region`.
```systemverilog
// line 1031
cluster_peripherals #(
  ...
) cluster_peripherals_i (
  ...
  .core_busy_i(core_busy),
  .core_clk_en_o(clk_core_en),
  ...
);

// line 1155
core_region #(
  ...
) core_region_i (
  ...
  .clock_en_i(clk_core_en[i]),
  ...
  .core_busy_o(core_busy[i]),
  ...
);
```
It looks like this `cluster_peripherals_i` instance is controlling/clock-gating the RISC-V cores. However, the paper mentions that

> If the HPU driver has no task/handler to execute, it stops the HPUs by clock-gating it.

But I didn't find any connection between the HPU driver and `cluster_peripherals_i` in the source code, and I don't find much description of this instance in the paper either. So here are my questions:
- In the current implementation, which module controls/clock-gates the cores, and how does that module decide when to do so?
- What role is `cluster_peripherals_i` playing in the design? I noticed that it manages "events" from the timer, DMA, etc., but how do these events and their sources work as part of the design?
Hi!
> In the current implementation, which module controls/clock-gates the cores, and how does that module decide when to do so?
You're right. That change didn't make it to the published code. To re-introduce it, it should be enough to drive the core clock from the HPU driver (https://github.com/spcl/pspin/blob/master/hw/src/pkt_scheduler/hpu_driver.sv): i.e., add a `clk_o` to the HPU driver and use that as `clk_i` of the core (https://github.com/spcl/pspin/blob/master/hw/deps/pulp_cluster/rtl/pulp_cluster.sv#L1167). The HPU driver can use its `clk_i` and the condition `state_q==Idle` (https://github.com/spcl/pspin/blob/master/hw/src/pkt_scheduler/hpu_driver.sv#L383) to gate the clock towards the core (i.e., the newly introduced `clk_o`).
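A minimal sketch of what such a gate could look like, assuming a simple latch-based cell. The module name `hpu_clk_gate_sketch` and its ports are hypothetical and not part of the pspin sources; only `clk_i`, `clk_o`, `state_q`, and `Idle` come from the description above. A real flow would normally use a technology-specific clock-gating cell instead of this generic one.

```systemverilog
// Hypothetical helper, not part of the pspin sources: a latch-based
// clock gate. The enable is captured only while the clock is low, so
// clk_o cannot glitch when en_i changes during the high phase.
module hpu_clk_gate_sketch (
  input  logic clk_i,  // free-running clock of the HPU driver
  input  logic en_i,   // 1: pass the clock, 0: hold clk_o low
  output logic clk_o   // gated clock towards the core
);
  logic en_latched;

  // Transparent-low latch: follows en_i only while clk_i is low.
  always_latch begin
    if (clk_i == 1'b0) en_latched <= en_i;
  end

  assign clk_o = clk_i & en_latched;
endmodule

// Possible use inside the HPU driver (assumed wiring):
//
//   hpu_clk_gate_sketch i_core_clk_gate (
//     .clk_i (clk_i),
//     .en_i  (state_q != Idle),  // stop the core clock while idle
//     .clk_o (clk_o)             // newly introduced output port
//   );
```

In this sketch the HPU driver itself stays on the ungated `clk_i`, so it can still leave `Idle` and re-enable the core clock when a new task arrives.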
Alternatively, the HPU driver could have a `core_clock_en_o` signal that is combined with `clk_i` by the core itself (maybe this is cleaner). Or just reuse `clk_core_en` from `cluster_peripherals` (the one you mentioned).
I'll be happy to review a PR in case you implement this!
> What role is `cluster_peripherals_i` playing in the design? I noticed that it manages "events" from the timer, DMA, etc., but how do these events and their sources work as part of the design?
I'm pretty sure that this unit is not used in the current design version. In the first iteration, the DMA engine communicated with the cores via events (e.g., to signal DMA completion), and thus via `cluster_peripherals_i`. Now the DMA engine provides a per-core interface, iirc. I think it could be safely removed, but I'd need to double-check.
> Alternatively, HPU driver could just have a `core_clock_en_o` signal that is combined with `clk_i` by the core itself (maybe this is cleaner).
As the `core_region` module already has a `clock_en_i` input port, this solution may involve fewer changes to the current design. I'll try to work it through.
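A rough sketch of the glue logic I have in mind, assuming one HPU driver per core that exports a `core_clock_en_o` bit. The module name, port names, and the `NumCores` default are hypothetical; only `clk_core_en`, `clock_en_i`, and `core_clock_en_o` are names from this thread.

```systemverilog
// Hypothetical glue logic, not taken from the pspin sources.
module core_clk_en_combine_sketch #(
  parameter int unsigned NumCores = 8  // assumed value, for illustration only
) (
  // Existing enable, e.g. clk_core_en from cluster_peripherals_i.
  input  logic [NumCores-1:0] periph_clk_en_i,
  // Assumed per-core core_clock_en_o, one bit from each HPU driver,
  // e.g. generated as: assign core_clock_en_o = (state_q != Idle);
  input  logic [NumCores-1:0] hpu_clk_en_i,
  // Would drive clock_en_i of the matching core_region instance.
  output logic [NumCores-1:0] core_clk_en_o
);
  // A core is clocked only if both enable sources allow it.
  assign core_clk_en_o = periph_clk_en_i & hpu_clk_en_i;
endmodule
```

If `cluster_peripherals_i` is removed later, the AND would go away and the HPU driver's enable could drive `clock_en_i` directly.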
> I'm pretty sure that this unit is not used in the current design version.
That's great. I'll try to remove it from the cluster to have a slimmer one. I understand that the DMAC is important to the cluster for tasks like moving data from L2 Cache into the L1 TCDM, so I'll be careful dealing with it.
> Now the DMA engine provides a per-core interface, iirc.
I know little about this iirc interface and can't find it in the code. Where can I have a look at it? Maybe I can make some modifications to the `core_demux` used by `core_region` to add a new iirc interface, so the core can access the DMAC directly.
I really appreciate your reply and it helps a lot!