CFU-Playground Flexibility in modifying i and d-cache configs.

Open bala122 opened this issue 1 year ago • 7 comments

Hi @tcal-x and @mithro, I was trying to change certain d-cache parameters on the VexRiscv using the scala file (). From the parameters, I gather that each 'way' (or column) has to hold 4096 bytes, judging by the formula that derives the number of ways from the d-cache/i-cache size. Is there any way to change this? Finer granularity in d-cache size would be useful for observing performance benefits. As of now, I think we can only change the size in multiples of 4096 bytes.

On another note, is there any way to change the L2 cache parameters as well? And is it a common L2 cache, or are there separate ones for instruction and data memory?
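
To make the relation in question concrete, here is a minimal Python sketch of the geometry described above. It assumes the way count is derived as the cache size divided by a fixed 4096-byte way, and it assumes a 32-byte line; `cache_geometry` and its parameter names are hypothetical and are not taken from the scala file.

```python
def cache_geometry(cache_size, way_size=4096, bytes_per_line=32):
    """Return (ways, sets_per_way) for a cache built from fixed-size ways.

    Assumption: ways = cache_size / way_size, as described in the post.
    The 32-byte line size is also an assumption made for illustration.
    """
    if cache_size % way_size != 0:
        raise ValueError("cache size must be a multiple of the way size")
    ways = cache_size // way_size
    sets_per_way = way_size // bytes_per_line
    return ways, sets_per_way


# Under this formula, only the way count changes as the cache grows.
for size in (4096, 8192, 16384):
    print(size, cache_geometry(size))
```

Under this assumed formula, a size such as 6144 bytes is rejected, which is the "multiples of 4096" restriction the question is about.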

Please let me know about this soon, Thanks, Bala.

Sep 07 '22 14:09 bala122

Hi Bala, We have been using many sizes of L1 caches, both less and more than 4096. There was an issue related to size of each "way" in MMU-enabled VexRiscv if I recall, but that doesn't come into play with CFU playground. One thing to be aware of, especially for dcache I think, is that you may get to the point where reducing the cache size doesn't reduce the number of BRAMs since you're already at the minimal # of BRAMs. With Dcache, to implement the byte write enables, they need a separate BRAM for each byte column. There may be a way to instead use the FPGA primitive's byte write enables if present, but we don't currently do that in CFU Playground.
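
The BRAM floor mentioned above can be illustrated with a rough Python estimate. This is only a sketch under stated assumptions: a 32-bit data path (so 4 byte columns), 2048 usable bytes per BRAM primitive, and only the data array counted (no tags). None of these numbers come from the CFU Playground sources, and `dcache_data_brams` is a hypothetical helper.

```python
import math


def dcache_data_brams(cache_size, ways=1, byte_columns=4, bram_bytes=2048):
    """Estimate data-array BRAMs when every byte column needs its own BRAM.

    Assumptions (not from CFU Playground sources): 4 byte columns for a
    32-bit data path and 2048 usable bytes per BRAM primitive.
    """
    bytes_per_column = cache_size // (ways * byte_columns)
    # Each byte column occupies at least one whole BRAM, however small it is.
    brams_per_column = max(1, math.ceil(bytes_per_column / bram_bytes))
    return ways * byte_columns * brams_per_column


for size in (2048, 4096, 8192, 16384):
    print(size, dcache_data_brams(size))
```

With these assumed numbers, the estimate stays at 4 BRAMs for every size up to 8 KiB and only grows beyond that, which is the effect described: shrinking the cache below the floor saves no BRAMs.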

Anyway, it is quite easy to change the L1 dcache/icache sizes by changing the provided arguments to the scala script; see for example:

https://github.com/google/CFU-Playground/blob/main/soc/vexriscv/Makefile#L39

The L2 cache is not part of VexRiscv; on Arty A7, it is part of the DRAM controller, and it looks as though the size can be changed using a LiteX argument: https://github.com/litex-hub/litex-boards/blob/master/litex_boards/targets/digilent_arty.py#L100-L110

In CFU Playground, you can hack the L2 size by changing the value on this line if you're using Digilent Arty A7:

https://github.com/google/CFU-Playground/blob/main/soc/board_specific_workflows/digilent_arty.py#L64

I'm not sure if there's a better way to change it. Please remind me, what board are you using?

Sep 12 '22 07:09 tcal-x

Sure, @tcal-x, thanks for the update. Yes, I'm using the same Digilent Arty A7 board. Also, I've recently run into an issue: even though I'm increasing my D-cache size (from 4KB to, say, 32 or 64KB), I'm not seeing any reduction in runtime. I've ruled out instruction cache misses (by increasing the I-cache size a lot, and even by trying to fit the full code in a block). I'm assuming the issue might be a bottleneck at the L2 cache, whose size is still 8KB. Is this L2 cache shared between instructions and data, and could that (or the fact that it is still small, i.e. 8KB) be causing the bottleneck?

Also, I wanted to ask about the paging feature: is it actually implemented here on the Arty A7? As I understand it, paging is used with virtual addressing and requires a secondary external memory (the Arty A7 has DDR memory, which serves as DRAM, i.e. primary memory, if I'm not wrong). So, does it really make a difference whether or not I use the "linux" keyword for "csrPluginConfig" in the Makefile scala script?

If you could give a complete description of the memory hierarchy (both instruction and data), that would be helpful. I have seen "SRAM" come up along with other components like L2, but I don't know where it fits beyond L2. Normally, SRAM itself refers to cache (L1, L2), then we have DRAM (or normally RAM), and then finally external memory.

Please let me know about this soon, Thanks, Bala.

Sep 13 '22 09:09 bala122

Hi Bala,

Well, even if L2 is small, a large L1D cache should still get you benefits if indeed you are seeing L1D misses that can be reduced by having a larger L1D capacity. It may be that L1D misses are not the performance issue, in which case increasing L1D will not help.
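
The reasoning above can be sketched with the standard average memory access time (AMAT) formula for a two-level hierarchy. All latencies and miss rates below are made-up illustrative values, not measurements from CFU Playground or the Arty A7, and `amat` is a hypothetical helper.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, dram_penalty):
    """Average memory access time in cycles for an L1 -> L2 -> DRAM hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * dram_penalty)


# Hypothetical case A: a larger L1D cuts the L1 miss rate, so AMAT drops
# even though the L2 parameters are unchanged.
before = amat(l1_hit=1, l1_miss_rate=0.10, l2_hit=10, l2_miss_rate=0.5, dram_penalty=50)
after = amat(l1_hit=1, l1_miss_rate=0.02, l2_hit=10, l2_miss_rate=0.5, dram_penalty=50)
print(before, after)

# Hypothetical case B: the workload already fits, so a larger L1D leaves the
# miss rate (and therefore AMAT) unchanged.
same = amat(l1_hit=1, l1_miss_rate=0.02, l2_hit=10, l2_miss_rate=0.5, dram_penalty=50)
print(after == same)
```

In this model a larger L1D only helps through a lower L1 miss rate; if the miss rate was already low, or the runtime is dominated by something other than data accesses, the extra capacity changes nothing.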

I am working on a PR to make it easier to change L2 cache size, via the build line. I'm hoping for a reply from the original author regarding why they fixed L2 size at 8kB. I did do experiments with larger L2 cache size and it did work correctly and did improve performance. For now you can do experiments by editing the line I mentioned above, https://github.com/google/CFU-Playground/blob/main/soc/board_specific_workflows/digilent_arty.py#L64 .

And yes, to answer your question, both L1D cache and L1I cache go through the same L2 cache.

And for the other question, about VM / MMU / Linux --- I have not experimented with these but only because I have not found the time! They should work, but I can't promise for sure. Of course once you have a situation with multiple processes running, you must be careful about CFU usage if the CFU is stateful. If there is just one process that uses the CFU, then you don't need to be concerned so much.

Sep 13 '22 15:09 tcal-x

Thanks for the responses, @tcal-x. I just wanted to confirm/ask a few things with you and @mithro:

1. As I understand it now, the memory hierarchy is as follows: L1 (D and I) - L2 (shared) - DRAM with its data buffers (or SDRAM).
2. What is the organization of the L2 cache, i.e. mainly block size, number of ways, sets, etc.? I want to know this mainly because I want to change the L1 block size and some of my buffer sizes accordingly.
3. If csrPluginConfig in the VexRiscv scala script is set to "all", does that mean I'm not using paging, and if it is set to "linux", that I am? On a side note, how does paging work here if the memory hierarchy ends at DRAM/SDRAM, i.e. with no external memory? Is some part of the DRAM reserved separately as "external"?

Thanks, Bala.

Sep 19 '22 05:09 bala122

Hi @bala122 ,

CSRs are the RISC-V control and status registers. I don't think 'all' contains the ones necessary for Linux. The relevant source config is here: https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/plugin/CsrPlugin.scala (referenced by https://github.com/google/CFU-Playground/blob/main/soc/vexriscv/src/main/scala/vexriscv/GenCoreDefault.scala#L219-L228).

I was going to say, "I doubt switching the CSR config to Linux is enough to create a Linux-capable VexRiscv", but then looking at that scala script, it does seem to add a lot of other stuff when you specify "linux" or "linux-minimal" as the CSR config: https://github.com/google/CFU-Playground/blob/main/soc/vexriscv/src/main/scala/vexriscv/GenCoreDefault.scala#L106

You can compare with this project to see which VexRiscv configuration they use: https://github.com/litex-hub/linux-on-litex-vexriscv

L2 configuration is a LiteX / Litex-boards topic; in this area, I just use what's provided. You can look here to get started understanding how it's configured:
https://github.com/enjoy-digital/litex/blob/master/litex/soc/integration/soc.py#L1578-L1593 and https://github.com/enjoy-digital/litex/blob/master/litex/soc/interconnect/wishbone.py#L521

Sep 22 '22 06:09 tcal-x

@bala122 I should have mentioned that for LiteX issues/discussion, there is a #litex IRC channel on libera.chat. It may be bridged to other chat platforms, but I'm not sure; I use the IRC client irccloud.com to access it.

It also has searchable logs at https://libera.irclog.whitequark.org/litex .

Sep 23 '22 23:09 tcal-x

Thanks for that, @tcal-x! Regarding the csrPluginConfig, I meant that the scala file points to some MMU-related parameters when the "linux" keyword is used. So I was wondering whether some sort of virtual addressing and paging is done there. Hence, if we don't use the "linux" keyword, I assumed there is no paging.

Thanks for the links on L2 caches; I'll look into them. Additionally, I wanted to ask what kind of cache architecture we are dealing with: inclusive or exclusive? I'm guessing it's exclusive, since we can increase the L1 size beyond the L2 size as well, and it shows no errors or warnings. Thanks, Bala.

Sep 26 '22 04:09 bala122
