cva6
Bypassing the cache - confused on physical address generation
I'm interested in bypassing the cache subsystem (the data cache) in an effort to communicate between the CPU and the main memory directly. However, I'm running into issues regarding the physical addressing between the two.
I'm having real difficulty understanding how the lower-order bits of the CPU request's physical address need to be converted before being sent out to the AXI adapter (with the existing cache subsystem). I've collected trace information from the CVA6 running a benchmark program, recording for each request its type, the physical address issued by the Load/Store unit (CPU request), and the physical address at the input to the AXI adapter (memory request). I can't make heads or tails of why (and how) the lower 4 bits change:
request #, request type, cpu request physical address, memory request physical address
1, sb, 800290bf (...1111), 800290bf (...1111)
2, sb, 800290be (...1110), 800290bd (...1101)
3, sb, 800290bd (...1101), 800290bb (...1011)
4, sb, 800290bc (...1100), 800290bc (...1100)
5, sb, 800290bb (...1011), 800290b7 (...0111)
6, sb, 800290ba (...1010), 800290b8 (...1000)
7, sb, 800290b9 (...1001), 800290be (...1110)
8, sb, 800290b8 (...1000), 800290b4 (...0100)
9, sb, 800290b7 (...0111), 800290b9 (...1001)
10, sb, 800290b6 (...0110), 800290b5 (...0101)
11, sb, 800290b5 (...0101), 800290ba (...1010)
12, sb, 800290b4 (...0100), 800290b6 (...0110)
..
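One sanity check that might help with a trace like this is to test whether the two address columns contain the same addresses in a different order, which would suggest that requests are being reordered on the way to the AXI adapter rather than having their low bits rewritten. The sketch below is only an illustration: the addresses are copied from the 12 rows shown, and the helper names `same_addresses_reordered` and `matching_rows` are hypothetical, not part of CVA6.

```python
# Addresses copied from the 12 trace rows shown above (the trace continues beyond them).
cpu_addrs = [
    0x800290BF, 0x800290BE, 0x800290BD, 0x800290BC,
    0x800290BB, 0x800290BA, 0x800290B9, 0x800290B8,
    0x800290B7, 0x800290B6, 0x800290B5, 0x800290B4,
]
mem_addrs = [
    0x800290BF, 0x800290BD, 0x800290BB, 0x800290BC,
    0x800290B7, 0x800290B8, 0x800290BE, 0x800290B4,
    0x800290B9, 0x800290B5, 0x800290BA, 0x800290B6,
]


def same_addresses_reordered(cpu, mem):
    """True if both columns hold exactly the same addresses, ignoring order."""
    return sorted(cpu) == sorted(mem)


def matching_rows(cpu, mem):
    """1-based row numbers where the CPU and memory addresses are identical."""
    return [i + 1 for i, (c, m) in enumerate(zip(cpu, mem)) if c == m]


if __name__ == "__main__":
    print("same set of addresses:", same_addresses_reordered(cpu_addrs, mem_addrs))
    print("rows with identical addresses:", matching_rows(cpu_addrs, mem_addrs))
```

For the 12 rows shown, both columns hold the same set of addresses (800290b4 through 800290bf), and only rows 1 and 4 line up one-to-one. That is consistent with the rows of the two columns not corresponding to the same store, although the truncated part of the trace would need the same check before drawing a conclusion.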
I think my confusion comes from not really understanding the byte addressing format of the CPU/AXI protocol. I've thought through many possibilities but am still unable to spot the pattern or the rationale behind the input/output addresses.
I've tried going through CVA6's source code, investigating where the memory request physical addresses are generated or used, to better understand the data transactions. For example, here, where depending on the 3rd bit of the memory request's physical address, the 32-bit data coming from the LSU is rearranged. Or here, where it seems that the last 3 bits of the physical address correspond to a write byte enable for the 64-bit AXI bus. Or here, where the write buffer sets the paddr bits. That last one seemed promising, so I followed the wbuffer_dirty_mux.wtag and bdirty_off signals down the rabbit hole, but their assignment was hard to follow, as they are connected to the round-robin arbiter and also a leading zero counter, which are connected to other signals, and so on...
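To make the byte-enable observation concrete, here is a minimal Python sketch of the generic AXI byte-lane convention for a 64-bit data bus, where the low 3 address bits select the starting byte lane and the write strobe marks which lanes carry valid data. This models the protocol convention only and is not taken from CVA6's RTL; the function name `axi_write_lanes` and its parameters are hypothetical, and whether the adapter expects the full byte address or the bus-aligned address is an assumption to verify against the source.

```python
def axi_write_lanes(paddr, data, size_bytes, bus_bytes=8):
    """Place a naturally aligned store on a bus_bytes-wide write data bus.

    Returns (bus_aligned_addr, wdata, wstrb). Hypothetical helper that models
    the generic AXI byte-lane convention, not CVA6's implementation.
    """
    offset = paddr % bus_bytes  # low address bits select the starting byte lane
    if offset % size_bytes != 0:
        raise ValueError("store is not naturally aligned")
    if size_bytes > bus_bytes:
        raise ValueError("store is wider than the data bus")

    data_mask = (1 << (8 * size_bytes)) - 1
    wdata = (data & data_mask) << (8 * offset)   # shift data into its byte lanes
    wstrb = ((1 << size_bytes) - 1) << offset    # one strobe bit per valid lane
    return paddr - offset, wdata, wstrb


if __name__ == "__main__":
    # Byte store (sb) to 0x800290bf: lane 7 of the 64-bit word at 0x800290b8.
    addr, wdata, wstrb = axi_write_lanes(0x800290BF, 0xAB, 1)
    print(hex(addr), hex(wdata), format(wstrb, "08b"))

    # Word store (sw) to 0x800290bc: upper four lanes of the same 64-bit word.
    addr, wdata, wstrb = axi_write_lanes(0x800290BC, 0x11223344, 4)
    print(hex(addr), hex(wdata), format(wstrb, "08b"))
```

Under this convention a byte store never changes which byte is addressed: the low bits only decide where the byte sits on the data bus and which strobe bit is set.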
I've attached a simulation result which illustrates the above store requests.
To summarize, I'd really appreciate some guidance on how the physical addresses need to be set in order to properly bypass the cache. I'm getting rather lost within the code. Thank you so much!
