Facilitate RP2040 XIP-cache-as-RAM feature, reclaiming bus bandwidth for time critical instructions in COPY_TO_RAM and NO_FLASH builds
The pico-sdk and RP2040 hardware provide a few facilities that improve performance by moving runtime code and data into SRAM:
- "pico/platform/sections.h" currently provides the "__not_in_flash", "__not_in_flash_func", and "__time_critical_func" macros for placing runtime code and data into SRAM by assigning them linker section names in the source code.
- The pico-sdk CMake scripts allow any of four binary types to be selected with similarly named project properties for the RP2040: "default", "blocked_ram", "copy_to_ram", or "no_flash"
- The RP2040's eXecute-In-Place (XIP) cache has its own connection to the main AHB bus and provides SRAM speeds on cache hits when retrieving runtime code and data from flash
But this regime isn't perfect. The 16kB of XIP cache and its connection to the main AHB bus go mostly unused for PICO_COPY_TO_RAM and PICO_NO_FLASH binary type builds, leaving some performance opportunities unrealized in their implementations.
The RP2040's XIP cache can be disabled by clearing its CTRL.EN bit which allows its 16kB of memory to be used as SRAM directly.
I'd like to update the pico-sdk to support the following:
- Use the "__time_critical_func" macro to place runtime code into XIP RAM when it's available
- Add a couple new "copy_to_ram_using_xip_ram" and "no_flash_using_xip_ram" binary type builds for the RP2040 that support this XIP RAM code placement
- Add a new "PICO_USE_XIP_CACHE_AS_RAM" CMake property to enable the XIP cache's use as RAM for time critical instructions with PICO_COPY_TO_RAM and PICO_NO_FLASH binary type builds
- Add a couple new CMake functions "pico_sections_not_in_flash(TARGET [list_of_sources])" and "pico_sections_time_critical(TARGET [list_of_sources])" that target selected source file content or a whole CMake build target's list of source files for placement into RAM and/or XIP RAM
I've put together a PR that achieves these 4 goals. I've only tested it with CMake based builds on the RP2040 hardware that I have. That said, I have made an effort to fail fast when CMake configuration properties are incompatible, and I've also made an effort to stay compatible with preexisting linker scripts by building on the preexisting section naming conventions.
Please consider my PR and let me know if my changes are acceptable, or what I can/should change to make them acceptable.
Thanks
Haven't thought about this in detail, but two things
- I wonder if we really need new build types; a single orthogonal flag which changes the behavior of the macros might be fine )assuming the same linker script is indeed OK - with possibly empty sections)
- we obviously need to think a bit more about how this plays with various RP2350 XIP SRAM boot features
Haven't thought about this in detail, but two things
- I wonder if we really need new build types; a single orthogonal flag which changes the behavior of the macros might be fine )assuming the same linker script is indeed OK - with possibly empty sections)
- we obviously need to think a bit more about how this plays with various RP2350 XIP SRAM boot features
My inspiration for the proposed changes here is that there can be contention between the CPU instruction fetches and the CPU data fetches/writes. In the RP2040's "default" and "blocked_ram" binary type builds the CPU instruction fetches go to the XIP cache, so the instruction+data contention doesn't happen there, but in the RP2040's "copy_to_ram" and "no_flash" binary type builds the CPU instruction fetches share the interleaved four main RAM bus connections with the CPU data fetches/writes, so the instruction+data contention can happen there.
Meanwhile the RP2040's datasheet clearly contemplates repurposing the XIP cache as an SRAM bank in multiple places. Following up on that hardware design with some pico-sdk support to utilize the XIP cache as dedicated RAM for "time critical" instructions seemed like a natural fit.
My PR introduces a new flag "PICO_USE_XIP_CACHE_AS_RAM" which is only applicable to the RP2040's "copy_to_ram" and "no_flash" binary type builds. So conceptually for the developer, I'm not really introducing new build types. Setting this property will place all ".time_critical.text*" linker section CPU instructions into the XIP RAM region, so the instruction+data contention can be avoided for these instructions within the "copy_to_ram" and "no_flash" binary type builds.
I didn't find any #ifdef style macros in the existing linker scripts. I assume they're not applicable there? Anyway, that's why I introduced the two new RP2040 specific linker scripts "memmap_copy_to_ram_using_xip_ram.ld" and "memmap_no_flash_using_xip_ram.ld" with my implementation. These were copies of the existing RP2040 specific linker scripts "memmap_copy_to_ram.ld" and "memmap_no_flash.ld" before I added the XIP_RAM support to them. Note: I was able to change behavior based on the "PICO_USE_XIP_CACHE_AS_RAM" flag using macros in the supporting code for the "crt0.S" and the "boot_stage2" scripts.
As I mentioned, my changes are limited in scope to RP2040 project builds. There's been a lot of hardware design work that went into the RP2350 around its XIP cache and its XIP bus connections. According to the RP2350's datasheet "Cache-as-SRAM is now achieved by pinning cache lines within the cached XIP address space". So using its XIP cache as RAM is a whole different proposition. Maybe a separate "PICO_PIN_TIME_CRITICAL_INSTRUCTIONS_IN_XIP_CACHE" feature would make sense in the future for the RP2350's "copy_to_ram" and "no_flash" binary type builds? For now, I believe my PR sidesteps any need to contemplate RP2350 XIP interactions by remaining an RP2040-only feature.
A couple notes from me:
- We wouldn't want to add this now, only to have to change it all when we add RP2350 support for the same thing - it would be much better to add them both at the same time
- Instead of adding new linker scripts, this seems like a prime candidate for extending my https://github.com/raspberrypi/pico-sdk/pull/2137 PR once that's merged, which adds template linker scripts - this would allow adding new sections without creating whole new linker scripts
- The pico_standard_link CMake modifications currently don't work if I just use
pico_set_binary_type(<my_exe> copy_to_ram_using_xip_ram)- it also throws an error thatPICO_USE_XIP_CACHE_AS_RAMis being redefined on the command line when I setPICO_DEFAULT_BINARY_TYPE=copy_to_ram_using_xip_ram, and doesn't seem to work then either - so it currently only works if you set bothPICO_USE_XIP_CACHE_AS_RAMandPICO_NO_FLASH/PICO_COPY_TO_RAM
I'm already working on some XIP SRAM stuff for RP2350, so I think I'll add this to that scope of work
I've popped my work so far as a draft in #2660 if you want to give it a try - it should work like your PR for RP2040 (minus the boot2 changes which I don't think are necessary), you just need to use pico_set_binary_type(<my_exe> copy_to_ram_using_xip_cache) rather than setting the PICO_USE_XIP_CACHE_AS_RAM variable
- We wouldn't want to add this now, only to have to change it all when we add RP2350 support for the same thing - it would be much better to add them both at the same time
The RP2350 drops the XIP-cache-as-RAM feature, so there is no cache-as-SRAM address window, and there is no XIP_CTRL_EN bit that will enable/disable it. I believe this precludes us from ever adding RP2350 support for "the same thing" in the future.
For this reason, I've purposefully isolated my changes in the PR to only affect RP2040 builds. When and if we wanted to pursue pinning time critical instructions into the RP2350's XIP-cache, it's going to require a new implementation distinct from what I've done here using the XIP-cache-as-RAM feature in RP2040 builds.
I really hope this "don't add now, if we may need to change it later" argument doesn't carry much weight, as it seems moot in the context of the XIP-cache-as-RAM feature. I'd prefer that this goes in as a RP2040-only feature sooner, rather than waiting for some arbitrary date in the future when some kind of related RP2350 pinning solution has been developed and is ready to ship.
- Instead of adding new linker scripts, this seems like a prime candidate for extending my Add pico_set_modified_binary_type function #2137 PR once that's merged, which adds template linker scripts - this would allow adding new sections without creating whole new linker scripts
I'll take a look.
Template linker scripts sound great. I'd be happy to refactor my PR to use those instead of adding my two new linker scripts. Do you have any insight on when they will they be merged into the develop branch? Also, are they going in as a separate feature or are they bound to your new pico_set_modified_binary_type function implementation?
- The pico_standard_link CMake modifications currently don't work if I just use
pico_set_binary_type(<my_exe> copy_to_ram_using_xip_ram)- it also throws an error thatPICO_USE_XIP_CACHE_AS_RAMis being redefined on the command line when I setPICO_DEFAULT_BINARY_TYPE=copy_to_ram_using_xip_ram, and doesn't seem to work then either - so it currently only works if you set bothPICO_USE_XIP_CACHE_AS_RAMandPICO_NO_FLASH/PICO_COPY_TO_RAM
You're right! I've now updated my PR with fixes and have verified that the PICO_USE_XIP_CACHE_AS_RAM is working as expected without errors in the following cmake config constellations:
set(PICO_COPY_TO_RAM 1)
set(PICO_COPY_TO_RAM 1)
set(PICO_USE_XIP_CACHE_AS_RAM 1)
set(PICO_DEFAULT_BINARY_TYPE copy_to_ram)
set(PICO_DEFAULT_BINARY_TYPE copy_to_ram_using_xip_ram)
pico_set_binary_type(${TARGET} copy_to_ram)
pico_set_binary_type(${TARGET} copy_to_ram_using_xip_ram)
set(PICO_NO_FLASH 1)
set(PICO_NO_FLASH 1)
set(PICO_USE_XIP_CACHE_AS_RAM 1)
set(PICO_DEFAULT_BINARY_TYPE no_flash)
set(PICO_DEFAULT_BINARY_TYPE no_flash_using_xip_ram)
pico_set_binary_type(${TARGET} no_flash)
pico_set_binary_type(${TARGET} no_flash_using_xip_ram)
I'm already working on some XIP SRAM stuff for RP2350, so I think I'll add this to that scope of work
That would be great! I'd really appreciate you driving this feature!
I've popped my work so far as a draft in #2660 if you want to give it a try - it should work like your PR for RP2040 (minus the boot2 changes which I don't think are necessary), you just need to use
pico_set_binary_type(<my_exe> copy_to_ram_using_xip_cache)rather than setting thePICO_USE_XIP_CACHE_AS_RAMvariable
I'll take a look. Thanks!
I've popped my work so far as a draft in #2660 if you want to give it a try - it should work like your PR for RP2040 (minus the boot2 changes which I don't think are necessary), you just need to use
pico_set_binary_type(<my_exe> copy_to_ram_using_xip_cache)rather than setting thePICO_USE_XIP_CACHE_AS_RAMvariableI'll take a look. Thanks!
Hi @will-v-pi ,
I've tried out your #2660 changes with my test program on my RP2040 using the following configurations:
pico_set_binary_type(main copy_to_ram_using_xip_ram)
pico_set_binary_type(main copy_to_ram)
pico_set_binary_type(main no_flash_using_xip_ram)
pico_set_binary_type(main no_flash)
I ran into some issues while deploying in some of these configurations. Please see my comments on your #2660 PR for ideas on how to fix them. Those proposed fixes got most of these configurations working for me.
That said, I do have one remaining unresolved issue which may require revisiting those boot2 changes. When I use the pico_set_binary_type(main copy_to_ram_using_xip_ram) configuration, my test app on my RP2040 is reporting a xip_ctrl_hw->ctrl value of 0x3 at runtime instead of the expected 0x2 value which would indicate the XIP_CTRL_EN_BIT has been disabled. For all the other configurations the expected value is being reported.
Do you have insight into why the crt0 code seems to have no effect there? I do see the _disable_xip_cache label being included in the .elf file so the code should be in there. Is that section of the crt0 not being run at all, or is the XIP_CTRL register being reset after the crt0 code is run?
Would you consider including the two CMake functions added in my src/rp2_common/pico_platform_sections/CMakeLists.txt changes?
Thanks again for all you effort, and looking into these changes!
That said, I do have one remaining unresolved issue which may require revisiting those boot2 changes. When I use the
pico_set_binary_type(main copy_to_ram_using_xip_ram)configuration, my test app on my RP2040 is reporting axip_ctrl_hw->ctrlvalue of0x3at runtime instead of the expected0x2value which would indicate the XIP_CTRL_EN_BIT has been disabled. For all the other configurations the expected value is being reported.Do you have insight into why the crt0 code seems to have no effect there? I do see the
_disable_xip_cachelabel being included in the .elf file so the code should be in there. Is that section of the crt0 not being run at all, or is the XIP_CTRL register being reset after the crt0 code is run?
Thanks for testing it - looks like the issue was flash functions calling boot2 to undo the change (which weren't called in my tests, but are called when you use stdio_usb), which were fixed by your boot2 changes, but those changes apply project-wide (as you get a boot2 per-project not per-target). I've implemented a different fix, to save/restore the register around flash functions, so it works per-target.
Would you consider including the two CMake functions added in my
src/rp2_common/pico_platform_sections/CMakeLists.txtchanges?
Those look like those could be useful functions, but I think I'll leave them out of my changes and you can PR them separately
Would you consider including the two CMake functions added in my
src/rp2_common/pico_platform_sections/CMakeLists.txtchanges?Those look like those could be useful functions, but I think I'll leave them out of my changes and you can PR them separately
Ok, I'll submit a separate PR for them. Thanks
Ok, I'll submit a separate PR for them. Thanks
PR submitted, see #2677