
Support XiP from PSRAM

Open Dominaezzz opened this issue 1 year ago • 1 comments

https://docs.espressif.com/projects/esp-idf/en/stable/esp32s3/api-guides/external-ram.html#execute-in-place-xip-from-psram

CONFIG_SPIRAM_FETCH_INSTRUCTIONS and CONFIG_SPIRAM_RODATA are the specific esp-idf configs.

In short, this feature lets you copy the application and data from flash to PSRAM at start up, which means flash is no longer needed for the rest of your application.

This is important for #2081, where you want to store huge buffers in PSRAM and DMA them to a peripheral, but you don't want flash access temporarily disabling PSRAM access. That would starve the DMA of data, and the peripheral would get garbage when it inevitably runs too far ahead.

Somewhat related #1083 .

Dominaezzz avatar Sep 04 '24 17:09 Dominaezzz

Turns out at least on ESP32-S3 we can execute code from PSRAM right after initializing PSRAM as we currently do. It's just that it's not writable via ibus (but we can write via dbus and need to make sure to synchronize caches).

I think there are two ways

  • the original request from #1083 extended by linking specific functions into PSRAM - for this we would need to create suitable sections in the linker scripts, copy code and data to PSRAM before anyone is allowed to access it. One challenge might be that currently we map PSRAM right after FLASH so the linker script needs to calculate the start address accordingly
  • the way ESP-IDF does it (i.e. copy everything to PSRAM and then map PSRAM at the origin of external address space) - see https://github.com/espressif/esp-idf/blob/3c99557eeea4e0945e77aabac672fbef52294d54/components/esp_psram/mmu_psram_flash.c#L46-L134
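The address difference between the two options can be sketched in Rust. This is only an illustration: the 64 KiB page size and the `0x3C00_0000` data-bus window origin are assumptions for the ESP32-S3, and all function names are made up for this example, not taken from esp-hal.

```rust
/// MMU page size assumed for ESP32-S3 external memory (64 KiB).
pub const MMU_PAGE_SIZE: u32 = 0x1_0000;
/// Assumed start of the data-bus external memory window on ESP32-S3.
pub const EXTMEM_ORIGIN: u32 = 0x3C00_0000;

/// Round `value` up to the next multiple of `align` (`align` must be a power of two).
pub const fn align_up(value: u32, align: u32) -> u32 {
    (value + align - 1) & !(align - 1)
}

/// Option 1: PSRAM is mapped directly after the flash mappings, so its start
/// address depends on how many bytes of flash are mapped.
pub const fn psram_start_after_flash(flash_mapped_bytes: u32) -> u32 {
    EXTMEM_ORIGIN + align_up(flash_mapped_bytes, MMU_PAGE_SIZE)
}

/// Option 2 (ESP-IDF style): PSRAM is mapped at the origin of the external
/// address space, so the start address is fixed.
pub const fn psram_start_at_origin() -> u32 {
    EXTMEM_ORIGIN
}
```

With option 1 a linker script would have to compute something like `psram_start_after_flash` itself, which is the challenge mentioned above. With option 2 the address is a constant.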

bjoernQ avatar Sep 17 '24 06:09 bjoernQ

I'd like to place my vote in favor of implementing this issue and explain the rationale.

If I understand this right, this feature is practically required for applications that need to use large RGB displays (where buffer is in PSRAM) while at the same time using PSRAM intensively for application use.

Well, for complex applications with a user interface, using WiFi, SSL, async, etc., the limited amount of RAM on esp32 is quickly exhausted and PSRAM is required for memory allocations. And for large displays (over 3.5") RGB is mandatory and requires PSRAM for buffering.

There are many applications of this sort out there, and currently they are all written in C++ and out of the reach of rust developers.

Therefore, to enter the space of UI based applications, this feature is a must. I think that would be an effort wisely invested.

And to sum up: "That's one small step for esp-hal, one giant leap for rustaceans."

yanshay avatar Dec 23 '24 20:12 yanshay

The simplest thing to try I can think of would be:

  • map PSRAM as we already do, FLASH is still mapped
  • just memcpy everything from the mapped flash to PSRAM
  • map psram to the origin of extram, invalidate caches
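The three steps above can be sketched as a host-side model. This is not hardware code: `Entry`, `Model` and `move_flash_to_psram` are hypothetical names, pages are plain byte vectors, and cache invalidation is not modelled at all.

```rust
/// What one MMU table entry points at in this toy model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Entry {
    Invalid,
    Flash(usize),
    Psram(usize),
}

/// Toy model of flash, PSRAM and the MMU table (hypothetical, host-only).
pub struct Model {
    pub flash: Vec<Vec<u8>>,
    pub psram: Vec<Vec<u8>>,
    pub mmu: Vec<Entry>,
}

impl Model {
    /// Read one byte through the MMU, like the CPU would.
    pub fn read(&self, entry: usize, offset: usize) -> Option<u8> {
        match self.mmu[entry] {
            Entry::Flash(p) => Some(self.flash[p][offset]),
            Entry::Psram(p) => Some(self.psram[p][offset]),
            Entry::Invalid => None,
        }
    }

    /// Copy every flash-backed page into PSRAM and repoint its MMU entry.
    /// Returns the number of PSRAM pages used, or `None` if PSRAM is too small.
    pub fn move_flash_to_psram(&mut self) -> Option<usize> {
        let needed = self
            .mmu
            .iter()
            .filter(|e| matches!(e, Entry::Flash(_)))
            .count();
        if needed > self.psram.len() {
            return None;
        }
        let mut next = 0;
        for i in 0..self.mmu.len() {
            if let Entry::Flash(src) = self.mmu[i] {
                // "memcpy" the page, then remap the same virtual page to PSRAM.
                self.psram[next] = self.flash[src].clone();
                self.mmu[i] = Entry::Psram(next);
                next += 1;
            }
        }
        Some(next)
    }
}
```

After the call, reads through the same MMU entries return the same bytes as before, but no entry refers to flash any more.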

I think ESP-IDF directly reads flash / writes PSRAM via SPI, but just doing a memcpy between the mapped flash and PSRAM should be easier (no code to port from ESP-IDF).

I haven't tried that (since then we would already have it :) ) but I guess it could work this way. All necessary changes would be in https://github.com/esp-rs/esp-hal/blob/4b66e3dba736f30ca370f670a8d3e1012fde50a3/esp-hal/src/soc/esp32s3/psram.rs#L116-L234

bjoernQ avatar Jan 16 '25 07:01 bjoernQ

The recent 38C3 talk on liberating the WiFi on the ESP32 mentioned that reverse-engineering the ROM is allowed by espressif. So, after a bit of reverse engineering work and studying the IDF sources I've gotten XiP from PSRAM working on my ESP32S3.

https://gist.github.com/EliteTK/5a409431082b4a4c34bb560243f2cf61

The code is ported from the equivalent ESP-IDF with any excess fluff removed. The comments contain the reverse engineered, cleaned up, mostly valid C which represents what the Cache_ functions which get called are actually doing.

I haven't actually tested this with a display yet as I never tried writing the DMA from PSRAM code mentioned in the related issues because I knew (from experience with esp-idf-hal, before I enabled XiP from PSRAM) that it would be glitchy and unusable. But I hope to do that tomorrow or next week.

I think it would make sense to put this in init_psram with additional PsramConfig fields for specifying that you want this feature. Putting the .text and .rodata in PSRAM has an effect on the amount of remaining available PSRAM, which is another reason why I think it makes sense to do this there. I plan on putting together a draft PR to this effect.
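To illustrate the effect on remaining PSRAM, here is a rough sketch. `XipConfig` and its fields are hypothetical (loosely mirroring CONFIG_SPIRAM_FETCH_INSTRUCTIONS and CONFIG_SPIRAM_RODATA) and are not actual esp-hal `PsramConfig` fields. The 64 KiB page size is an assumption.

```rust
/// Assumed MMU page size (64 KiB).
pub const MMU_PAGE_SIZE: usize = 64 * 1024;

/// Hypothetical options; not part of esp-hal's real `PsramConfig`.
#[derive(Clone, Copy, Debug, Default)]
pub struct XipConfig {
    /// Copy .text to PSRAM (like CONFIG_SPIRAM_FETCH_INSTRUCTIONS).
    pub copy_instructions: bool,
    /// Copy .rodata to PSRAM (like CONFIG_SPIRAM_RODATA).
    pub copy_rodata: bool,
}

/// Number of whole MMU pages needed to hold `bytes`.
pub fn pages_for(bytes: usize) -> usize {
    (bytes + MMU_PAGE_SIZE - 1) / MMU_PAGE_SIZE
}

/// PSRAM bytes left for the application after the copies,
/// or `None` if the selected sections do not fit.
pub fn remaining_psram(
    psram_bytes: usize,
    text_bytes: usize,
    rodata_bytes: usize,
    cfg: XipConfig,
) -> Option<usize> {
    let mut pages = 0;
    if cfg.copy_instructions {
        pages += pages_for(text_bytes);
    }
    if cfg.copy_rodata {
        pages += pages_for(rodata_bytes);
    }
    psram_bytes.checked_sub(pages * MMU_PAGE_SIZE)
}
```

Because the copy works in whole pages, each section costs its size rounded up to the next page boundary.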

The code for doing this for the ESP32-S2 and ESP32-S3 is very similar. There's a v2 file in the IDF which handles ESP32-P4. I don't actually know enough about ESP32 to know if that's the extent of SOCs which support this feature.

Lastly, I have minimal experience with the ESP32 and esp-hal, limited experience with embedded rust, limited experience with unsafe rust, and only about a year of proper rust experience so please do comment on the gist on anything you think looks odd/could be improved, I'll happily incorporate it as I keep working on this.

EliteTK avatar Jan 19 '25 00:01 EliteTK

Wow - didn't notice there is a Cache_Flash_To_SPIRAM_Copy ROM function - nice

bjoernQ avatar Jan 20 '25 07:01 bjoernQ

> Wow - didn't notice there is a Cache_Flash_To_SPIRAM_Copy ROM function - nice

Funny thing about it: It has a bug.

For some reason mappings to the 0 page in flash (and only the zero page) all get coalesced when doing the copy to PSRAM (someone who knows more about ESP32 or maybe the IDF would possibly know why specifically the zero page, I would love to know), so that once everything is copied to PSRAM, anything which previously mapped to the 0 page in flash should now map at a single page in PSRAM. But the way that the function handles this is incorrect, the zero-page address is one higher than it should be when it's assigned to the in-out parameter which holds this information. This means that the first zero-page mapping will point at the copied zero page in PSRAM but any subsequent zero-page mapping will point to the next PSRAM page after the copied zero page.
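The behaviour described above can be modelled on the host like this. It is a simplified model of the description, not a port of the ROM function: `assign_psram_pages` and the `buggy` switch are invented for illustration, and no data is actually copied.

```rust
/// For each MMU entry (given as the flash page it maps, `None` = unmapped),
/// return the PSRAM page it would point at after the copy.
///
/// `zero_page` is the in-out parameter: the PSRAM page holding the copy of
/// flash page 0, once one exists. With `buggy = true` the stored value is one
/// higher than it should be, which models the behaviour described above.
pub fn assign_psram_pages(
    flash_pages: &[Option<u32>],
    start_page: u32,
    zero_page: &mut Option<u32>,
    buggy: bool,
) -> Vec<Option<u32>> {
    let mut next = start_page;
    let mut out = Vec::with_capacity(flash_pages.len());
    for entry in flash_pages {
        let mapped = match (*entry, *zero_page) {
            // Unmapped entries stay unmapped.
            (None, _) => None,
            // Flash page 0 was already copied: reuse the remembered page.
            (Some(0), Some(p)) => Some(p),
            // First mapping of flash page 0: copy it and remember where it went.
            (Some(0), None) => {
                let p = next;
                next += 1;
                *zero_page = Some(if buggy { p + 1 } else { p });
                Some(p)
            }
            // Any other flash page gets its own PSRAM page.
            (Some(_), _) => {
                let p = next;
                next += 1;
                Some(p)
            }
        };
        out.push(mapped);
    }
    out
}
```

In the buggy variant, the second and later zero-page mappings land on the page after the copied zero page, which in this model holds the copy of some other flash page.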

I verified this by setting up the scenario and then calling the ROM function, just in case my reading of the disassembly and Ghidra's decompilation were both wildly wrong.

I could be wrong and this could be intentional (I don't see how), but I think for the esp-hal implementation I'll just re-create the function in rust but without the bug.

There's also the question of why the Cache_Flash_To_SPIRAM_Copy function is only used to copy half of the mapped memory (0-127 when calling it with CACHE_IBUS, and 128-255 when calling it with CACHE_DBUS). Presumably another artifact of the ESP-IDF that I don't understand.

It uses a spare MMU entry (511, the last one) to perform the copy by mapping subsequent PSRAM pages to that memory location and then memcpying the data. There is no problem when it only applies the copy to the first 256 mapped entries, but if for some reason the entire MMU table was filled with flash mappings and the weird "only the first 256 entries" thing was relaxed to cover the entire table, the last mapping would never get copied to PSRAM. But I guess at this point, since your MMU table was full, you would have no reason to bother with XiP from PSRAM, so maybe it's fine to error out in those cases. Really, even if the page isn't copied, there's no guarantee (while incredibly unlikely) that the code isn't currently executing from that memory area, which means that when the page is temporarily remapped, the code would crash. So guarding against this seems sensible anyway; properly fixing it would require running the setup code from RAM.

Lastly, I also found a bug in the ESP-IDF's workaround for a known ROM bug in the Cache_Count_Flash_Pages function. The function needs to account for the extra zero page mapping the first time it counts a zero page and then never again, hence the in-out parameter. But the workaround incorrectly counts a zero page if the zero-page-count was zero even if the new zero-page-count is still zero (meaning that there is no zero page that needs accounting for yet). This means that if you have no zero pages mapped in either of the regions of the MMU table, you will end up with an extra page for each call to the count function. Anyway, long story short, the patched version will over-compensate and under-report the page count in some scenarios.
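A minimal sketch of the counting rule as described, assuming that only mappings of flash page 0 are coalesced and that every other valid mapping needs its own PSRAM page. `count_flash_pages` and its boolean in-out flag are illustrative names and types, not the ROM or ESP-IDF signature.

```rust
/// Count the PSRAM pages needed for one region of the MMU table.
///
/// `entries` holds the flash page each MMU entry maps (`None` = unmapped).
/// `zero_page_counted` is shared between calls so that flash page 0 is
/// counted at most once across all regions.
pub fn count_flash_pages(entries: &[Option<u32>], zero_page_counted: &mut bool) -> usize {
    let mut count = 0;
    for page in entries.iter().flatten() {
        if *page == 0 {
            // Only the first zero-page mapping ever adds a page.
            if !*zero_page_counted {
                *zero_page_counted = true;
                count += 1;
            }
        } else {
            count += 1;
        }
    }
    count
}
```

Calling it once per region with the same flag means a table without any zero-page mappings never gets an extra page added, which is the case the workaround gets wrong according to the description above.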

So, all things considered, I think it would be clearer to just replicate the relatively simple Cache_Count_Flash_Pages in rust for the esp-hal version.

Really this whole thing has left me with just as many questions as I had before, except they're different questions, so I guess I learned something.

And I did now try running a framebuffer from PSRAM but seem to be struggling just getting the screen to display it remotely correctly, never mind displaying it without the kinds of glitches you see when you're running code from flash while the framebuffer is in PSRAM.

EliteTK avatar Jan 20 '25 12:01 EliteTK

I've now opened a WIP PR for this in #3024 . I would appreciate any and all feedback.

EliteTK avatar Jan 23 '25 20:01 EliteTK