
Stack overflow on small SRAM when trying to boot from SD card

Open fritzbauer opened this issue 3 years ago • 0 comments

Issue

With a small SRAM of 2048 bytes it is not possible to boot from SD card. The boot process just gets stuck without any error output. After hours of debugging with OpenOCD I was able to identify that a stack overflow occurs on entering sdcardboot_from_json(), where the stack pointer is moved by 2032 bytes. For instance, the current implementation allocates a 1024-byte buffer on the stack for the content of boot.json: boot.c --> sdcardboot_from_json()

Why such a small 2048 bytes SRAM?

An FPGA nerd's biggest struggle: the chip does not fit more. When building linux-on-litex-vexriscv, some chips need such a small SRAM in order to synthesize successfully, e.g. Qmtech_EP4CE15 or, in my case, a Qmtech_5CEFA2.

Ideas to fix

  • At the very least, a specific error message about this problem at build time might save others many hours of troubleshooting.
  • I tried to modify sdcardboot_from_json() and just store the large objects in the main_ram, since it has plenty of space and is supposed to be unused until copy_file_from_sdcard_to_ram() is called for the first time: https://github.com/fritzbauer/litex/blob/e20596c8f774bf84bd6c6a957caff8abbbe01f33/litex/soc/software/bios/boot.c#L714
    • However, copy_file_from_sdcard_to_ram() is huge as well, currently consuming 624 bytes plus the stack used by child functions like f_mount, f_open, printf, etc. So, in addition to that change, I had to free some space by removing the .data section (696 bytes) from the sram and placing it in the main_ram. This way only .bss (568 bytes) is left in the sram, which appears to leave enough room for the stack to grow (or the stack only overwrites unused data...).
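The numbers above can be put into a rough stack budget. This is only a sketch: it assumes that .data, .bss and the stack all share the 2048-byte SRAM, and it ignores the frames of callers and of child functions such as f_mount or printf. The helper names are made up for illustration; only the byte counts come from this issue.

```c
#include <assert.h>

/* Figures quoted in this issue; the names are illustrative only. */
enum {
    SRAM_BYTES       = 2048, /* total SRAM */
    DATA_BYTES       = 696,  /* .data section */
    BSS_BYTES        = 568,  /* .bss section */
    JSON_FRAME_BYTES = 2032, /* stack pointer adjustment in sdcardboot_from_json() */
    COPY_FRAME_BYTES = 624   /* frame of copy_file_from_sdcard_to_ram() */
};

/* Bytes left for the stack once `sections` bytes of SRAM are occupied. */
static int stack_room(int sram, int sections)
{
    return sram - sections;
}

/* 1 if a single frame of `frame` bytes fits into `room`, else 0. */
static int frame_fits(int room, int frame)
{
    return frame <= room;
}

/* After moving .data to main_ram, the copy function's own frame fits.
 * This is a compile-time check, so a violation would stop the build. */
static_assert(COPY_FRAME_BYTES <= SRAM_BYTES - BSS_BYTES,
              "copy_file_from_sdcard_to_ram() frame does not fit in SRAM");
```

With .data and .bss both in SRAM, 784 bytes remain, far less than the 2032-byte frame. With only .bss in SRAM, 1480 bytes remain: enough for the 624-byte frame, but still not for the 2032-byte one, which is why the large buffers had to move to main_ram as well.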

My favorite workaround:

Adjust bios/linker.ld to place _fstack in the main_ram instead of the sram. I reckon this is very ugly, and it carries some risk that the stack gets overwritten by other data written to the RAM by code that is not aware of this "hack". Some considerations:

  • It is working (unlike the previous implementation)
  • Since it is a relatively small stack compared to the overall memory size, it is unlikely that the last <8192 bytes of the RAM will already be overwritten before booting the actual system.
  • I am not sure whether main_ram is available in every design.
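One way to reduce the overwrite risk would be to check every load into main_ram against the area reserved for the stack. The sketch below is hypothetical and not part of the LiteX BIOS: it assumes _fstack sits at the very end of main_ram, that the stack grows downward, and that 8192 bytes is a safe upper bound for it. The function names and the example addresses in the usage note are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the stack size mentioned in this issue. */
#define STACK_RESERVE_BYTES 8192u

/* Stack top when _fstack is placed at the end of main_ram
 * (assumption: the stack grows downward from here). */
static uint64_t stack_top(uint64_t ram_base, uint64_t ram_size)
{
    return ram_base + ram_size;
}

/* Returns 1 if writing `len` bytes at `addr` touches the reserved
 * stack area at the end of main_ram, 0 otherwise. */
static int load_hits_stack(uint64_t ram_base, uint64_t ram_size,
                           uint64_t addr, uint64_t len)
{
    /* Precondition: main_ram must be able to hold the reserved area. */
    assert(ram_size >= STACK_RESERVE_BYTES);

    uint64_t top = stack_top(ram_base, ram_size);
    uint64_t stack_low = top - STACK_RESERVE_BYTES;
    uint64_t end = addr + len; /* one past the last byte written */

    return len != 0 && addr < top && end > stack_low;
}
```

For a 32 MiB main_ram at 0x40000000 (example values), a 16 MiB image loaded at the base does not touch the reserved area, while any write reaching into the last 8192 bytes is flagged.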

Since I am not a LiteX (nor C) expert, I am leaving it to you to find a better approach. Perhaps it is also possible to use sram or main_ram conditionally, depending on the size of the sram?
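The conditional choice could look roughly like the following. This is a sketch under assumptions, not LiteX code: the 4096-byte threshold and the function name are invented for illustration, and a size of 0 stands for "this design has no main_ram".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold: below this SRAM size, large buffers
 * should not live on the stack. */
#define SMALL_SRAM_LIMIT 4096u

/* Decide where a scratch buffer of `needed` bytes should live.
 * Returns 1 to place it in main_ram, 0 to keep it on the stack.
 * main_ram_size == 0 means the design has no main_ram. */
static int use_main_ram(size_t sram_size, size_t main_ram_size, size_t needed)
{
    assert(needed > 0);

    if (sram_size >= SMALL_SRAM_LIMIT)
        return 0; /* SRAM is large enough, keep the current behaviour */
    if (main_ram_size < needed)
        return 0; /* no usable main_ram; a build-time error would fit here */
    return 1;
}
```

For the 1024-byte boot.json buffer, a 2048-byte SRAM with main_ram present selects main_ram, while a larger SRAM keeps the buffer on the stack. The remaining case, small SRAM and no main_ram, is where a specific build-time error message would be most useful.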

More background information about this issue can be found here: https://github.com/litex-hub/linux-on-litex-vexriscv/issues/287

fritzbauer • Jun 23 '22 11:06