[BUG] SD card issue: Assertion failure in stm32_sdmmc.c on STM32 due to timeout
Description / Steps to reproduce the issue
The easiest way to force the SD card into a timeout is:
- You might want to try with a non-industrial grade sd card (higher chances of a timeout happening, I would suppose)
- Modify the value of SDMMC_DTIMER_DATATIMEOUT_MS in nuttx/arch/arm/src/stm32h7/stm32_sdmmc.c to 10, instead of the default 250 (ms)
- Make sure that assertions are enabled
- You might want to enable error logs for filesystem and/or memory card
- I have SDIO and DMA enabled for the sd card
- Build NuttX -- arm 32 bit, stm32h7.
- Run sd_stress or sd_bench, or some other I/O-intensive operation on the SD card, for 10 minutes (usually 2 minutes should suffice)
What happens: You get an assertion failure in stm32_sdmmc.c line 1108 (function stm32_dataconfig) because IDMA is still enabled in the STM32 SDMMC IDMA CTRLR register
On which OS does this issue occur?
[OS: Other]
What is the version of your OS?
NuttX 10.3 on STM32 board
NuttX Version
10.3+
Issue Architecture
[Arch: arm]
Issue Area
[Area: Specific Peripheral]
Host information
Host is STM32H7, with armv7-m. It's NuttX 10.3 with some fixes cherry-picked from later versions. Looking through the git history of NuttX 12, I didn't see changes in this area to suggest that this issue is fixed in NuttX 12.
Verification
- [x] I have verified before submitting the report.
@Feoggou isn't 10 ms too low a value for an SD card timeout?
Another thing that raises concern: are you using a 100 Hz clock tick? That is exactly 10 ms too. If so, try reducing the default value CONFIG_USEC_PER_TICK=10000 to 1 ms (1000).
@raiden00pl @keever50 any other ideas?
@acassis I tried with the value of 10 milliseconds, which is the lowest value I could set (I think one clock tick is 10 milliseconds, so I can't get any lower), for the purpose of getting the timeout issue to reproduce easily.
In my setup, I had increased the timeout from 250 milliseconds to 5000 milliseconds (5 seconds) (please see the markdown file I had attached), and the assertion failure happens about once in 24 hours of constant running of my application. I don't know why my SD card sometimes -- very rarely -- needs much more time than 250 ms, but the problem I'm facing is that if the timeout does happen then the problem is unrecoverable.
So the bug that I'm reporting is that the timeout issue is an unrecoverable situation.
@Feoggou normally SD cards and eMMC have internal software running inside them, and from time to time it needs to "clean house" and reorganize/consolidate the blocks inside to optimize for space and access speed. That is probably the moment when your SD card requires more time. It depends on how your application uses the SD card: if you give it enough time between accesses, the reorganization will probably be done in this free time. If you access it all the time (for example for a log-hungry function) then you will probably experience the issue that you are reporting.
Well, the application is doing a lot of logging, and the logs are being saved to the SD card. If a crash happens in my application, then I need to know the last operations that got executed -- that's why I don't use buffering and write the log message immediately onto the SD card. So if there's an issue with the SD card becoming too overwhelmed with requests, I would want it to fail gracefully -- such as my write/fsync/close operation returning with an error and setting an errno code, rather than my entire application along with NuttX crashing.
@Feoggou yes, I'm not saying that the driver is right, just saying that the timeout could be caused by that fact.
About: "I tried with the value of 10 milliseconds, which is the lowest value I could set (I think one clock tick is 10 milliseconds, so I can't get any lower)"
My suggestion is to use CONFIG_USEC_PER_TICK=1000 and keep the timeout at 10 ms. But since you said the issue also happens with a bigger timeout, this proposal will not work.
Did you try to find a way to disable IDMA CTRLR when the timeout happens? That should help to prevent the crash.
Well, I tried to set the register back to 0, but when reading it back it was still set. I've noticed in the STM32 documentation something about a Data Path State Machine Status Register: when it has some status, you cannot set the enable bit on the control register. Chances are that the "stop transmission" command -- STM32_SDMMC_CMD_CMDSTOP -- that is currently issued is not enough, or perhaps I even have to set some registers before issuing it.
I'll need to read more in the documentation.