mtd_spi_nor | littlefs: improve reliability with corrupted flash
Contribution description
This PR improves the reliability of mtd_spi_nor and littlefs:
mtd_spi_nor: properly checks forWELenabling/disabling, check for errors when writing/erasing and return a dedicated error code (-EIO).littlefs: catch-EIOand translate it intoLFS_ERR_CORRUPTthat is internally used to realloc sectors if needed
Testing procedure
Check that littlefs(2) still works (examples/filesystem).
This has been tested with flash that had dead sectors. littlefs could detect them and recover.
What's the deal with the Security register and will it be available on all flash devices?
On macronix flashes (at least, I haven't check other vendors actually :/ ), security register contains some flags: Erase fail bit and Program fail bit that are set when erase or program failed (usually meaning the sector is dead). See, for instance (page 83):
https://www.macronix.com/Lists/Datasheet/Attachments/7574/MX25L51245G,%203V,%20512Mb,%20v1.3.pdf
Hm I couldn't find such flags on neither GD25Q32C nor AT25SF041
d'oh! I'll revisit the patch to enable this with a compile flag
Is there some way to detect this at run-time?
I guess it could be possible to read the jedec ID. But I'm not sure it's worth it.
ping @benpicco
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.
@benpicco Can we get this reactivated? The project I'm currently working on uses an ISSI flash which has the same security features (however the bit positions in the registers are different and the OpCodes are different as well). This PR would be a good base and reduce the amount of work significantly.
Given the age of the PR, it probably has merge conflicts now. Unfortunately it looks like @vincent-d is not active anymore, his last GitHub activity was in 2022 and there is not much to find about OTAkeys anymore either.
On a different note: I checked the code and there is no timeout for the added functions wait_for_write_enable_cleared and wait_for_write_enable_set, which would get the thread stuck if the chip does not answer. (The function wait_for_write_complete which existed before and still exists in the current code does not have a timeout either, it counts the attempts and how many times it yielded. But there is no timeout either.)
I don't know how this is handled generally in other modules?
Sure, feel free to take over the PR. There should of course be a timeout, we don't want block the application in case of a hardware error but instead give to opportunity to handle it gracefully.