arduino-esp32
Bricked ESP32 Modules - XMC flash chip corrupt
Board
All boards theoretically
Device Description
Not relevant, but a custom PCB for my product
Hardware Configuration
not relevant, but yes
Version
v1.0.6
IDE Name
not relevant
Operating System
not relevant
Flash frequency
40MHz
PSRAM enabled
no
Upload speed
115200
Description
@espressif has officially closed https://github.com/espressif/esp-idf/issues/7994 and asked that we open new issues for "related problems". Well, the Arduino-ESP32 environment is just such a thing.
The summary of the problem is that some modules have flash chips from XMC that can get into a bad state and irreparably brick the modules (i.e., they won't get past the second-level bootloader and can't be brought back to life). The solution, or at least a solution, is running a script using esptool.py to lock the flash status bits so that the problem can't happen. There was a solution endorsed by Espressif in the IDF repository.
What we need here, I think, is a solution for Arduino-ESP32 that fixes this at runtime, so that it can be deployed in an OTA update. Maybe it should be built in transparently to the core, running before main, but I suspect an opt-in strategy would cause less drama for the project stakeholders as a whole. I'm thinking that, at a bare minimum, we should have an example sketch that demonstrates how to inoculate the flash chip at runtime, if it's not already inoculated, as the esptool.py script in the parent issue does. I realize the project is now on the 2.0.x release, but it would be really wonderful if a backport point release could be made available for the 1.0.x package as well, for those of us who are not on the latest and greatest.
I want to emphasize that this is a pretty serious issue for commercial users of this project, and if there's any way I can be put to work helping to solve the problem, I am ready to put time into it. We've already had a bunch of returns confirmed to be caused by this issue, and more are showing up weekly. I absolutely need guidance and support to get it done right, though. I don't fully understand the problem or the obstacles. Maybe @igrr can weigh in and offer some support?
Sketch
not relevant
Debug Message
not relevant, doesn't start user application when flash is corrupted and locked out
Other Steps to Reproduce
XMC flash chips are susceptible to this, not all ESP32 modules have XMC flash chips installed
I have checked existing issues, online documentation and the Troubleshooting Guide
- [X] I confirm I have checked existing issues, online documentation and Troubleshooting guide.
Just stumbled into these issues (not that I've experienced them personally so far), but one observation I do have is that any sort of boot-level access to the FLASH chip kinda has to go through the bootloader level (or just brute-force bit-banging).
On the ESP8266, there were several exposed FLASH chip status/ID functions (see https://github.com/esp8266/Arduino/blob/ee7ac2f79d4bbf5460bf1c60c58469ab5e3022b9/cores/esp8266/Esp.cpp#L277 and the surrounding code, for example). On the ESP32, none of those endpoints appear to be available. The few FLASH chip functions on the ESP32 do not appear to actually access the chip; they mostly just read the preconfigured settings for the binary (see https://github.com/espressif/arduino-esp32/blob/943216308d0425091498a33ba73e375cbd76287a/cores/esp32/Esp.cpp#L308, for example).
It's worth noting that the Arduino-ESP32 port does not have any support for bootloader interaction. All "bootloader calls" that I'm aware of are either removed and/or simply blank stubs that return predefined values. Unfortunately, that may make this sort of requirement very difficult. Necessary...but hard.
The biggest challenge is that the ESP32 runs code from FLASH through a cache that fetches the required blocks on demand. The only way to directly control the FLASH chip (as required for the patch) is to first copy code to IRAM (from what I can see in their examples) and then disable the cache, so that no automatic FLASH accesses interfere with any attempt to patch the issue.
It may be possible, though difficult. Found this thread: https://esp32.com/viewtopic.php?f=14&t=1481 which has some pretty in-depth bit-bang-level FLASH chip access, so it's definitely possible to shoehorn into the Arduino port...
Interestingly, mention was made of SPI_Common_Command...but somewhat comically, there is no implementation of this call in the Arduino-ESP32 library for the ESP32. There are implementations for the ESP32-S2 and ESP32-S3, though...
I did find this call, however: https://github.com/espressif/arduino-esp32/blob/371f382db7dd36c470bb2669b222adf0a497600d/tools/sdk/esp32/include/spi_flash/include/esp_flash.h#L145. We might be able to use it to figure out whether the modules have the XMC chips in them or not.
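For what it's worth, here is a minimal sketch of how that `esp_flash_read_id()` call could be used, assuming Arduino-ESP32 2.0.x (ESP-IDF 4.x), where passing `NULL` as the chip selects the default (main) flash chip. Per the ESP-IDF header, the returned 24-bit ID has the manufacturer in the upper 8 bits and the chip ID in the lower 16, i.e. not byte-reversed. The names `FlashJedecId`, `splitJedecId` and `readMainFlashId` are hypothetical; only `esp_flash_read_id` comes from the linked header, and the splitting logic also builds on a desktop compiler because the ESP32 part is behind `#ifdef ESP32`.

```cpp
#include <cstdint>

#ifdef ESP32
#include "esp_flash.h"  // esp_flash_read_id() (ESP-IDF 4.x / Arduino-ESP32 2.0.x)
#endif

// Hypothetical helper type: the three bytes returned by the RDID command.
struct FlashJedecId {
  uint8_t manufacturer;  // e.g. 0x20 for XMC
  uint8_t memoryType;
  uint8_t capacity;      // usually log2 of the size in bytes (0x18 = 16 MB)
};

// Split the 24-bit ID as documented for esp_flash_read_id():
// upper 8 bits = manufacturer, lower 16 bits = chip ID.
FlashJedecId splitJedecId(uint32_t id) {
  FlashJedecId out;
  out.manufacturer = static_cast<uint8_t>((id >> 16) & 0xFF);
  out.memoryType   = static_cast<uint8_t>((id >> 8) & 0xFF);
  out.capacity     = static_cast<uint8_t>(id & 0xFF);
  return out;
}

#ifdef ESP32
// Read the ID of the main flash chip; returns 0 if the read fails.
uint32_t readMainFlashId() {
  uint32_t id = 0;
  if (esp_flash_read_id(NULL, &id) != ESP_OK) {
    return 0;
  }
  return id;
}
#endif
```

On a device, `splitJedecId(readMainFlashId()).manufacturer == 0x20` would then flag a chip with the XMC vendor byte.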
Finding the brand of the chip is absolutely not a problem. It is just 1 byte of the 3-byte flash ID.
This is what I use in my code (ESPEasy) to get the same byte order as on the ESP8266:
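```cpp
#ifdef ESP32
  #if __has_include(<esp32/rom/spi_flash.h>)
    #include <esp32/rom/spi_flash.h>  // declares g_rom_flashchip (core 2.0.x)
  #else
    #include <rom/spi_flash.h>        // declares g_rom_flashchip (core 1.0.x)
  #endif
#endif

uint32_t getFlashChipId() {
  // Cache, since the ID does not change
  static uint32_t flashChipId = 0;
  if (flashChipId == 0) {
#ifdef ESP32
    // Reverse the 3 ID bytes to match the byte order of the ESP8266 call
    uint32_t tmp = g_rom_flashchip.device_id;
    for (int i = 0; i < 3; ++i) {
      flashChipId = flashChipId << 8;
      flashChipId |= (tmp & 0xFF);
      tmp = tmp >> 8;
    }
    // esp_flash_read_id(nullptr, &flashChipId);
#elif defined(ESP8266)
    flashChipId = ESP.getFlashChipId();
#endif // ifdef ESP32
  }
  return flashChipId;
}
```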
The Vendor ID is the lowest byte: getFlashChipId() & 0xFF
@TD-er Well, that was easy to get... any thoughts about submitting something like that as a PR to the Arduino-ESP32 library?
Obviously, as an ESP32-specific PR, there is no need for ESP8266 compiler checks. I'd suggest some efficiency (memory and code) tweaks...something like this?
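```cpp
#include <esp32/rom/spi_flash.h>  // declares g_rom_flashchip (Arduino-ESP32 2.0.x)

// Swap bytes 0 and 2 of the 24-bit ID; byte 1 stays in place.
uint32_t getFlashChipId() {
  uint32_t tmp = g_rom_flashchip.device_id;
  return (tmp & 0x000000FF) << 16 | (tmp & 0x0000FF00) | (tmp & 0x00FF0000) >> 16;
}
```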
Technically, the whole thing could be implemented as a single line; there's no need for an intermediate "tmp" variable. Reading a structure member isn't the same as calling a function!
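To double-check that the single-expression version matches the loop, here is a small host-side comparison. It assumes only that `device_id` holds the three ID bytes in its low 24 bits (as the `0x204018` value from the ESP-IDF issue suggests). `reverseLoop`, `reverseMasked` and `vendorOf` are hypothetical names, and they take the raw value as a parameter so they can be tested without the ROM symbol.

```cpp
#include <cstdint>

// Loop version (first snippet): move the lowest byte of the raw value
// into the result three times, which reverses the byte order.
uint32_t reverseLoop(uint32_t raw) {
  uint32_t out = 0;
  for (int i = 0; i < 3; ++i) {
    out = (out << 8) | (raw & 0xFF);
    raw >>= 8;
  }
  return out;
}

// Mask-and-shift version: swap bytes 0 and 2, keep byte 1 in place.
// Shifts bind tighter than '|', so no extra parentheses are needed.
uint32_t reverseMasked(uint32_t raw) {
  return (raw & 0x000000FF) << 16 | (raw & 0x0000FF00) | (raw & 0x00FF0000) >> 16;
}

// After reversal, the vendor (manufacturer) ID is the lowest byte.
uint8_t vendorOf(uint32_t reversed) {
  return static_cast<uint8_t>(reversed & 0xFF);
}
```

Both versions ignore bits 24..31 of the raw value, so they agree on every input, and for the XMC ID both return `0x184020`, whose low byte is `0x20`.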
From what I can find online, the "Vendor ID" for XMC is 0x20. According to this website, that's the same as an ST25-series FLASH chip: https://www.mail-archive.com/[email protected]/msg14313.html
According to the ESP-IDF issue, the most problematic chip ID is 0x204018 (or reversed per this function, 0x184020):
```cpp
// if we have an XMC flash chip susceptible to status register corruption,
// flash the status registers properly and lock them afterwards.
if( g_rom_flashchip.device_id == 0x204018 ){
  // ... (remainder of the ESP-IDF fix not quoted here)
}
```
Being able to identify the problematic chips is a huge start...
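As a starting point, a minimal sketch of that identification step, assuming the raw `g_rom_flashchip.device_id` layout used in the ESP-IDF check quoted above (`0x204018` for the problematic part, vendor `0x20` in bits 16..23). The enum and function names are hypothetical. Vendor `0x20` alone is not conclusive, since ST parts reportedly use it too, so the exact ID match is reported separately. This only identifies the chip; it does not touch the status registers.

```cpp
#include <cstdint>

#ifdef ESP32
  #if __has_include(<esp32/rom/spi_flash.h>)
    #include <esp32/rom/spi_flash.h>  // Arduino-ESP32 2.0.x (ESP-IDF 4.x)
  #else
    #include <rom/spi_flash.h>        // Arduino-ESP32 1.0.x (ESP-IDF 3.x)
  #endif
#endif

// Values taken from this thread (hypothetical constant names).
constexpr uint8_t  kXmcVendorId       = 0x20;      // manufacturer byte
constexpr uint32_t kXmcSusceptibleRaw = 0x204018;  // raw device_id from the IDF check

enum class FlashRisk { OtherVendor, Vendor0x20, XmcSusceptible };

// Classify a raw device_id (manufacturer in bits 16..23).
FlashRisk classifyFlash(uint32_t rawDeviceId) {
  const uint32_t id24 = rawDeviceId & 0x00FFFFFF;
  if (id24 == kXmcSusceptibleRaw) {
    return FlashRisk::XmcSusceptible;
  }
  if (((id24 >> 16) & 0xFF) == kXmcVendorId) {
    return FlashRisk::Vendor0x20;  // XMC or ST; not the known-bad ID
  }
  return FlashRisk::OtherVendor;
}

#ifdef ESP32
// On target: classify the ID that the ROM code already detected.
FlashRisk classifyThisBoard() {
  return classifyFlash(g_rom_flashchip.device_id);
}
#endif
```

On a device, `classifyThisBoard()` could be called from `setup()` and the result logged or reported, to find out which units in the field carry the susceptible part.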
@igrr any progress on this? Sadly, I'm still getting several returns per week associated with this failure mode.
@igrr, hope you are doing well. This issue has been "Awaiting triage" for over two months now... what's the word?
Hello @vicatcu, TBH we haven't investigated this yet. I'm moving it to the top of the backlog and we will take a look.
@vicatcu could you try with the latest 2.0.5 release?
@me-no-dev sorry, I hadn't noticed the activity on this issue... try what exactly? It's not something I can easily reproduce and once it happens it's terminal.
The current Arduino core (IIRC since 2.0.0) uses IDF 4.4 as its base. Since then, XMC flash chips are supported and chip corruption should no longer happen. There is no solution for cores before 2.0.x; to fix those devices you have to reflash them, including the bootloader. OTA is not enough, since the bootloader without XMC flash chip support is NOT replaced during an OTA update.