Arduino
Arduino copied to clipboard
"Heap init code improvements" doesn't boot on ESP-01 with PUYA flash
Basic Infos
- [X] This issue complies with the issue POLICY doc.
- [X] I have read the documentation at readthedocs and the issue is not addressed there.
- [X] I have tested that the issue is present in current master branch (aka latest git).
- [X] I have searched the issue tracker for a similar issue.
- [X] If there is a stack dump, I have decoded it.
- [X] I have filled out all fields below.
Platform
- Hardware: [ESP-01]
- Core Version: [313b3c07ecccbe6fee24aa9fa447c4aed16ca499]
- Development Env: [Arduino IDE]
- Operating System: [Ubuntu]
Settings in IDE
- Module: [Generic ESP8266 Module]
- Flash Mode: [dio]
- Flash Size: [1MB]
- lwip Variant: [v2 Lower Memory]
- Reset Method: [nodemcu]
- Flash Frequency: [40Mhz]
- CPU Frequency: [80Mhz]
- Upload Using: [SERIAL]
- Upload Speed: [115200]
Problem Description
Any program compiled and uploaded using the latest git master does not run at all. After a successful compile and upload Arduino serial monitor only shows garbage (using a scope on the serial line shows data with a non-standard baudrate that I can't decode). The LED is constantly on.
Simplest sketch that reproduces this is the "Blink" example. However the result is the same regardless of what program is uploaded.
Using git bisect from a known-good release (2.4.2) shows that the commit that introduced this issue is:
7356cd1ef13c7a7da4631be01060f6f40806fa23 Heap init code improvements and updates (#8458)
Reverting just this commit on the latest git master fixes the problem.
MCVE Sketch
void setup() {
pinMode(LED_BUILTIN, OUTPUT);
}
void loop() {
digitalWrite(LED_BUILTIN, HIGH);
delay(1000);
digitalWrite(LED_BUILTIN, LOW);
delay(1000);
}
Debug Messages
Picked up JAVA_TOOL_OPTIONS:
. Variables and constants in RAM (global, static), used 27984 / 80192 bytes (34%)
║ SEGMENT BYTES DESCRIPTION
╠══ DATA 1496 initialized variables
╠══ RODATA 856 constants
╚══ BSS 25632 zeroed variables
. Instruction RAM (IRAM_ATTR, ICACHE_RAM_ATTR), used 59601 / 65536 bytes (90%)
║ SEGMENT BYTES DESCRIPTION
╠══ ICACHE 32768 reserved space for flash instruction cache
╚══ IRAM 26833 code in IRAM
. Code in flash (default, ICACHE_FLASH_ATTR), used 232952 / 1048576 bytes (22%)
║ SEGMENT BYTES DESCRIPTION
╚══ IROM 232952 code in flash
esptool.py v3.0
Serial port /dev/ttyUSB0
Connecting....
Chip is ESP8266EX
Features: WiFi
Crystal is 26MHz
MAC: [...]
Uploading stub...
Running stub...
Stub running...
Configuring flash size...
Auto-detected Flash size: 1MB
Erasing flash (this may take a while)...
Chip erase completed successfully in 0.0s
Flash params set to 0x0220
Compressed 266288 bytes to 196214...
Writing at 0x00000000... (8 %)
Writing at 0x00004000... (16 %)
Writing at 0x00008000... (25 %)
Writing at 0x0000c000... (33 %)
Writing at 0x00010000... (41 %)
Writing at 0x00014000... (50 %)
Writing at 0x00018000... (58 %)
Writing at 0x0001c000... (66 %)
Writing at 0x00020000... (75 %)
Writing at 0x00024000... (83 %)
Writing at 0x00028000... (91 %)
Writing at 0x0002c000... (100 %)
Wrote 266288 bytes (196214 compressed) at 0x00000000 in 20.1 seconds (effective 105.9 kbit/s)...
Hash of data verified.
Leaving...
Hard resetting via RTS pin...
Default baudrate is 74880 btw, you will see boot messages if you configure your monitor to it Is this commit affecting all of your devices or just one?
Thanks for the quick response!
Default baudrate is 74880 btw, you will see boot messages if you configure your monitor to it
Decoding at 74880 bps gives the following (repeating in an infinite loop):
�4R2RvVKE�R|�4R2RvVKE�R|�4R2RvVKE�R|�4R2RvVKE�R|�4R2RvVKE�R|�4R2RvVKE�R|�4R2RvVKE�R
Is this commit affecting all of your devices or just one?
I've now tried a few other ESP-01 modules and found one that works with the commit!
Both should have 1 MB Flash (25Q80).
One interesting difference I see between these two modules is time to erase the flash:
Good module:
Auto-detected Flash size: 1MB
Erasing flash (this may take a while)...
Chip erase completed successfully in 6.0s
Bad module:
Auto-detected Flash size: 1MB
Erasing flash (this may take a while)...
Chip erase completed successfully in 0.0s
In both cases Arduino was set to "Erase flash: All Flash Contents"
Usually, device prints something like this on boot
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x4010f000, len 3424, room 16
tail 0
chksum 0x2e
load 0x3fff20b8, len 40, room 8
tail 0
chksum 0x2b
csum 0x2b
v00090700
~ld
...sketch starts, baudrate might change...
Not actually erasing flash seems suspect, and ours as well as esptool issue tracker mentions PUYA chips as source of all kinds of trouble. https://github.com/esp8266/Arduino/search?q=puya&type=issues https://github.com/espressif/esptool/search?q=puya&type=issues
While we have PUYA-specific workarounds for an already working sketch (edit: enabled by default), not really sure this is the expected behaviour when flashing or whether it actually writes any data to the flash. Looks like a hw issue, really.
Usually, device prints something like this on boot
For some reason all my instruments have problems decoding this serial output. I've finally managed to get something halfway readable using pulseview on an CSV export from a logic analyzer.
Serial output from the bad module:
[91][91][C9][F5]0x00000000, depc=0x00000000
Fatal exception (0):
epCLO[C2][D1]0201ec8, epc2=0x0L[82][82][82][82][82]b[02]*[C1][8D][CD]=0x00000000, excvaddr=0x00000000, depc=0x00000000
Fatal exc[A5][D1][A9][CB][0B]B[82]):
[A5][8D][B1]O[C2][D1]0[92][8A]*[8D][E1], epcZO[C2][C1][B0][82]0000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Fatal exception (0):
epc1=0x4020[B1]Y,b[02]*[C1][8D][C9]=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Fatal exception (0):
epc1=0x40201ec8, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Fatal exception (0):
epc1=0x40201ec8, epc2=0x00000000, epc3=0x0000[10][96][F5]0x00000000, depc=0x00000000
Fatal exception (0):
epc1=0x40201ec8, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Unfortunately the exception decoder doesn't seem to do anything with this, even if I just paste the part that looks correctly decoded.
EDIT: 0x40201ec8 is handle_hwdt
Not actually erasing flash seems suspect, and ours as well as esptool issue tracker mentions PUYA chips as source of all kinds of trouble.
The PUYA issues all seem to be related to writing to flash (to filesystem during run time, or with esptool during programming). I always get "Hash of data verified.", on both the good and the bad module. If this is an issue writing the flash during programming, shouldn't the verification fail at this point? I don't see how the heap code changes anything related to flash writing.
For the record, flash_id of chips on my modules:
Good module:
Manufacturer: c8
Device: 4014
Detected flash size: 1MB
Bad module:
Manufacturer: 85
Device: 6014
Detected flash size: 1MB
One idea I had is that maybe the PUYA work-around code itself is a problem, since it might get run on the bad module (due to its flash_id), but not the good one. I see that the workaround malloc's an additional buffer so that might be the connection to the heap code. On the other hand, I don't see why the boot code would want to write to flash early in the boot process.
@avian2
For an experiment to try, add a <sketch name>.ino.globals.h
file to you build with lines:
/*@create-file:build.opt@
-DUMM_INIT_USE_IRAM=1
*/
Also, remove Tools->Debug Level
HWDT
or HWDT_NOEXTRA4K
options from your build.
For confirmation that the new .h
file was processed, you should see something like:
Extracting embedded compiler command-line options from <the real sketch name>.ino.globals.h:1
2, Add command-line option: -DUMM_INIT_USE_IRAM=1
Created compiler command-line options file /tmp/arduino_build_592869/core/build.opt
In the Arduino IDE build console. To see a detailed build report, you may need to turn on "Show verbose output during compilation"
These changes will remove early (pre SDK init) ICACHE read operations.
@mhightower83 Thanks for the suggestion. Adding this define makes the Blink sketch work on the bad modules!
Bad module, unmodified git master (313b3c07ecccbe6fee24aa9fa447c4aed16ca499), with -DUMM_INIT_USE_IRAM=1
: works.
Bad module, unmodified git master (313b3c07ecccbe6fee24aa9fa447c4aed16ca499), without -DUMM_INIT_USE_IRAM=1
: does not work.
I tried alternating the setting three times to make sure it's not random.
@mcspr Some additional info regarding flash writing:
I manually dumped the flash content from the modules with esptool.py read_flash
. Dumps from the good modules, bad modules and the .bin file produced by the compiler are identical.
Also, the unused portion of the flash looks correctly erased (all 0xff
) on the bad modules.
As far as I can see, this is not a problem with programming the flash with the esptool.
Hm. But what's the difference between flash cache being enabled later down the line vs. manual call in that wrapper?
@mcspr
My guess is the initial Boot ROM setup for PUYA is not compatible with the requirements for the ICACHE/SPI hardware-Flash interface to work. The exception 0 with handle_hwdt
hints toward reading zero from the ICACHE. I haven't looked at the RTOS SDK in a while; however, I think it has unique setups for some list of flash chips and a default setup for those not listed. Maybe the NONOS SDK does something similar.
In case it helps in understanding this issue, here's another partly decoded serial capture I have. This was triggered from the RST line, so it shows serial output immediately after reset:
)[D7][81][AA]X[0B][02][C2][02]J[B0]L[16]I[ED]t [A3]X][AE]JJ, boot mode:(3,7)
[AC]+[91][81]0x4010f000, len 3424, roo[AD][8A][B2][02]j[0A]tail 0
[A3]kk[D5][B5] 0x2e
loa[A4][82][C2]Mf[A6][96][82][12][E1], len 40, room 8
tail 0
chksum 0x2b
csum 0x2b
v00041170
~ld
Fatal exception (0):
epc1=0x40223968, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Fatal exception (0):
epc1=0x40223968, epc2=0x00000000, [82][82]b[02]*[E1][8D][D9][85][91][91][C9][F5]0x00000000, depc=0x00000000
Fatal exception (0):
epc1=0x40223968, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000
Fatal exception (0):
epc1=0x40223968, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00p[82]000, DY[8D][F5]0x00000000
0x40223968 is the address of the umm_init
function in the binary:
$ xtensa-lx106-elf-objdump -d Blink.ino.elf |grep -C3 40223968
40223966: 22 .byte 0x22
40223967: 40 .byte 0x40
40223968 <umm_init>:
40223968: f0c112 addi a1, a1, -16
4022396b: ffd521 l32r a2, 402238c0 <wpa_sm_disassociate+0x4>
4022396e: 21c9 s32i.n a12, a1, 8
40223970: 3109 s32i.n a0, a1, 12
Sorry, I looked around and I don't have any devices with the PUYA flash chip. If you don't mind here is another experiment to try. Add this as a new .cpp
file to your sketch folder. It will change the setting for flashchip->chip_size
to 1M before the umm_init()
call. The thinking is this will prevent any possible out-of-range request from being made to the Flash device.
#include <Arduino.h>
#include <umm_malloc/umm_malloc.h>
extern "C" {
#include "user_interface.h"
#include "cont.h"
}
extern "C" {
/////////////////////////////////////////////////////////////////////////////////
// starter code copied from core_esp8266_main.cpp
extern "C" void call_user_start();
IRAM_ATTR void app_entry_redefinable(void)
{
/* Allocate continuation context on this SYS stack,
and save pointer to it. */
cont_t s_cont __attribute__((aligned(16)));
g_pcont = &s_cont;
flashchip->chip_size = 1024 * 1024; // Fixup chip_size for PUYA
/* Doing umm_init just once before starting the SDK, allowed us to remove
test and init calls at each malloc API entry point, saving IRAM. */
#ifdef UMM_INIT_USE_IRAM
umm_init();
#else
// umm_init() is in IROM
mmu_wrap_irom_fn(umm_init);
#endif
/* Call the entry point of the SDK code. */
call_user_start();
}
};
@mhightower83 I tried the chip_size fixup and there is no difference. I verified that the code gets compiled in by dumping the elf, but it still crashes on boot.
@avian2 Thanks for trying that.
I have been looking around a bit and see this equivalent used before starting Cache_Read_Enable
:
SPI0U |= SPIUCSSETUP; // SPI_CS_SETUP
. It looks promising. In the RTOS SDK and other places, SPI_CS_SETUP was camouflaged as BIT5. It took me a while to recongnize it as relavent. (At least for the example that looked relevant. Some other SPI bus setups had SPI_CS_SETUP.)
If this should fail, I may need to wait till I have hardware to work with to resolve it. I would also look at a roll-back PR.
-DUMM_INIT_USE_IRAM=1
is a good workaround for now. However, for now Debug HWDT will not work with PUYA. It requires early Cache_Read_Enable
to work.
#include <Arduino.h>
#include <umm_malloc/umm_malloc.h>
extern "C" {
#include <user_interface.h>
#include <esp8266_peri.h> // **** new line ****
#include <cont.h>
}
extern "C" {
/////////////////////////////////////////////////////////////////////////////////
// starter code copied from core_esp8266_main.cpp
extern "C" void call_user_start();
IRAM_ATTR void app_entry_redefinable(void)
{
/* Allocate continuation context on this SYS stack,
and save pointer to it. */
cont_t s_cont __attribute__((aligned(16)));
g_pcont = &s_cont;
SPI0U |= SPIUCSSETUP; // SPI_CS_SETUP - **** New line ****
// flashchip->chip_size = 1024 * 1024; // Fixup chip_size for PUYA
/* Doing umm_init just once before starting the SDK, allowed us to remove
test and init calls at each malloc API entry point, saving IRAM. */
#ifdef UMM_INIT_USE_IRAM
umm_init();
#else
// umm_init() is in IROM
// I am not sure what this does. It appears to be the key function called
// from `fix_cache_bug` in the NONOS SDK - Will this help PUYA Flash work?
// It appears to do some, low-level undocumented register manipulation and
// device specific init based on OTP CHIPID model bits.
extern uint32_t phy_get_bb_evm(void); // undocumented
phy_get_bb_evm();
mmu_wrap_irom_fn(umm_init);
#endif
/* Call the entry point of the SDK code. */
call_user_start();
}
};
Edit: I think the flashchip->chip_size = 1024 * 1024;
can be removed.
Edit: 08/30/22, Added call to phy_get_bb_evm();
@mhightower83 Thanks for looking into this.
I've tried the SPI_CS_SETUP fix and unfortunately Blink still does not boot with PUYA flash. Same as before I get an invalid instruction exception in umm_init()
.
I've also tried ESP-01 modules with BergMicro and Kynix flash chips. Those work fine.
@avian2 I will not be able to test this on a device with PUYA flash until after October. The hardware devices I received so far, that were supposed to have PUYA flash, have Tenx. :frowning_face: I updated the previous sample code comment with my latest observations.
@avian2 I will not be able to test this on a device with PUYA flash until after October. The hardware devices I received so far, that were supposed to have PUYA flash, have Tenx. ☹️ I updated the previous sample code comment with my latest observations.
Pfft, you can't rely on getting garbage chips when you purposely order garbage... The irony ;)
@mhightower83 Your latest code from 08/30/22 with phy_get_bb_evm();
fixes the issue with PUYA flash! The Blink sketch as well as some others I tried boot fine on my previously problematic PUYA flash modules.
I also tried it with modules with BergMicro and Kynix flash that I have at hand and they work fine.
Thank you again for taking the time to investigate this problem!
@avian2 It looks like there is still a missing piece to recreating this problem. I have now swapped in the PUYA P25Q80H flash chip on an ESP-01S board and the failure does not occur with the blink sketch with core version ((313b3c07...)) or the current master. So I am failing to recreate the original problem.
I think I have gone as far as I can with this. It is working for you. The logic came from a function named fix_cache_bug
which sounds like something worth incorporating into early icache-enabling logic.
If you have time to try the fix in PR form https://github.com/esp8266/Arduino/pull/8658#issue-1345246635, that would be great.
Otherwise @d-a-v, I think I am done with this PR.
Edited for clairity.
Sorry, I keep meaning to follow up on this but other things kept interfering.
@mhightower83 I'm not sure what's special about PUYA P25Q80H on my modules. Perhaps I have some other die revision? Or maybe the ESP8266 itself is different? Unfortunately I only have one of them left for testing. Others are running some sensors for me (without problems so far - I used then-current esp8266 core with the heap init code commit reverted).
In any case, the current git master (80bf71662551fb5e6579713dc15e541ac48ccf98) works for me on the PUYA module I still have, so I can confirm that your last PR fixed it.
I also checked again just to be sure it's not something else: the issue is indeed still perfectly reproducible with the old core version I initially reported (313b3c07ecccbe6fee24aa9fa447c4aed16ca499).
Thanks again for your help!