Try to trace where watchdog reboots on bl602
Traces every step of the main loop to the serial console to try to catch why the BL602 reboots.
Thank you, I am running it and will give feedback as soon as I catch one.
Example of [TRACE] output seen in the log:
[TRACE] After scheduled driver starts
[TRACE] Before HAL_PrintNetworkInfo
[TRACE] After HAL_PrintNetworkInfo
[TRACE] After watchdog reset
Info:MAIN:Time 760, idle 0/s, free 95336, MQTT 1(2), bWifi 1, second[TRACE] Exiting OnEverySecond
sWithNoPing -1, socks 2/21
-----------------> AABA Request:
A-MSDU: Permitted
Block Ack Policy: Immediate Block Ack
TID: 0
Number of Buffers: 64
-----------------> AABA Response:
A-MSDU: Not Permitted
Block Ack Policy: Immediate Block Ack
TID: 0
Number of Buffers: 8
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate
[TRACE] After CMD_EVENT_CHANGE_NOMQTTTIME
[TRACE] After MQTT_Dedup_Tick
[TRACE] After LED_RunOnEverySecond
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
Info:MAIN:Time 761, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21
[TRACE] Exiting OnEverySecond
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate
I captured numerous reboots, see below.
Info:MAIN:Time 3928, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
Starting bl602 now....
Booting BL602 Chip...
██████╗ ██╗ ██████╗ ██████╗ ██████╗
██╔══██╗██║ ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║ ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║ ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝ ╚═════╝ ╚══════╝
------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version: release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY Version: a0_final-73-g62481a0
RF Version: 79cc6b9
Build Date: Aug 19 2024
Build Time: 19:03:24
Boot Reason: BL_RST_SOFTWARE_WATCHDOG
Info:MAIN:Time 5659, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21
[TRACE] Exiting OnEverySecond
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
bl602 now....
Booting BL602 Chip...
██████╗ ██╗ ██████╗ ██████╗ ██████╗
██╔══██╗██║ ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║ ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║ ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝ ╚═════╝ ╚══════╝
------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version: release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY Version: a0_final-73-g62481a0
RF Version: 79cc6b9
Build Date: Aug 19 2024
Build Time: 19:03:24
Boot Reason: BL_RST_SOFTWARE_WATCHDOG
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
[TRACE] Exiting OnEverySecond
Info:MAIN:Time 834, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
Starting bl602 now....
Booting BL602 Chip...
██████╗ ██╗ ██████╗ ██████╗ ██████╗
██╔══██╗██║ ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║ ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║ ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝ ╚═════╝ ╚══════╝
------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version: release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY Version: a0_final-73-g62481a0
RF Version: 79cc6b9
Build Date: Aug 19 2024
Build Time: 19:03:24
Boot Reason: BL_RST_SOFTWARE_WATCHDOG
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate
[TRACE] After CMD_EVENT_CHANGE_NOMQTTTIME
[TRACE] After MQTT_Dedup_Tick
[TRACE] After LED_RunOnEverySecond
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
Info:MQTT:Publishing val FFFFFF to obl94F28B35/led_basecolor_rgb/get retain=0
Info:MAIN:Time 445, idle 0/s, free 95336, MQTT 1[TRACE] Exiting OnEverySecond
(2), bWifi 1, secondsWithNoPing -1, socks 2/21
Info:MQTT:MQTT client in mqtt_incoming_publish_cb topic obl94F28B35/led_basecolor_rgb/get
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
[TRACE] After reading temp
[TRACE] After MQTT_RunEverySecondUpdate
[TRACE] After CMD_EVENT_CHANGE_NOMQTTTIME
[TRACE] After MQTT_Dedup_Tick
[TRACE] After LED_RunOnEverySecond
[TRACE] After DRV_OnEverySecond
[TRACE] After UART_RunEverySecond
[TRACE] After CFG_Save_IfThereArePendingChanges
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
Info:MQTT:Publishing val 100 to obl94F28B35/led_dimmer/get retain=0
Info:MA[TRACE] Exiting OnEverySecond
IN:Time 446, idle 0/s, free 95336, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21
Info:MQTT:MQTT client in mqtt_incoming_publish_cb topic obl94F28B35/led_dimmer/get
[TRACE] Entering OnEverySecond
[TRACE] Before reading temp
Starting bl602 now....
Booting BL602 Chip...
██████╗ ██╗ ██████╗ ██████╗ ██████╗
██╔══██╗██║ ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║ ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║ ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝ ╚═════╝ ╚══════╝
------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version: release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY Version: a0_final-73-g62481a0
RF Version: 79cc6b9
Build Date: Aug 19 2024
Build Time: 19:03:24
Boot Reason: BL_RST_SOFTWARE_WATCHDOG
Okay, so reading the temperature causes this. Apparently the temperature read can wait indefinitely here: https://github.com/openshwprojects/OpenBL602/blob/e5769160cfc91a5fe36f040b3d5314e51eda3a28/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c#L1199 . That causes the issue. I guess we should not read the temperature of the BL602 by default.
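Couldn't we just limit the number of tries for the loop and bail out otherwise?
Something like this (not tested, just written down; the number of tries would need adjusting):
ADC_Start();
int t = 100;
/* poll at most 100 times instead of forever */
while (ADC_Get_FIFO_Count() == 0 && t-- > 0)
    ;
/* check the FIFO itself, so data arriving on the last try is not reported as a timeout */
if (ADC_Get_FIFO_Count() == 0) return -999;
regVal = ADC_Read_FIFO();
As you probably saw, this "while" loop fragment appears twice, for low and high...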
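For illustration, the bounded wait could also be wrapped in a small helper that reports success directly, so the caller does not have to infer a timeout from the leftover counter. This is only a sketch: fake_adc_fifo_count() and wait_for_fifo() are made-up names, and the fake stands in for the SDK's ADC_Get_FIFO_Count() so the snippet compiles on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the SDK's ADC_Get_FIFO_Count(): the simulated
 * FIFO stays empty for g_polls_until_ready polls, then reports one entry. */
static uint32_t g_polls_until_ready = 3;

static uint32_t fake_adc_fifo_count(void)
{
    if (g_polls_until_ready > 0) {
        g_polls_until_ready--;
        return 0;
    }
    return 1;
}

/* Poll the FIFO at most max_tries times.
 * Returns true as soon as data is available, false if every poll saw
 * an empty FIFO. */
static bool wait_for_fifo(uint32_t max_tries)
{
    for (uint32_t i = 0; i < max_tries; i++) {
        if (fake_adc_fifo_count() != 0) {
            return true;
        }
    }
    return false;
}
```

Since the same wait is needed for both the low and the high measurement, a helper like this would be called twice, returning the error value whenever it reports false. Note that a try count depends on CPU speed, so the actual wait time is not fixed.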
Could be done, but that requires modifying the SDK. For now I disabled temp reading, let's see if it helps.
Installed the build and I can see 0.0 temp, so it is definitely not reading the temperature anymore. I will report back on stability and any reboots.
12 hours and counting, and thus far no reboot problems.
Observation: The device feels much more responsive; the temperature reading code in the SDK is probably CPU intensive, and it is called frequently.
Proposal: Copy the SDK code into your own code instead of changing the SDK, creating a similar, more efficient function without the endless loop. Secondly, don't update the temperature that frequently; once every 10 seconds should be sufficient.
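A rough sketch of the "not every second" part, assuming a once-per-second hook that receives the uptime in seconds. All names here are made up for illustration; fake_read_temperature() stands in for the real sensor read, and the 10 second interval is just the value proposed above.

```c
#include <stdint.h>

/* Assumed interval, taken from the proposal above. */
#define TEMP_READ_INTERVAL_S 10u

static float g_cached_temperature = 0.0f;
static uint32_t g_sensor_reads = 0;

/* Hypothetical stand-in for the real (slow) sensor read. */
static float fake_read_temperature(void)
{
    g_sensor_reads++;
    return 42.5f;
}

/* Meant to be called once per second. The sensor is only touched when
 * seconds_elapsed is a multiple of the interval; in between, the last
 * value is returned from the cache. */
static float temperature_on_every_second(uint32_t seconds_elapsed)
{
    if (seconds_elapsed % TEMP_READ_INTERVAL_S == 0) {
        g_cached_temperature = fake_read_temperature();
    }
    return g_cached_temperature;
}
```

Over 30 calls with seconds 0 to 29, the sensor would be read three times (at 0, 10 and 20) while every call still returns a value.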
There is another implementation of a "TSEN_Get_Temp" (with different arguments) here:
https://github.com/bouffalolab/bouffalo_sdk/blob/master/drivers/lhal/src/bflb_adc.c#L744
It waits up to 100 ms for temperature data:
bflb_adc_start_conversion(dev);
start_time = bflb_mtimer_get_time_ms();
while (bflb_adc_get_count(dev) == 0) {
    if ((bflb_mtimer_get_time_ms() - start_time) > 100) {
        return -ETIMEDOUT;
    }
}
Note: there are other unbounded loops as well, e.g. inside "void bflb_update_adc_trim(struct bflb_device_s *dev, const struct bflb_adc_config_s *config)".
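To make the time-based variant easy to try off-device, here is a sketch of the same pattern with a fake clock and a fake FIFO. fake_time_ms(), fake_fifo_count() and wait_for_fifo_ms() are hypothetical names; on real hardware the clock would be whatever millisecond timer the SDK provides.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADC_TIMEOUT_MS 100u

/* Fake clock: every call advances the time by 1 ms (simulation only). */
static uint64_t g_fake_now_ms = 0;
/* The simulated FIFO becomes non-empty once the clock reaches this value. */
static uint64_t g_fake_ready_at_ms = 0;

static uint64_t fake_time_ms(void)
{
    return g_fake_now_ms++;
}

static uint32_t fake_fifo_count(void)
{
    return (g_fake_now_ms >= g_fake_ready_at_ms) ? 1u : 0u;
}

/* Wait until the FIFO has data or more than timeout_ms have elapsed.
 * Elapsed time is computed as (now - start) on unsigned values, so the
 * comparison stays correct even if the counter wraps around. */
static bool wait_for_fifo_ms(uint64_t timeout_ms)
{
    uint64_t start = fake_time_ms();
    while (fake_fifo_count() == 0) {
        if ((fake_time_ms() - start) > timeout_ms) {
            return false;
        }
    }
    return true;
}
```

Compared with counting loop iterations, a time limit gives the same upper bound regardless of CPU clock, as long as it stays well below the watchdog period.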
I tried to make a version with the changes below: dev_20240821_115931.zip
(Since I don't have a BL602, I can't test it, so there is a slight risk that this version won't work at all.)
In the SDK:
diff --git a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
index 23bd5145..60ab59a6 100644
--- a/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
+++ b/components/bl602/bl602_std/bl602_std/StdDriver/Src/bl602_adc.c
@@ -1187,6 +1187,7 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
ADC_Result_Type result;
uint32_t tmpVal;
uint8_t gainCalEnabled=0;
+ uint64_t start_time;
/* clear fifo by SET GPIP_GPADC_FIFO_CLR bit*/
tmpVal = BL_RD_REG(GPIP_BASE, GPIP_GPADC_CONFIG);
@@ -1196,8 +1197,13 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
ADC_SET_TSVBE_LOW();
ADC_Start();
- while (ADC_Get_FIFO_Count() == 0)
- ;
+ // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+ start_time = bflb_platform_get_time_ms();
+ while (ADC_Get_FIFO_Count() == 0){
+ if ((bflb_platform_get_time_ms() - start_time) > 100) {
+ return -999;
+ }
+ }
regVal = ADC_Read_FIFO();
gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
@@ -1215,8 +1221,13 @@ float TSEN_Get_Temp(uint32_t tsen_offset)
ADC_SET_TSVBE_HIGH();
ADC_Start();
- while (ADC_Get_FIFO_Count() == 0)
- ;
+ // let's only try for 100 ms like here: https://github.com/bouffalolab/bouffalo_sdk/blob/9a267ff0f9c40fce3efcd4c92a726349381a9b31/drivers/lhal/src/bflb_adc.c#L767
+ start_time = bflb_platform_get_time_ms();
+ while (ADC_Get_FIFO_Count() == 0){
+ if ((bflb_platform_get_time_ms() - start_time) > 100) {
+ return -999;
+ }
+ }
regVal = ADC_Read_FIFO();
gainCalEnabled=adcGainCoeffCal.adcGainCoeffEnable;
adcGainCoeffCal.adcGainCoeffEnable=0;
In the firmware, re-enabled temperature reads, now every five seconds:
diff --git a/src/user_main.c b/src/user_main.c
index fdbff487..2a39e65c 100644
--- a/src/user_main.c
+++ b/src/user_main.c
@@ -502,8 +502,8 @@ void Main_OnEverySecond()
g_wifi_temperature = temperature * 0.128f;
#endif
#elif PLATFORM_BL602
- //do not read temp, sometimes it's to slow and causes reboot
- //get_tsen_adc(&g_wifi_temperature, 0);
+ // get temp every 5 seconds
+ if (g_secondsElapsed % 5 ==0) get_tsen_adc(&g_wifi_temperature, 0);
#elif PLATFORM_LN882H
// this is set externally, I am just leaving comment here
#endif
@MaxineMuster does this build also contain the trace statements by @giedriuslt just in case there are reboots to at least see where? I am considering loading it onto my 'bench' BL602, which has serial logging and wires soldered on for flashing in case it fails. @giedriuslt, what do you think? I have also installed @giedriuslt's trace version on a live BL602 bulb, currently at 18 hours uptime.
PS. Anyone else tracing and testing?
@MaxineMuster does this build also contain the trace statements by @giedriuslt just in case there are reboots to at least see where?
Yes, my test-version is based on this PR, so including all the "[TRACE]..." messages
[EDIT]:
@giedriuslt if you replace printf("[TRACE]... with bk_printf("[TRACE]... it should also compile for the BEKEN platform (your work might be useful there in the future, too).
Though I don't see at first glance why this leads to undefined reference to `addLog'...
Ok, installed @MaxineMuster's version on my 'bench' BL602. Let's see how it goes. @giedriuslt's version is still running on the live bulb with 21 hours uptime.
EDIT: 24 hours and counting on @giedriuslt build without temperature measurement.
I can test the dev firmware, but I can't get a serial log from the device. If it is useful, I can report whether I get a reboot.
BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 36 hours no BL_RST and counting
BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting
I'm curious: Is temperature reading working? Are there some -999° readings (in this simple test they are not discarded as invalid)?
I'm curious: Is temperature reading working?
In the web GUI the reading works, and it looks like it is reading the correct value.
BENCH BL602 with serial logging using @MaxineMuster version: 12 hours no BL_RST and counting
I'm curious: Is temperature reading working? Are there some -999° readings (in this simple test they are not discarded as invalid)?
Have not yet seen a -999, otherwise it's working fine.
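If the -999 value ever does show up, it could be filtered out before it reaches the displayed temperature. A minimal sketch, assuming the read reports failure with a value far below anything physically possible; update_temperature() and the threshold are made up for illustration and are not part of the firmware.

```c
#include <stdbool.h>

/* Anything at or below absolute zero cannot be a real reading, so the
 * -999 error value from the timeout path falls into this range. */
#define TEMP_MIN_PLAUSIBLE_C (-273.15f)

/* Store the reading only if it is plausible.
 * Returns true when *last_good was updated, false when the reading was
 * rejected and the previous value was kept. */
static bool update_temperature(float reading, float *last_good)
{
    if (reading <= TEMP_MIN_PLAUSIBLE_C) {
        return false;
    }
    *last_good = reading;
    return true;
}
```

With this, a timed-out read would simply leave the previously shown value in place instead of displaying -999.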
BENCH BL602 with serial logging using @MaxineMuster version: 34 hours no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 58 hours no BL_RST and counting
BENCH BL602 with serial logging using @MaxineMuster version: 2.5 days no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 3.5 days no BL_RST and counting
Seems like this fixes the BL602 stability, at least for my devices (light bulbs).
On version 655, I got a reboot on day 5. So it's too early to draw conclusions.
on version 655
Sorry to ask, but, did you try the "release" version or the firmware offered in this PR?
Yep, I mean the release version.
O.k., the fix tested here is not yet in the release version, only in the versions offered here. So your watchdog reset could still be caused by the infinite loop in temperature reading ...
So your watchdog reset could still be caused by the infinite loop in temperature reading
For now I'm running your version posted here.
What I mean is that on the 655 release version the first watchdog reboot took a long time, about 6 days.
Never had this issue with temperature readings on my devices, but recently got a reboot after 5 days; unfortunately the logs do not help.
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
[TRACE] Exiting OnEverySecond
Info:MAIN:Time 393588, idle 0/s, free 91848, MQTT 1(10), bWifi 1, secondsWithNoPing 393517, socks 4/21
Starting bl602 now....
Booting BL602 Chip...
Never had this issue with temperature readings on my devices, but recently got a reboot after 5 days; unfortunately the logs do not help.
Temperature reading works on release versions but, as I understand it, it may cause a watchdog reboot.
BENCH BL602 with serial logging using @MaxineMuster version: 4 days no BL_RST and counting
LIVE BL602 without serial logging using @giedriuslt version: 5 days no BL_RST and counting
It's the most stable these light bulbs have been on OpenBeken firmware. Let's see how far I can push my luck, but it is definitely an improvement so far.
Never had this issue with temperature readings on my devices, but recently got a reboot after 5 days; unfortunately the logs do not help.
[TRACE] Before scheduled driver starts
[TRACE] After scheduled driver starts
[TRACE] After watchdog reset
[TRACE] Exiting OnEverySecond
Info:MAIN:Time 393588, idle 0/s, free 91848, MQTT 1(10), bWifi 1, secondsWithNoPing 393517, socks 4/21
Starting bl602 now....
Booting BL602 Chip...
What is the reason for the reboot? Watchdog? I'm curious whether some other ADC call is being made that uses one of those funky endless loops.
You might be on to something here. The WiFi driver potentially reads the temperature for calibration purposes in the same way: https://github.com/openshwprojects/OpenBL602/blob/e5769160cfc91a5fe36f040b3d5314e51eda3a28/components/bl602/bl602_wifidrv/bl60x_wifi_driver/wifi_mgmr.c#L997 That could cause the same issue, although I never got reboots from normal temperature reading... I'll try to log this.
Well, no, this temperature calibration is disabled by default. So something else caused my reboot.