arduino-esp32
arduino-esp32 copied to clipboard
Non-void function with no values returned causes crash
Make your question, not a Statement, inclusive. Include all pertinent information:
What you are trying to do? While writing firmware, I stumbled in puzzling crashes. This firmware used to work fine with arduino-esp32 2.0.0. I reverted back to 1.0.6 and it worked. No compiler errors were thrown with both. The crashes sometimes produced a Guru Meditation Error, sometimes it was stuck and the watchdog would trigger, sometimes it simply rebooted. By reverting to the previous commit, and adding bit of new code at a time, I pinpointed the culprit.
An initialization function declared with a bool, had no returned values. Latest print on Serial was "pin set" and then boom.
Looking closely, the firmware throws a warning: [...] In function 'bool mcp23008Init()': [...]:2570:1: warning: no return statement in function returning non-void [-Wreturn-type] The line corresponds to the end of the mcp23008Init function.
setup () {
[...]
if(i2cScan(0,0x20)) { //look for MCP23008 (bus 0, addr 20)
mcp23008status = true;
out.println(F("found"));
mcp23008Init();
mcp.digitalWrite(ioCameraPin, LOW);
mcp.digitalWrite(ioHotspotPin, LOW);
mcp.digitalWrite(ioBuzzerPin, LOW);
mcp.digitalWrite(ioLoraPin, LOW);
mcp.digitalWrite(ioSpotlightPin, LOW);
mcp.digitalWrite(ioInverterPin, LOW);
mcp.digitalWrite(ioLeftLedPin, LOW);
mcp.digitalWrite(ioRightLedPin, LOW);
out.println("Pin low");
} else {
out.println("MCP23008 not found!");
blinkDelay = 1000;
errorCode = setErrorBit(errorCode,3,1); //3rd bit is mcp23008 error
}
[...]
}
bool mcp23008Init() {
if(!mcp.begin(0x20,&Wire)) {
errorCode = setErrorBit(errorCode,3,1); //3nd bit is MCP23008 error
out.println(F("MCP23008 init failed!"));
} else {
out.println("Initialized");
errorCode = setErrorBit(errorCode,3,0); //clear error
mcp.pinMode(ioCameraPin, OUTPUT);
mcp.pinMode(ioHotspotPin, OUTPUT);
mcp.pinMode(ioBuzzerPin, OUTPUT);
mcp.pinMode(ioLoraPin, OUTPUT);
mcp.pinMode(ioSpotlightPin, OUTPUT);
mcp.pinMode(ioInverterPin, OUTPUT);
mcp.pinMode(ioLeftLedPin, OUTPUT);
mcp.pinMode(ioRightLedPin, OUTPUT);
out.println("Pin set");
}
}
Hardware:
Board: ESP32 Dev Module Core Installation version: 2.0.0 (crash), 1.0.6 (working) IDE name: Platform.io Flash Frequency: 40Mhz PSRAM enabled: no Upload Speed: 115200 Computer OS: Windows 10
According to some StackOverflow posts, declaring a non-void function without returning a result is undefined behavior. Probably some compiler settings changed between 1.0.6 and 2.0.0 https://stackoverflow.com/questions/57163022/c-crash-on-a-non-void-function-doesnt-have-return-statement
When changed the function to void, the code works with 2.0.0. as usual.
It is not a arduino-esp32 bug. I would expect a compiler error, not a warning. However, if this is not possible, the noobs like me should be aware!
Hello, we did some changes from v2.0.0. Are you able to validate this with version 2.0.3-RC1? Thanks!
Hi, been using 2.0.1 so far with the proposed workaround (void function instead of non-void function). I can give it a try with 2.0.3-RC1 and let you know
@garageeks any updates?
I've come across this exact same issue with the released 2.0.2. A project that compiled and ran just fine on 1.0.6...now bootloops the ESP32 on 2.0.2. Interestingly enough, the backtrace is reported corrupted every single time; I've never seen that before.
(Worth noting that the normal Arduino IDE compiler warnings are set to "None"; setting them to "Default" showed a lot of warnings.)
After several hours of debugging, I traced the issue down to a function declared "int func([args])", but which didn't have a RETURN on every single possible exit path. The corresponding warning provided was:
warning: control reaches end of non-void function [-Wreturn-type]
OK, whatever...updated compiler, new rules, it wasn't an error, etc. My code wasn't even looking at the return value in the first place--so it wasn't a code problem.
Only this is what was causing the ESP32 crash. Apparently, if the function doesn't RETURN a value, the compiler for whatever unknown reason loops the function back on itself--to some random part of the function, anyway.
Debug PRINT statements show the following behavior in the function:
FuncStart, val: 0
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
Func Exit, no ret
_[ad infinitum]_
(where "Func Start" is at the top of the function, and "Func Exit" is at the end, right before the commented-out RETURN statement.) Yes, yes, I know, "what's the function do??" While that's not pertinent here, the function runs a Steinhart-Hart calculation and then prints it to an LCD (if requested, otherwise it returns the converted result without printing to the LCD).
Worth noting that the ESP32 doesn't crash when the Serial.Print statements are in place (due to the task scheduler getting a break). If I remove said statements (and leave the RETURN commented out), I get a crash dump like this:
E (17744) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (17744) task_wdt: - IDLE (CPU 0)
E (17744) task_wdt: Tasks currently running:
E (17744) task_wdt: CPU 0: GUI
E (17744) task_wdt: CPU 1: IDLE
E (17744) task_wdt: Aborting.
abort() was called at PC 0x401081dc on core 0
Backtrace:0x400837a1:0x3ffbedbc |<-CORRUPTED
ELF file SHA256: 0000000000000000
Rebooting...
(worth noting, I did not edit the above dump at all; the "|<-CORRUPTED" is what the ESP32 sent)
It should be relatively trivial to make a very simple "reproduce case" for this. But if the issue is the compiler having a bad day over a non-void function without a Return statement, then I would suggest that the compiler WARNING should be changed to an ERROR.
hello, thanks for update. We will investigate this.
I had to get the "latest" (2.0.3 ++ whatever!) Arduino-ESP32 firmware yesterday due to another bug that was fixed ~4 days ago (#6659 to be exact!), so if necessary, I can test this on the current "master" branch.
@WebDust21 I don't understand all of those exclamation mark you have used... but we will soon release 2.0.3-stable so it can be retested with it.
@WebDust21
OK, whatever...updated compiler, new rules, it wasn't an error, etc. My code wasn't even looking at the return value in the first place--so it wasn't a code problem
In C++, Omitting the return statement in a function which has a return type that is not void is undefined behavior.
It is a coding problem.
@hreintke I'm not saying that the "new behavior" in Arduino Core 2.x.x is incorrect by any means. Just that if this new behavior is going to be the de-facto method going forwards...that the compiler warning should be changed to a compiler error.
Especially considering that the default compiler alerts setting in Arduino IDE is "none."
My point about saying "it wasn't a coding problem" was that my code was not faulting as a result of the undefined function response. The fault is a result of the compiler's handling of said function.
I don't want to get in a semantical discussion so this will be my last remark on this.
my code was not faulting as a result of the undefined function response. The fault is a result of the compiler's handling of said function
As not having a return statement executed is undefined behavior, any handling of the compiler of this is OK. It's up to the programmer to detect and handle this correctly.
I think nobody disagrees with the fact it is a programming fault. But better to 'crash' it during compile time instead of runtime. So I'm in favor of having this warning be set as an error.
I just ran into this problem. I am not so familiar with C++ but in C a missing return generates a warning but is benign - I believe its because there is always a place for a return value on the stack. The behaviour of this had all the hallmarks of a stack corruption. Like the caller popped a value off the stack that wasnt pushed. Given that behaviour it should generate an error. Some of the other type checking C++ indulges in seems a bit pointless if there can be a latent stack problem that could be lurking - Martin
I can confirm this for ESP32 version 2.0.6, spent the whole afternoon hunting it down, given that core version 1.0.6 resulted in a perfectly running firmware for my device. And the exact same source code gave an obscure core 1 panic illegalInstruction after the elusive missing return. Tried with both with and without inline in front of the function declaration. Including a return made the problem go away. (Note: my snippet gives a slightly different console message)
The sketch
int theCulprit(void){
Serial.println("After this we crash");
}
void setup() {
// put your setup code here, to run once:
Serial.begin(115200);
Serial.println("hello 1");
theCulprit();
Serial.println("somehow we survived");
}
void loop() {
// put your main code here, to run repeatedly:
}
Console
hello 1
After this we crash
Guru Meditation Error: Core 1 panic'ed (Unhandled debug exception).
Debug exception reason: Stack canary watchpoint triggered (loopTask)
Core 1 register dump:
PC : 0x4008aafb PS : 0x00060a36 A0 : 0x8008c824 A1 : 0x3ffb08c0
A2 : 0x3ffb0888 A3 : 0xb33fffff A4 : 0x0000cdcd A5 : 0x00060a23
A6 : 0x00060a23 A7 : 0x0000abab A8 : 0x0000abab A9 : 0xffffffff
A10 : 0x00000022 A11 : 0x00000000 A12 : 0x400d26e4 A13 : 0x3ffb8a40
A14 : 0x007b0888 A15 : 0x003fffff SAR : 0x0000001b EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000 LBEG : 0x40085e2c LEND : 0x40085e37 LCOUNT : 0x00000000
Backtrace: 0x4008aaf8:0x3ffb08c0 0x4008c821:0x3ffb0900 0x4008c84c:0x3ffb0920 0x4008cac8:0x3ffb0940 0x400836be:0x3ffb0960 0x400836d1:0x3ffb0990 0x400dbd35:0x3ffb09b0 0x400dc083:0x3ffb0a10 0x400d365d:0x3ffb0a40 0x400d43ca:0x3ffb0a80 0x400d2104:0x3ffb0ad0 0x400d1624:0x3ffb0b30 0x400d12b0:0x3ffb0b80 0x400d128d:0x3ffb0bb0 0x400d12bb:0x3ffb0bd0 0x400d128d:0x3ffb0c00 0x400d12bb:0x3ffb0c20 0x400d128d:0x3ffb0c50 0x400d12bb:0x3ffb0c70 0x400d128d:0x3ffb0ca0 0x400d12bb:0x3ffb0cc0 0x400d128d:0x3ffb0cf0 0x400d12bb:0x3ffb0d10 0x400d128d:0x3ffb0d40 0x400d12bb:0x3ffb0d60 0x400d128d:0x3ffb0d90 0x400d12bb:0x3ffb0db0 0x400d128d:0x3ffb0de0 0x400d12bb:0x3ffb0e00 0x400d128d:0x3ffb0e30 0x400d12bb:0x3ffb0e50 0x400d128d:0x3ffb0e80 0x400d12bb:0x3ffb0ea0 0x400d128d:0x3ffb0ed0 0x400d12bb:0x3ffb0ef0 0x400d128d:0x3ffb0f20 0x400d12bb:0x3ffb0f40 0x400d128d:0x3ffb0f70 0x400d12bb:0x3ffb0f90 0x400d128d:0x3ffb0fc0 0x400d12bb:0x3ffb0fe0 0x400d128d:0x3ffb1010 0x400d12bb:0x3ffb1030 0x400d128d:0x3ffb1060 0x400d12bb:0x3ffb1080 0x400d128d:0x3ffb10b0 0x400d12bb:0x3ffb10d0 0x400d128d:0x3ffb1100 0x400d12bb:0x3ffb1120 0x400d128d:0x3ffb1150 0x400d12bb:0x3ffb1170 0x400d128d:0x3ffb11a0 0x400d12bb:0x3ffb11c0 0x400d128d:0x3ffb11f0 0x400d12bb:0x3ffb1210 0x400d128d:0x3ffb1240 0x400d12bb:0x3ffb1260 0x400d128d:0x3ffb1290 0x400d12bb:0x3ffb12b0 0x400d128d:0x3ffb12e0 0x400d12bb:0x3ffb1300 0x400d128d:0x3ffb1330 0x400d12bb:0x3ffb1350 0x400d128d:0x3ffb1380 0x400d12bb:0x3ffb13a0 0x400d128d:0x3ffb13d0 0x400d12bb:0x3ffb13f0 0x400d128d:0x3ffb1420 0x400d12bb:0x3ffb1440 0x400d128d:0x3ffb1470 0x400d12bb:0x3ffb1490 0x400d128d:0x3ffb14c0 0x400d12bb:0x3ffb14e0 0x400d128d:0x3ffb1510 0x400d12bb:0x3ffb1530 0x400d128d:0x3ffb1560 0x400d12bb:0x3ffb1580 0x400d128d:0x3ffb15b0 0x400d12bb:0x3ffb15d0 0x400d128d:0x3ffb1600 0x400d12bb:0x3ffb1620 0x400d128d:0x3ffb1650 0x400d12bb:0x3ffb1670 0x400d128d:0x3ffb16a0 0x400d12bb:0x3ffb16c0 0x400d128d:0x3ffb16f0 0x400d12bb:0x3ffb1710 0x400d128d:0x3ffb1740 0x400d12bb:0x3ffb1760 0x400d128d:0x3ffb1790 0x400d12bb:0x3ffb17b0 0x400d128d:0x3ffb17e0 0x400d12bb:0x3ffb1800 0x400d128d:0x3ffb1830 0x400d12bb:0x3ffb1850 0x400d128d:0x3ffb1880 0x400d12bb:0x3ffb18a0 0x400d128d:0x3ffb18d0 0x400d12bb:0x3ffb18f0 0x400d128d:0x3ffb1920 0x400d12bb:0x3ffb1940 |<-CONTINUES
ELF file SHA256: 5753d8d4084c69e4
@VojtechBartoska or someone close to the design team,
I've gone through the thread and came to this conclusion: please consider issuing a compiler error for this missing return thingy, and its consequence, crashing the whole system if a non-void subroutine returns nothing, instead of a muted-in-arduino-by-default warning.
This behavior, crashing, is pretty much the very definition of a breaking change, since the 1.0.6 toolchain compiled and ran my exact same code just fine, for whatever reason, and it wasn't until the holiday break that I had time to hack and go core version 2 on my toys. True, from an engineering perspective, the missing return is point blank ugly, and my bad there, typos happen, ignorance happens, sloppiness happens, overlooking one's own diffs and pushing unintentionally altered code happens, these all happen.
But there is also the PR perspective here, beyond people's wasted time on debugging: it's arduino we are talking about. Not everyone compiling for esp32 happens to work in an r&d, with all the right tools and know how to get the idea where to start looking. Ham fisting this issue into novice developers' mind the hard way, like in good old x86 times pop ds system-bye-bye, is like the monk cutting off a finger of the disciple to show him where satori is not.
If the result is a crash, and the linter/compiler/parser knows it, please don't let it run and crash.
If the result is a crash, and the linter/compiler/parser knows it, please don't let it run and crash.
Yep, the compiler should fail on this. For the very simple reason that it will then fail for the one who also needs to fix it in the code at the perfect time too. Simply because the dev is then still working on the code and thus can find the bug very easily as it is still 'fresh'. Failing at runtime is as bad as it gets for such bugs.
IMHO this is also a bad design for the compiler as it never ever should allow to finish on such code.
This bit me too. Using "-Werror=return-type" as a build argument promotes this warning to an error in the future.
It would be helpful if it was an error by default, but some constructs need it to not be an error, and are allowed by the c++ standard, like some check for an unrecoverable error:
bool check(int value)
{
if (value < 100) return true;
esp_restart();
// no return here
}
(This is a weird example, but oh well...)
I think the default build flags in e.g. platformio should have it on by default, though.
And this issue bit me too..
Not a big deal to fix it, when you stumble upon, by adding a return accordingly. But it worries me a bit that if no error is produced on such code, during compilation, one could deliver code that might run for quite a while until some condition is met where nothing is returned and your device crashes.
bool checktheday(int day) { if (day != DOOMSDAY) return true; }
So, the question is, should we be strict or practical..
But it worries me a bit that if no error is produced on such code, during compilation, one could deliver code that might run for quite a while until some condition is met where nothing is returned and your device crashes.
The worst part about this is that when it DOES crash as a result...the backtrace and all debugging outputs have always been completely useless in my experience. So you can spend days trying to figure out "what's wrong with routines x,y,z" (and they may even be official ESP-IDF routines!) before finally getting a bit bombastic with the debug methods. And then only to finally figure out a missing "return" in a completely separate code file.
worth noting that PlatformIO does not have this as an error either. I was so convinced that the Arduino-ESP32 SPI.WriteBytes function had some critical fault in it (as that's where the crash stack pointed to on an ESP32-S3). Well, after peppering the whole code section with Serial.Print statements, I found a missing "return" on a non-void function in a different file. Fixed that and poof the ESP32-S3 stopped crashing. And while PlatformIO may have "warned" on the error the first time, PlatformIO only recompiles code that was modified--so the "warning" never reappeared in all subsequent compilations as I tried to solve the crashing problem.
C'mon @VojtechBartoska...just make this a compiler ERROR (instead of a WARNING), and close the issue. As far as issues go, this one's slam-dunk easy to resolve.
I really wonder what would be a valid reason not to have this being considered an error and stop compilation? Is there a practical use case to have it like it is now?
I really wonder what would be a valid reason not to have this being considered an error and stop compilation? Is there a practical use case to have it like it is now?
Prior to 2.0.0....yes, having it a warning was OK, as it didn't cause mysterious crashing of the end user application.
But I completely agree that the issue is a result of poor C coding practice. Even prior to 2.0.0, however, there still was no reason for this to be a "warning" in the first place: there is nothing to be gained.
From a machine-language standpoint, a non-void function exiting without a given return value could provide a completely useless/random/unknown/unspecified return value. That is, if the return value is provided in a CPU register, and not put on the stack. It's possible from a completely hypothetical (ARM-core) standpoint that 2.0.0 switched to PUSHing a return value to the stack (instead of just leaving the return value in "R0"). This would mean that a "non-void return function" by definition now would have to POP a return value plus the return address off the stack. Which means that if no "return" value is PUSH'd to the stack (by use of a "return (x)" statement), the function-ending POP will now mess up the stack due to hardcode pulling 2 registers off of the stack (due to expecting a return value to have been PUSH'd to the stack). Depending on how function calls are handled in machine code, this could potentially easily explain why a non-void routine return results in the culprit routine going into a dead loop until the RTOS WDT times out (or the stack overflows): what's expected to be the POP'd "return address" ends up being the address to the called function--so the ESP32 core jumps to said address and reruns the function. Ad infinitum. I'm not at all familiar with the Tensilica Xtensa core of the ESP32s, but that's at least how it could be explained from an ARM7 standpoint.
Still, we programmers are human and thus may benefit from some machine check to help us prevent making such errors. Well, as long as ChatGPT isn't writing our code.
But I still consider myself perfectly capable of making programming errors and would really not mind if a compiler would help me to not speed up turning my hair all grey or nails been bitten off in search for my own stupid errors.
So what perfectly fine reason is there to not enable this compiler flag to save my sanity for a little longer?
Yeah, back in the old day in x86 we had the return value in a register, either ax, or dx:ax for 32 bit values, and the caller took care about pushing popping said register, no "surprises" were left on the stack. This is exactly what must have changed from esp32 ver 1.x to ver 2.x, and I still don't get why the silence. Are we literally like five people on the entire planet running into this issue?
Are we literally like five people on the entire planet running into this issue?
The five of us that found this thread, for sure.
I understand the concept of garbage-in-garbage-out, but it is nasty that older projects that ran fine for years, can now have hidden crashes that may occur totally unexpected when a rare condition is met. If a function exits early with a proper return value, all is well. But there may be a rare exit condition that makes it crash.
My projects are mostly hobby stuff, call it toys if you want. But I reckon many will use it for more serious stuff, and a lurking crash which only presents when a rare condition is met when you least expect it... This could happen with code that was previously tested and considered ok, but looses robustness only because it was compiled with a newer tool.
@Frank-Bemelman in my case, I have several units running jig tests at my workplace, cause I found these cute boards with a glued on oled display. On the other hand, the same boards run my amateur astronomy rig, a nice testing ground before I put code to a more serious board. This is how it crashed my astronomy rig, leading me to spend the christmas tracing the issue to exactly here, to this thread.
Long story short, from a PR perspective, this breaking change just casts a shadow onto the entire playground of esp32's. We are people, writing bad code in general. I am grateful the days of x86 assembly are over, and I can finally focus on writing stuff instead of counting the bits. And here we are, counting the bits, again.
Probably the reason that more people aren't blowing this thread up, is because it is extremely difficult to pin down the cause/source of this bug. Only programmers that have notable debugging skills will be able to figure out why their project bootloops "after I put that new library in"--as the ESP32 chips' crash debug outputs by themselves simply won't point you anywhere close to the right location.
This is how it crashed my astronomy rig, leading me to spend the christmas tracing the issue to exactly here, to this thread.
yup, it's not an easy bug to track down! Took me most of a day the first time, and more like a few hours the most recent time. It's the weirdest thing, when your subroutine mysteriously loops back on itself infinitely until the processor crashes due to WDT or stack fouling...
And it is a sort of a time bomb. I bet there are thousands of c functions in github projects just waiting for a compile and crash. People tend to stick to their setup as long as possible, but sooner or later they will upgrade for whatever reason, and then it happens. Like a melting ice cap.
In my case it was related to button presses and beeps, buttons being multiplexed cause not enough pins to begin with, in the meanwhile I read the tone function, previously a polyfill, now went into the os, into the multithreading, with message queues and stuff, ... I suspected ESDs initially, so went crazy by isolating the front panel with optos, read a ton of docs and forums about the ADC, cause maybe some code there, perhaps mixed in with the wifi, what about the led_c used to generate the pwm for the buzzer... just to be sure, methodically eliminating every possible culprit, installing and removing the 1.x, cross compiling with all versions to see the behaviors, only to find out that I had a missing return somewhere in my code, and that the 2.x codebase handled it differently than 1.x.
I stumbled about the exact same topicc(multiple times now).
How about -Werror=return-type
? This would point users of all professional level to the right location before the bootloop appears.