Arduino
Arduino copied to clipboard
Sketch halts when using std::string and map
Basic Infos
- [ x] This issue complies with the issue POLICY doc.
- [ x] I have read the documentation at readthedocs and the issue is not addressed there.
- [x ] I have tested that the issue is present in current master branch (aka latest git).
- [ x] I have searched the issue tracker for a similar issue.
- [x ] If there is a stack dump, I have decoded it.
Platform
- Hardware: ESP-12
- Core Version: https://github.com/esp8266/Arduino/commit/8f71d2c04225c4b3af3fc8407ff74c2481e3904c
- Development Env: Arduino IDE
- Operating System: Windows
Description
I was migrating a project to esp8266-core 3.0.x when I started to hit weird execution issues (i.e. the execution suddenly halts until the watchdog triggers to restart the esp8266).
After very frustrating moments, I isolated the issue to the following MCVE, which basically it doesn't print the message ciao
on the serial port. The same behavior can be observed on all the versions 3.0.x and on the latest git version (https://github.com/esp8266/Arduino/commit/8f71d2c04225c4b3af3fc8407ff74c2481e3904c).
After looking at possible related issues, I found #8314, but I don't know if they have a solution for that issue...
This code works with v2.7.4 and previous versions.
If you reduce the size of the hashmap by removing, for example, 15 pairs, the Serial.print() works.
#include <unordered_map>
#include <string>
#include <Arduino.h>
void setup() {
Serial.begin(115200);
Serial.println("Ciao");
}
# note that it is never used
std::unordered_map<std::string, unsigned short> noteMapping{
{"SILENCE", 0 },
{ "B0", 31 },
{ "C1", 33 },
{ "CS1", 35 },
{ "D1", 37 },
{ "DS1", 39 },
{ "D1", 37 },
{ "DS1", 39 },
{ "E1", 41 },
{ "F1", 44 },
{ "FS1", 46 },
{ "G1", 49 },
{ "GS1", 52 },
{ "A1", 55 },
{ "AS1", 58 },
{ "B1", 62 },
{ "C2", 65 },
{ "CS2", 69 },
{ "D2", 73 },
{ "DS2", 78 },
{ "E2", 82 },
{ "F2", 87 },
{ "FS2", 93 },
{ "G2", 98 },
{ "GS2", 104 },
{ "A2", 110 },
{ "AS2", 117 },
{ "B2", 123 },
{ "C3", 131 },
{ "CS3", 139 },
{ "D3", 147 },
{ "DS3", 156 },
{ "E3", 165 },
{ "F3", 175 },
{ "FS3", 185 },
{ "G3", 196 },
{ "GS3", 208 },
{ "A3", 220 },
{ "AS3", 233 },
{ "B3", 247 },
{ "C4", 262 },
{ "CS4", 277 },
{ "D4", 294 },
{ "DS4", 311 },
{ "E4", 330 },
{ "F4", 349 },
{ "FS4", 370 },
{ "G4", 392 },
{ "GS4", 415 },
{ "A4", 440 },
{ "AS4", 466 },
{ "B4", 494 },
{ "C5", 523 },
{ "CS5", 554 },
{ "D5", 587 },
{ "DS5", 622 },
{ "E5", 659 },
{ "F5", 698 },
{ "FS5", 740 },
{ "G5", 784 },
{ "GS5", 831 },
{ "A5", 880 },
{ "AS5", 932 },
{ "B5", 988 },
{ "C6", 1047},
{ "CS6", 1109},
{ "D6", 1175},
{ "DS6", 1245},
{ "E6", 1319},
{ "F6", 1397},
{ "FS6", 1480},
{ "G6", 1568},
{ "GS6", 1661},
{ "A6", 1760},
{ "AS6", 1865},
{ "B6", 1976},
{ "C7", 2093},
{ "CS7", 2217},
{ "D7", 2349},
{ "DS7", 2489},
{ "E7", 2637},
{ "F7", 2794},
{ "FS7", 2960},
{ "G7", 3136},
{ "GS7", 3322},
{ "A7", 3520},
{ "AS7", 3729},
{ "B7", 3951},
{ "C8", 4186},
{ "CS8", 4435},
{ "D8", 4699},
{ "DS8", 4978}
};
void loop() {}
Does it work after adding disable_extra4k_at_link_time();
in the setup();
?
Yes it works! I didn't hear about this function!
I see that disable_extra4k_at_link_time()
was introduced after the release of v3.0.2. Is it a workaround or will it be remain in public api?
Is it related to the linked issue? May you tell me which is the cause of this issue?
cc @mhightower83 may have a better explanation
it is a pretty old api, not something new to the 3.x.x release. edit: we allocate 4kib structure for our 'user task' stack, either in system stack (default) or via static variable (no-extra4k). so something goes wrong there, i'd guess, since the issue only manifests when the map grows
only fact I am sure of atm, we grab 5KiB 6872 bytes of heap to store strings and storage used by the unordered map when we init global ctors.
(which btw may be suboptimal container(s) for what you are trying to do)
it is a pretty old api, not something new to the 3.x.x release.
Yes, I was wrong, there where few modifications since v3.0.2 around this function but it is pretty old, sorry.
However, this piece of code is part of a library, I don't like to oblige the user to add this function to their setup()
, so I would exclude the usage of this function.
may be suboptimal container(s) for what you are trying to do
I know, but it works fine enough, but if it leads to this behavior I will change for the better.
Adding it to setup() simply references it for the linker, which is enough to trigger the no-extra4k mode. Your library might as well do that somewhere in it's code, setup() is not requirement for this workaround.
I know, but it works fine enough, but if it leads to this behavior I will change for the better.
Just consider that you execute a lot of useless initialization code doing it that way. Be it a desktop app, perhaps that would 've been fine :) e.g. compare building this unordered_map example with a local g++ -Os -S
and see the resulting .s
file
Keeping the existing strings + u16, at least one way is to do this (pretty sure esp32 also supports c++17 features)
#include <string_view>
#include <array>
using Note = std::pair<std::string_view, unsigned short>;
static constexpr std::array<Note, 92> noteMapping {{
...
}};
Just 92 entries with a linear search instead of hashing should be a reasonable trade-off? With a further opportunity to use flash to store both strings and the structure itself, reducing RAM impact with a trade-off of writing code to access it in a special way
The origin of disable_extra4k_at_link_time()
goes back a while. This is the quickest description I can find:
https://github.com/esp8266/Arduino/blob/8f71d2c04225c4b3af3fc8407ff74c2481e3904c/cores/esp8266/core_esp8266_main.cpp#L321-L356
It is not apparent how reducing the heap and increasing the system stack would help this problem. WPS and WiFi Enterprise are the only features I know of, so far, that need disable_extra4k_at_link_time()
.
Edit: To be clear, I do see the problem occurring.
I have not tried anything originally, just a guess about the nature of the used structure and how much memory it grabs from the get-go. Perhaps it is nothing, and totally unrelated to the SYS stack grab.
The question is mainly, what happens on initialization?
With the above, there is a proxy initializer-list with a bunch of pairs that are moved into the mapping.
(ref. disassemble _Z41__static_initialization_and_destruction_0ii$constprop$0
, if I understood correctly at what place that hashtable aka unordered_map ctor happens)
However, using a separate const-array and passing begin(), end() to map ctor still works just fine.
Thanks both for the detailed answers and suggestions.
@mcspr about string_view, I would preserve the compatibility with esp8266 <v3.0.0, which use an old version of gcc and it to not support c++17, so no support to string_view. I will use std::array, and maybe, arduino String instead of std::string. Are there any performance advantages between using Arduino String vs std::string?
@fabianoriccardi probably not. Same problem as before, though. It allocates some structure, somewhere, and it there for the duration of the program copied from char data already loaded in RAM. Simple const char* might suffice.
btw even gcc4.8.2 (from our previous version 2.7.4) will still support a simple wrapper around string literal https://godbolt.org/z/8dav8P1rc (although older standard also means there cannot be more than one expression on return)
or, using the same approach as in the linked issue with a template based on size. pretty common as well https://github.com/ETLCPP/etl/blob/752d9adb5c2978e7e21c39a19d7297dc88f24e5c/include/etl/string_view.h#L811
just curious, quick n' dirty trial... the threshold is between "B7" and "C8" (binary-search by hand), and including "C8" goes WDT timeout. (using x86_64-w64-mingw32.xtensa-lx106-elf-aff7f1b.211127.zip)
My current understanding of stacks sys and cont and neighbors
ETS system data RAM starts at 0x3FFFC000
and is 16K bytes (0x4000
). The first 8K, 0x3FFFC000
- 0x3FFFE000
, is used by the Boot ROM for data and BSS. This 8K block continues to be used after the system has finished booting. Some Boot ROM Services continue to be used by the NONOS SDK and some user libraries. The remaining 8K, 0x3FFFE000
- 0x40000000
is mostly used for the stacks.
The Boot ROM functions that support WPS and aes_unwrap()
have buffers toward the beginning of this address space, 0x3FFFE000
. By not using WPS and replacing aes_unwrap()
with a version that uses malloc()
we have a full 8K byte space to support the system (sys) and user (cont) stack, approximately 4K for each. The system stack is roughly in the 0x3FFFE000
- 0x3FFFF000
address space and the user stack in 0x3FFFF000
- 0x40000000
.
To use WPS, the user stack needs to be moved out of this 8K space, allowing the system stack to move up closer to 0x40000000
out of the way of the WPS hardcoded buffers. Placing a call to disable_extra4k_at_link_time()
anywhere in your sketch does this. In this configuration, the system stack is about 6K bytes. If you are not using WPS, you get the whole 8K for the system stack. This comes at the expense of 4K of heap space. The user stack is defined in lower user data memory. The heap size is the resulting unused memory below 0x3FFFC000
.
When the system stack grows down below 0x3FFFE000
, you may see an HWDT or other weird crashes.
When WPS is used without disable_extra4k_at_link_time()
, you may see an HWDT or other weird crashes.
Using the 8K system stack model to take stack painting and check measurements without crashing
Assuming a 4K system stack, it looks like when do_global_ctors()
is called from init_done()
, it has 1376 bytes of the system stack already in use. That leaves only 2720 to handle do_global_ctors()
. The full MCVE needs 3324 to initialize. A total of 4700 bytes.
Edited: add clarification
interesting information... it feels initializations via global ctors imply potentially dangerous.
can't do_global_ctors()
's stack be assigned to cont one, or to another large enough block?
Or, if not that, could we implement something similar to stack protection check from gcc applied to global ctors function? At least for error reporting in more apparent manner than just triggering hwdt.
Or, if not that, could we implement something similar to stack protection check from gcc applied to global ctors function? At least for error reporting in more apparent manner than just triggering hwdt.
I think it would be super-useful for any other unlucky guy hitting this sneaky issue!
btw even gcc4.8.2 (from our previous version 2.7.4) will still support a simple wrapper around string literal https://godbolt.org/z/8dav8P1rc
Thanks for the suggestion, I tested it and it works perfectly on older and newer versions. I will go for it!
can't
do_global_ctors()
's stack be assigned to cont one, or to another large enough block?
--- cores/esp8266/core_esp8266_main.cpp
+++ cores/esp8266/core_esp8266_main.cpp
@@ -305,9 +305,12 @@ void init_done() {
system_set_os_print(1);
gdb_init();
std::set_terminate(__unhandled_exception_cpp);
- do_global_ctors();
- esp_schedule();
+ ESP.resetHeap();
+ cont_run(g_pcont, &do_global_ctors);
ESP.setDramHeap();
+ if (cont_check(g_pcont) != 0)
+ panic();
+ esp_schedule();
}
/* This is the entry point of the application.
seems to resolve the problem. (but other side effects are unknown, any ideas?)
I think there are a few bumps.
At init_done()
there is no need for ESP.resetHeap()
. There is nothing to restore and would panic when multiple heaps are available. Also, with the call to ESP.resetHeap()
, you are calling a class that has not (yet) been initialized.
It is a little strange. When cont_run()
returns, the stack we are using may have been corrupted by the call to do_global_ctors()
; however, (if?) we do not need any of our local stack content, a direct call to postmortem can be made with enough stack space to process the panic.
Unfortunately, nothing will print. All the GPIO pins were reset during user_init()
. The TX pin was turned off.
One workaround would be to add a replacement resetPins()
extern "C" void resetPins() {
for (int pin = 0; pin <= 16; ++pin) {
if (1 == pin) {
continue;
}
if (!isFlashInterfacePin(pin))
pinMode(pin, INPUT);
}
}
to the sketch to keep the TX pin alive. Note, that the UART speed will default to 115200bps.
As long as the overflow doesn't exceed 8K, a configuration could be defined to detect that initializing global ctors overflowed the stack.
Edited to supply an alternative to get printf working and apply suggested changes (edited again to correct ==
to !=
)
void init_done() {
system_set_os_print(1);
gdb_init();
std::set_terminate(__unhandled_exception_cpp);
cont_run(g_pcont, &do_global_ctors);
if (cont_check(g_pcont) != 0) {
pinMode(1, SPECIAL);
panic();
}
esp_schedule();
ESP.setDramHeap();
}
I am not sure about the adverse side effects either. It is not clear to me how to create a proper solution. I assume it is inappropriate Arduino behavior to force serial on when that is not part of the sketch. Pin 1 could be wired up for some other function.