nuttx
nuttx copied to clipboard
boards/esp32: Increase init task stack size to 3072
Summary
- boards/esp32: Increase init task stack size to 3072
This is done to avoid casual stack overflows. Especially after #11777, that increased stack usage.
Impact
Avoid stack overflow.
Testing
Internal CI testing + ESP32-DevKitC V4
@tmedicci do you know how much PR #11777 increased the stack usage? Just looking that PR was impossible to know this side effect. I think we need some CI script to compare and show how much each PR are impacting in flash/RAM size and stack usage. @anchao any idea why your PR is increasing the stack?
could we hold 2KB default stack size? or we have to increase the default task stack size, instead updating every defconfig.
if there is no hard requirement please leave the default stack size 2k
@tmedicci do you know how much PR #11777 increased the stack usage? Just looking that PR was impossible to know this side effect. I think we need some CI script to compare and show how much each PR are impacting in flash/RAM size and stack usage. @anchao any idea why your PR is increasing the stack?
I didn't measure it directly, but it seems to be very reasonable that it would have been increased by the size of struct task_info_s
.
The problem itself appears when running the built-in ps
command in the NuttShell.
could we hold 2KB default stack size? or we have to increase the default task stack size, instead updating every defconfig.
Usually, 2KB is sufficient for most of the applications. The problem is specific to the init task, which is the nsh
and its built-in commands. I prefer not to increase the default stack size...
@tmedicci @xiaoxiang781216 if ps
is failing after that PR then for some reason is increased the stack size. Espressif has an internal CI that does some hardware tests. I think having a low stack value defined in the config is useful to catch issues like this.
@tmedicci do you know how much PR #11777 increased the stack usage? Just looking that PR was impossible to know this side effect. I think we need some CI script to compare and show how much each PR are impacting in flash/RAM size and stack usage. @anchao any idea why your PR is increasing the stack?
I didn't measure it directly, but it seems to be very reasonable that it would have been increased by the size of
struct task_info_s
.The problem itself appears when running the built-in
ps
command in the NuttShell.could we hold 2KB default stack size? or we have to increase the default task stack size, instead updating every defconfig.
Usually, 2KB is sufficient for most of the applications. The problem is specific to the init task, which is the
nsh
and its built-in commands. I prefer not to increase the default stack size...
If so, it's better to change the default stack size for nsh, the default value should be suitable for the common usage. Newbie will surprise that ps in nsh will panic for many bultin support board.
@tmedicci do you know how much PR #11777 increased the stack usage? Just looking that PR was impossible to know this side effect. I think we need some CI script to compare and show how much each PR are impacting in flash/RAM size and stack usage. @anchao any idea why your PR is increasing the stack?
I didn't measure it directly, but it seems to be very reasonable that it would have been increased by the size of
struct task_info_s
. The problem itself appears when running the built-inps
command in the NuttShell.could we hold 2KB default stack size? or we have to increase the default task stack size, instead updating every defconfig.
Usually, 2KB is sufficient for most of the applications. The problem is specific to the init task, which is the
nsh
and its built-in commands. I prefer not to increase the default stack size...If so, it's better to change the default stack size for nsh, the default value should be suitable for the common usage. Newbie will surprise that ps in nsh will panic for many bultin support board.
Stack usage is specific for each arch (there are minor changes about it depending on the arch and selected configs). One could change nsh
stack size directly, but when CONFIG_INIT_ENTRYPOINT="nsh_main"
(the most common), it would be overridden by CONFIG_INIT_STACKSIZE
(which is the one I changed).
@tmedicci @xiaoxiang781216 if
ps
is failing after that PR then for some reason is increased the stack size. Espressif has an internal CI that does some hardware tests. I think having a low stack value defined in the config is useful to catch issues like this.
Yes, it is... but even before this implementation, stack usage was near its limits (it even failed for specific defconfigs that had their CONFIG_INIT_STACKSIZE
increased before this PR). Our internal CI ensures that Espressif's devices can boot and run simple commands (like ps
and free
), but we can not ensure this is valid for all other chips. We can think on a tool to evaluate this when submitting new PRs, but I don't think it fits this PR's purposes.
emm... strange, PR #11777 just move the instance of struct task_info_s
into struct task_group_s
:
struct task_group_s
{
...
/* Thread local storage ***************************************************/
#ifndef CONFIG_MM_KERNEL_HEAP
struct task_info_s tg_info_;
#endif
FAR struct task_info_s *tg_info;
};
The instantiation of struct task_group_s
comes from heap
(malloc), and it does not make temporary declarations in the stack:
int group_allocate(FAR struct task_tcb_s *tcb, uint8_t ttype)
{
FAR struct task_group_s *group;
...
/* Allocate the group structure and assign it to the TCB */
group = kmm_zalloc(sizeof(struct task_group_s));
if (!group)
{
return -ENOMEM;
}
...
}
In addition, I checked on the sim. It seems the stack of nsh
has not changed. does esp board will declare struct task_group_s
in the stack directly?
Before patch #11777:
NuttShell (NSH) NuttX-10.4.0
MOTD: username=admin password=Administrator
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 069616 004632 6.6% Idle_Task
1 1 224 FIFO Kthread - Waiting Signal 0000000000000000 067536 004728 7.0% loop_task
2 2 224 FIFO Kthread - Waiting Semaphore 0000000000000000 067504 000888 1.3% hpwork 0x5019e0 0x501a08
3 3 100 FIFO Task - Running 0000000000000000 067536 003544 5.2% nsh_main
After patch #11777:
NuttShell (NSH) NuttX-10.4.0
MOTD: username=admin password=Administrator
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 069616 004632 6.6% Idle_Task
1 1 224 FIFO Kthread - Waiting Signal 0000000000000000 067536 004680 6.9% loop_task
2 2 224 FIFO Kthread - Waiting Semaphore 0000000000000000 067504 000888 1.3% hpwork 0x501d00 0x501d28
3 3 100 FIFO Task - Running 0000000000000000 067536 003544 5.2% nsh_main
@anchao thank you for verifying! @tmedicci please try to verify before and after this patch, maybe it could be caused by some other commit.
@anchao thank you for verifying! @tmedicci please try to verify before and after this patch, maybe it could be caused by some other commit.
I found this issue through bisect and this was the specific commit that triggered the exception, but what @anchao said makes sense: it does not seem to increase the stack usage (I'm sorry for stating that before). The point is (I've said earlier) that we already had occasional exceptions and increased the init task's stack size on ESP32-S2, ESP32-S3, and even on some of the ESP32 boards. Building it with CONFIG_STACK_COLORATION
shows a stack usage of 100% (or very close to that), before and after the commit. Probably, this commit just triggered the issue because memory nearby wasn't being used and the stack overflow did not trigger the exception, which is not true anymore. The init task's stack size still needs to be increased...
@tmedicci you can try a new tool here: https://github.com/apache/nuttx/blob/master/sched/Kconfig#L1306-L1313 which could catch the deepest callstack automatically, not only the max stack usage.
@tmedicci you can try a new tool here: https://github.com/apache/nuttx/blob/master/sched/Kconfig#L1306-L1313 which could catch the deepest callstack automatically, not only the max stack usage.
Thanks for the tip, @xiaoxiang781216 !