Consider bumping up the stack size in the sample startup script
Is your feature request related to a problem? Please describe. The startup script in the "out of the box" sample uses a stack size of 16k for all apps, e.g.:
https://github.com/nasa/cFE/blob/7a220ae809555cad86fb98d823ec77528a2fb125/cmake/sample_defs/cpu1_cfe_es_startup.scr#L3
I was recently debugging a segmentation fault issue on one of my app builds, and I finally thought to check the stack usage after realizing the backtrace had quite a few entries in it. After checking the stack usage, it turns out all the framework sample/lab apps are using about 14-15kB (that is, more than 80% of the allocated size).
So while it does not appear to be an issue in the default build, this leaves very little margin for additional development work before the stack is exceeded, and it's not obvious that the stack has been exceeded once it does grow too big.
Describe the solution you'd like Increase the stack size in the sample script to at least 32k, or preferably 64k, for "sample_app" to give some additional room for development, because this app is often used as a "sandbox" to test new ideas and concepts.
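For illustration, a sketch of what the change could look like in cpu1_cfe_es_startup.scr, where the sixth field of each entry is the stack size in bytes (the exact path and entry-point symbol here are placeholders and may not match the current file; check the real script before copying):

```
! Object type, Path/Filename, Entry point, CFE name, Priority, Stack, Load addr, Exception action
CFE_APP, /cf/sample_app.so, SAMPLE_APP_Main, SAMPLE_APP, 50, 65536, 0x0, 0;
```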
Describe alternatives you've considered At least document that the stack size is right at its margin (at least on 64-bit Linux).
Additional context Credit to a simple and effective tool for gauging stack usage: https://github.com/d99kris/stackusage ... this worked like a charm to measure how much stack each of the cFE tasks was using.
In the future it might be nice to have features like that built into OSAL.
Requester Info Joseph Hickey, Vantage Systems, Inc.
I've had it on my TODO list forever to write an issue about including stack utilization in task reporting. Even just a rough walk-through to find the first nonzero byte is better than nothing (or paint the stack for a better estimate). I've implemented something similar on point designs in the past, and it's really useful. I've always thought cFS was missing out on that margin/utilization visibility.
Indeed, this is how the "stackusage" tool I cited works - it intercepts pthread_create, fills the stack with a known pattern, and registers an exit handler that checks where the pattern stops once the task exits. Simple but effective.
Somewhat surprised that GDB doesn't have something like that built in (or maybe it does and I just don't know about it?). I spent a fair bit of time here trying to determine what was wrong; there isn't really any direct indication that you've blown through the stack, which is frustrating.
Adding a milestone, really more from the perspective of resolving and implementing the related capability of a stack utilization reporting mechanism (it doesn't have an independent issue yet, but it should get added somewhere depending on whether we want it in ES task reporting or HS, and it will likely add an API in PSP or OSAL).
OP's experience checking stack size as a non-obvious last resort resonated with me. Some indication that the stack has been exceeded would be excellent. I have spent days stuck on debugging before thinking about checking the app stack limits.
RTEMS has https://docs.rtems.org/branches/master/c-user/stack_bounds_checker.html, although I haven't seen an API to just get the current info in a structure or similar. What were you doing with the sample apps to get such high usage? RTEMS on LEON3 reports the following for the out-of-the-box config:
```
stackuse
                          STACK USAGE BY THREAD
    ID      NAME    LOW          HIGH         CURRENT      AVAIL  USED
0x09010001  IDLE    0x400df600   0x400e05ff   0x400e0368    4080   564
0x0a010001  UI1     0x403918f0   0x403998ef   0x40399470   32752  3320
0x0a010002  BSWP    0x403998f8   0x4039a8f7   0x4039a618    4080   948
0x0a010003  BRDA    0x4039a900   0x4039b8ff   0x4039b630    4080   932
0x0a010004  ntwk    0x4039b908   0x4039c907   0x4039c5d8    4080  1012
0x0a010005  ETH0    0x4039c910   0x4039d90f   0x4039d578    4080  1116
0x0a010006  FTPa    0x4039d9a0   0x4039f99f   0x4039f410    8176  1620
0x0a010007  FTPD    0x4039fa30   0x403a0a2f   0x403a0620    4080  1236
0x0a010009  shel    0x403a0ac0   0x403b0abf   0x403af960   65520  4452
0x0a01000a  cFS     0x403b0b50   0x403b4b4f   0x403b4718   16368  3996
0x0a01000b          0x403b4b58   0x403b5b57   0x403b58d0    4080  1516
0x0a01000c          0x403b5b60   0x403b6b5f   0x403b67c0    4080  1388
0x0a01000d          0x403b6bf0   0x403b8bef   0x403b8620    8176  2596
0x0a01000e          0x403b8f88   0x403baf87   0x403ba9b8    8176  3444
0x0a01000f          0x403bb320   0x403bd31f   0x403bcc48    8176  3676
0x0a010010          0x403bd4d8   0x403be4d7   0x403be010    4080  1580
0x0a010011          0x403be568   0x403c0567   0x403bff98    8176  3108
0x0a010012          0x403c05f8   0x403c15f7   0x403c1170    4080  1972
0x0a010013          0x403c1688   0x403c3687   0x403c3200    8176  2356
0x0a010014          0x403c3840   0x403c583f   0x403c5270    8176  2604
0x0a010015          0x403c59f8   0x403c99f7   0x403c9428   16368  2604
0x0a010016          0x403c9a88   0x403cda87   0x403cd4b0   16368  2620
0x0a010017          0x403cde20   0x403d1e1f   0x403d19d0   16368  2780
0x0a010018          0x403d1eb0   0x403d5eaf   0x403d5750   16368  2236
0x00000000  Interrupt Stack  0x400e0600  0x400e15ff  0x00000000   4080  1372
```
Not the sample app - this was a custom app that did a lot of matrix multiplication with large (18x18) matrices of doubles.
Oh, I was thinking more of OP (@jphickey), who was seeing high utilization for the sample app. It looks like the last 4 tasks have plenty of margin on RTEMS, but I wasn't doing anything fancy. Curious what utilization looks like on Linux/VxWorks out of the box.