Consider bumping up the stack size in the sample startup script
Is your feature request related to a problem? Please describe. The startup script in the "out of the box" sample uses a stack size of 16k for all apps, e.g.:
https://github.com/nasa/cFE/blob/7a220ae809555cad86fb98d823ec77528a2fb125/cmake/sample_defs/cpu1_cfe_es_startup.scr#L3
I was recently debugging a segmentation fault issue on one of my app builds, and I finally thought to check the stack usage after realizing the backtrace had quite a few entries in it. After checking the stack usage, it turns out all the framework sample/lab apps are using about 14-15kB (that is, more than 80% of the allocated size).
So while it does not appear to be an issue in the default build, this leaves very little margin for additional development work before the stack is exceeded, and it's not obvious that the stack has been exceeded once it does grow too big.
Describe the solution you'd like Increase the stack size in the sample script to at least 32k, or preferably 64k, for "sample_app" to give some additional room for development, because this app is often used as a "sandbox" to test new ideas and concepts.
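For illustration, a sketch of what the change could look like in cpu1_cfe_es_startup.scr, where the sixth field of each entry is the stack size in bytes (the exact path and entry-point symbol here are placeholders and may not match the current file; check the real script before copying):

```
! Object type, Path/Filename, Entry point, CFE name, Priority, Stack, Load addr, Exception action
CFE_APP, /cf/sample_app.so, SAMPLE_APP_Main, SAMPLE_APP, 50, 65536, 0x0, 0;
```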
Describe alternatives you've considered At least document that the stack size is right at its margin (at least on 64-bit Linux).
Additional context Credit to a simple and effective tool for gauging stack usage: https://github.com/d99kris/stackusage ... this worked like a charm to measure how much stack each of the cFE tasks was using.
In the future it might be nice to have features like that built into OSAL.
Requester Info Joseph Hickey, Vantage Systems, Inc.
I've had it on my TODO list forever to write an issue about including stack utilization in task reporting. Even just a rough walk-through to find the first nonzero byte is better than nothing (or paint the stack for a better estimate). I've implemented something similar on point designs in the past, and it's really useful. I've always thought cFS was missing out on that margin/utilization visibility.
Indeed, this is how the "stackusage" tool I cited works - it intercepts pthread_create, fills the stack with a known pattern, and registers an exit handler that checks where the pattern stops once the task exits. Simple but effective.
Somewhat surprised that GDB doesn't have something like that built in (or maybe it does and I just don't know about it?). I spent a fair bit of time here trying to determine what was wrong; there isn't really any direct indication that you've blown through the stack, which is frustrating.
Adding a milestone, really more from the perspective of resolving and implementing the related capability of a stack utilization reporting mechanism (it doesn't have an independent issue yet, but it should get added somewhere depending on whether we want it in ES task reporting or HS, and it will likely add an API in PSP or OSAL).
OP's experience checking stack size as a non-obvious last resort resonated with me. Some indication that the stack has been exceeded would be excellent. I have spent days stuck on debugging before thinking about checking the app stack limits.
RTEMS has https://docs.rtems.org/branches/master/c-user/stack_bounds_checker.html, although I haven't seen an API to just get the current info in a structure or similar. What were you doing with the sample apps to get such high usage? RTEMS on LEON3 reports the following for the out-of-the-box config:
```
stackuse
                          STACK USAGE BY THREAD
    ID      NAME    LOW          HIGH         CURRENT      AVAIL  USED
0x09010001  IDLE    0x400df600   0x400e05ff   0x400e0368    4080   564
0x0a010001  UI1     0x403918f0   0x403998ef   0x40399470   32752  3320
0x0a010002  BSWP    0x403998f8   0x4039a8f7   0x4039a618    4080   948
0x0a010003  BRDA    0x4039a900   0x4039b8ff   0x4039b630    4080   932
0x0a010004  ntwk    0x4039b908   0x4039c907   0x4039c5d8    4080  1012
0x0a010005  ETH0    0x4039c910   0x4039d90f   0x4039d578    4080  1116
0x0a010006  FTPa    0x4039d9a0   0x4039f99f   0x4039f410    8176  1620
0x0a010007  FTPD    0x4039fa30   0x403a0a2f   0x403a0620    4080  1236
0x0a010009  shel    0x403a0ac0   0x403b0abf   0x403af960   65520  4452
0x0a01000a  cFS     0x403b0b50   0x403b4b4f   0x403b4718   16368  3996
0x0a01000b          0x403b4b58   0x403b5b57   0x403b58d0    4080  1516
0x0a01000c          0x403b5b60   0x403b6b5f   0x403b67c0    4080  1388
0x0a01000d          0x403b6bf0   0x403b8bef   0x403b8620    8176  2596
0x0a01000e          0x403b8f88   0x403baf87   0x403ba9b8    8176  3444
0x0a01000f          0x403bb320   0x403bd31f   0x403bcc48    8176  3676
0x0a010010          0x403bd4d8   0x403be4d7   0x403be010    4080  1580
0x0a010011          0x403be568   0x403c0567   0x403bff98    8176  3108
0x0a010012          0x403c05f8   0x403c15f7   0x403c1170    4080  1972
0x0a010013          0x403c1688   0x403c3687   0x403c3200    8176  2356
0x0a010014          0x403c3840   0x403c583f   0x403c5270    8176  2604
0x0a010015          0x403c59f8   0x403c99f7   0x403c9428   16368  2604
0x0a010016          0x403c9a88   0x403cda87   0x403cd4b0   16368  2620
0x0a010017          0x403cde20   0x403d1e1f   0x403d19d0   16368  2780
0x0a010018          0x403d1eb0   0x403d5eaf   0x403d5750   16368  2236
0x00000000  Interrupt Stack  0x400e0600  0x400e15ff  0x00000000   4080  1372
```
Not the sample app - this was a custom app that did a lot of matrix multiplication with large (18x18) matrices of doubles.
Oh, I was thinking more of OP (@jphickey), who was seeing high utilization for the sample app. It looks like the last 4 tasks have plenty of margin on RTEMS, but I wasn't doing anything fancy. Curious what utilization looks like on Linux/VxWorks out of the box.