PX4-Devguide

Document HPWORK, LPWORK, scheduling priorities, practices to ensure software safety

Open ndepal opened this issue 8 years ago • 2 comments

The scheduling concepts used in PX4 should be documented, and best practices and considerations should be explained. Getting this wrong can cause vehicle crashes, such as in https://github.com/PX4/Firmware/issues/7801, and such bugs can even end up in stable releases.

On a wider scope, I think it would be extremely valuable to have a chapter/tutorial on "how to make your application safe" that covers the crucial considerations and tools that help ensure new features do not cause problems down the road. This should include checking the stack with top, -fstack-usage, and http://nuttx.org/doku.php?id=wiki:howtos:run-time-stack-checking, checking the increase in CPU load caused by the feature, testing features thoroughly in HITL, and any other tools that I have not yet learned about.
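As a rough illustration of the -fstack-usage idea: GCC writes one record per function into a .su file, and a small host-side tool could flag large or dynamically sized frames. The sketch below assumes the tab-separated layout `location<TAB>bytes<TAB>qualifier`; the names `su_entry`, `su_parse_line` and `su_needs_review` are hypothetical and not part of PX4 or NuttX.

```c
#include <stdlib.h>
#include <string.h>

/* One record from a GCC -fstack-usage (.su) file (assumed layout). */
struct su_entry {
    char location[256];   /* "file:line:col:function" */
    unsigned long bytes;  /* stack bytes reported for the function */
    char qualifier[32];   /* e.g. "static", "dynamic", "dynamic,bounded" */
};

/* Parse one line of the form location<TAB>bytes<TAB>qualifier.
 * Returns 0 on success, -1 if the line does not match that layout. */
int su_parse_line(const char *line, struct su_entry *out)
{
    const char *tab1 = strchr(line, '\t');
    if (tab1 == NULL) return -1;
    const char *tab2 = strchr(tab1 + 1, '\t');
    if (tab2 == NULL) return -1;

    size_t loc_len = (size_t)(tab1 - line);
    if (loc_len == 0 || loc_len >= sizeof(out->location)) return -1;
    memcpy(out->location, line, loc_len);
    out->location[loc_len] = '\0';

    /* The byte count must fill the whole middle field. */
    char *end;
    out->bytes = strtoul(tab1 + 1, &end, 10);
    if (end == tab1 + 1 || end != tab2) return -1;

    /* The qualifier runs to the end of the line. */
    size_t q_len = strcspn(tab2 + 1, "\r\n");
    if (q_len == 0 || q_len >= sizeof(out->qualifier)) return -1;
    memcpy(out->qualifier, tab2 + 1, q_len);
    out->qualifier[q_len] = '\0';
    return 0;
}

/* Flag frames above a chosen limit, or frames whose size is not
 * fully known at compile time (anything other than "static"). */
int su_needs_review(const struct su_entry *e, unsigned long limit)
{
    return e->bytes > limit || strcmp(e->qualifier, "static") != 0;
}
```

The limit passed to `su_needs_review` is a project choice. Note that these numbers are per function, so they do not by themselves give the depth of a whole call chain.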

ndepal, Aug 17 '17 14:08

@davids5 any general NuttX stack sizing tips?

Notes about special cases like threads within tasks and hrt_call_every?

dagar, Aug 17 '17 14:08

@ndepal - Some background:

The PX4 build supports run-time stack checking as documented at http://nuttx.org/doku.php?id=wiki:howtos:run-time-stack-checking (I posted the original document).

The trick is that the whole system has to be built with the proper configuration (based on the value of CONFIG_ARMV7M_STACKCHECK in the NuttX defconfig), or you will get false positives, which are PANICS (hard faults).

The build used to ensure this was done. I have not built or checked a CONFIG_ARMV7M_STACKCHECK=y build lately.

The overhead is high in both CPU and FLASH. (We originally had it enabled on release builds.)

  1. As you suggested, moving forward there should be a HIL (or resurrect the former Hans system CI) build with it enabled.

  2. @dagar for stack sizing: the possibility of a 'blow through' always exists, so HW stack checking is much more accurate. When a function that defines a large amount of locals is called, the stack pointer can be yanked below the bottom of the stack. If that function then calls another function, you have written below the task stack and its 0xdeadbeef markers. 'Blow through'! Stack monitoring therefore cannot be accurate, because it will see the 0xdeadbeef markers at the bottom of the stack and yet the CPU just wrote below it.
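To make the 'blow through' case concrete, here is a minimal host-side simulation, not NuttX code. It models a task stack painted with 0xdeadbeef sitting above some unrelated memory, and uses an array index in place of the real stack pointer. All names and sizes (`simulate_blow_through`, `STACK_WORDS`, the 8/40/4-word frames) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_COLOR     0xdeadbeefu
#define NEIGHBOUR_WORDS 32   /* memory that happens to lie below the stack */
#define STACK_WORDS     32   /* simulated task stack */

struct blow_result {
    size_t reported_used;    /* words the marker scan claims were used */
    int neighbour_corrupted; /* 1 if memory below the stack was written */
};

/* Fill the stack region with the marker pattern. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) stack[i] = STACK_COLOR;
}

/* Marker-based monitoring: count intact markers from the bottom up
 * and report everything above them as used. */
size_t stack_words_used(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_COLOR) untouched++;
    return words - untouched;
}

struct blow_result simulate_blow_through(void)
{
    uint32_t mem[NEIGHBOUR_WORDS + STACK_WORDS];
    uint32_t *stack = mem + NEIGHBOUR_WORDS;   /* bottom of the task stack */
    size_t sp = NEIGHBOUR_WORDS + STACK_WORDS; /* stack grows downward */
    struct blow_result r = {0, 0};

    for (size_t i = 0; i < NEIGHBOUR_WORDS; i++) mem[i] = 0;
    stack_paint(stack, STACK_WORDS);

    /* Caller frame: 8 words, all written. */
    for (int i = 0; i < 8; i++) mem[--sp] = 0x11111111u;

    /* Function with 40 words of locals that are reserved but never
     * written: the stack pointer jumps below the bottom of the stack. */
    sp -= 40;

    /* That function calls another one, whose 4-word frame is written
     * at the new stack pointer, i.e. into the neighbouring memory. */
    for (int i = 0; i < 4; i++) mem[--sp] = 0x22222222u;

    r.reported_used = stack_words_used(stack, STACK_WORDS);
    for (size_t i = 0; i < NEIGHBOUR_WORDS; i++) {
        if (mem[i] != 0) r.neighbour_corrupted = 1;
    }
    return r;
}
```

In this model the scan reports only 8 of 32 words used, because the markers at the bottom are intact, while the memory below the stack has already been overwritten.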

One way to get a reasonably valid stack size is to temporarily push the stack size up as large as you can, say 4/8/16K, crazy big. Now the tricky part: getting code coverage. Exercise the new code and run top. Then you can get a maximum penetration, assuming 'blow through' has not occurred because the stack size is so crazy big.
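Once a peak usage has been measured this way, the final stack size still needs some headroom. The helper below is only a sketch of that arithmetic; the function name, the percentage margin and the alignment are assumptions for illustration, not a PX4 or NuttX rule.

```c
#include <stddef.h>

/* Turn a measured peak stack usage into a stack size to configure.
 * margin_percent adds headroom on top of the peak (rounded up);
 * align rounds the result up to a multiple of align bytes
 * (pass 0 to skip the alignment step). */
size_t stack_size_with_margin(size_t peak_bytes, unsigned margin_percent,
                              size_t align)
{
    size_t padded = peak_bytes + (peak_bytes * margin_percent + 99) / 100;

    if (align == 0) return padded;
    return (padded + align - 1) / align * align;
}
```

For example, a measured peak of 1130 bytes with a 25% margin and 8-byte alignment gives 1416 bytes. The margin only covers paths that the coverage run missed by a small amount; it does not replace exercising the code.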

The other major thing to consider is the size of the interrupt stack. This is listed as the Idle stack size (or it used to be) in top. All interrupt-context code, like HRT-dispatched code execution (like all SPI-based code), runs on this stack, which is sized and allocated at build time. Looking at the link map will tell you what lies below it and is vulnerable. Again, 'blow through' can happen on this stack as well.

davids5, Aug 17 '17 20:08