Denial of Service in cFS when Receiving Malformed UDP Packets
Checklist (Please check before submitting)
- [x] I reviewed the Contributing Guide.
- [x] I performed a cursory search to see if the bug report is relevant, not redundant, nor in conflict with other tickets.
Describe the bug
When sending a large number of UDP command packets to the cFS system (specifically targeting the TO_LAB application), the system logs continuous "Pipe Overflow" errors such as:
CFE_SB 25: Pipe Overflow, MsgId 0x800, pipe TO_LAB_TLM_PIPE, sender CFE_ES
After such overflow errors, the cFS system becomes unresponsive to any further legitimate ground system commands, even though the core process (core-cpu1) remains alive. This appears to cause an unrecoverable internal state in the Software Bus (CFE_SB) and affects the behavior of multiple apps.
To Reproduce
Steps to reproduce the behavior:
- Launch cFS normally (latest GitHub main branch, no modifications) by running ./core-cpu1.
- Run POC.py: execute the POC script and data package from the Google Drive link: https://drive.google.com/file/d/10NGHPWHwtL9CmSRyI_8uaBNE7Q54mGp4/view?usp=sharing
- Observe the command terminal message echo of core-cpu1.
Error log:
`
EVS Port1 1980-012-14:04:39.64021 66/1/CFE_SB 21: Send Err:Invalid MsgId(0x0)in msg,App CI_LAB_APP
EVS Port1 1980-012-14:04:39.64021 66/1/CI_LAB_APP 10: CI_LAB: Ingest failed, status=-905969661
EVS Port1 1980-012-14:04:39.66004 66/1/CFE_SB 25: Pipe Overflow,MsgId 0x803,pipe TO_LAB_TLM_PIPE,sender CFE_SB
EVS Port1 1980-012-14:04:39.67008 66/1/CFE_SB 25: Pipe Overflow,MsgId 0x805,pipe TO_LAB_TLM_PIPE,sender CFE_TIME
EVS Port1 1980-012-14:04:39.69007 66/1/CFE_SB 25: Pipe Overflow,MsgId 0x800,pipe TO_LAB_TLM_PIPE,sender CFE_ES
EVS Port1 1980-012-14:04:39.69010 66/1/CFE_SB 25: Pipe Overflow,MsgId 0x804,pipe TO_LAB_TLM_PIPE,sender CFE_TBL
`
- Send normal commands using the CFS Ground System, and core-cpu1 will no longer receive them.
- Notice multiple Pipe Overflow errors and that the system no longer responds to normal NOOP or any other commands.
Expected behavior
Even in the case of Pipe Overflow, cFS should handle the overflow gracefully (e.g., discard new incoming packets but continue normal command processing). Instead, after the overflow occurs, the system enters an inconsistent state in which no further valid commands are processed.
Code snips
No local source code modifications were made. The test was performed directly on the latest GitHub version.
System observed on:
- Hardware: x86_64 PC (laptop, Intel CPU)
- OS: Ubuntu
- Versions:
cFE: equuleus-rc1+dev247
OSAL: equuleus-rc1+dev117
PSP: equuleus-rc1+dev73
Applications: TO_LAB, CI_LAB, SAMPLE_APP, etc.
Additional context
1. UDP port used: 1234
2. Normal NOOP commands succeed before the overflow occurs.
3. After the overflow, the system logs many Pipe Overflow messages and fails to process legitimate commands.
4. The reproduction uses fuzzed (but validly formed) UDP messages based on the typical cFS TO_LAB message structure.
Reporter Info
Lidong LI & XiaoZheng Ding@SourceGuard
Thanks for flagging this, we'll take a look. Do you have a sense of whether it is the pipe or CI_LAB that stops responding? If the former, we'll get it fixed. If the latter, CI_LAB isn't meant for flight and therefore isn't (and doesn't need to be) robust against off-nominal situations.
Thanks for this report.
I have examined the sequence of packets being sent to the UDP command port (CI_LAB) and while most are invalid and should be rejected, some do alias valid commands and will be interpreted as such.
In particular the current version of CI_LAB has a "scheduled" mode of operation, see here: https://github.com/nasa/ci_lab/blob/8c43b9eb45f2fe9ee5a8069cf07b2b45e3669971/fsw/src/ci_lab_app.c#L78
For backward compatibility, CI_LAB starts up with this mode turned off by default, and thus services the uplink after every command timeout. However, upon the first receipt of a scheduled "read uplink" indication, it turns this mode on, and in that mode it only services the uplink when it specifically gets the "read uplink" indication.
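To make the latching behavior concrete, here is a minimal Python model of the description above. It is only a sketch of the behavior as described in this comment, not the CI_LAB source: the class and method names (`CiLabModel`, `on_command_timeout`, `on_read_uplink_indication`) are hypothetical.

```python
class CiLabModel:
    """Toy model of the scheduled-mode latch described above (hypothetical names)."""

    def __init__(self):
        self.scheduled = False   # default at startup: scheduled mode is off
        self.uplink_reads = 0    # how many times the uplink socket was serviced

    def _read_uplink(self):
        self.uplink_reads += 1

    def on_command_timeout(self):
        # Legacy behavior: service the uplink on every timeout,
        # but only while scheduled mode has not been turned on.
        if not self.scheduled:
            self._read_uplink()

    def on_read_uplink_indication(self):
        # The first "read uplink" indication latches scheduled mode on.
        self.scheduled = True
        self._read_uplink()


model = CiLabModel()
model.on_command_timeout()          # serviced (legacy mode)
model.on_read_uplink_indication()   # serviced, and scheduled mode is now on
model.on_command_timeout()          # NOT serviced: waiting for the next indication
print(model.scheduled, model.uplink_reads)
```

In this model, once the flag is set, nothing short of another "read uplink" indication (or resetting the flag) causes the uplink to be read again, which matches the symptom of ground commands going unanswered.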
In the default configuration of CI_LAB, the READ_UPLINK command has a message ID of 0x1886. The 9th packet in the sequence is addressed to this message ID and thus invokes this mode of operation. In my testing, this is why CI_LAB stops servicing the command ingest socket. If I go in with a debugger and force the Scheduled flag back to false, it resumes command processing.
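For anyone who wants to check a packet capture for this aliasing, the sketch below reads the first 16 bits of each UDP payload as a big-endian CCSDS stream ID and compares it with 0x1886. It assumes the default configuration, where the message ID equals those first 16 bits of the primary header; the helper names and the sample packet are made up for illustration and are not taken from the POC.

```python
import struct

READ_UPLINK_MID = 0x1886  # value quoted above for the default CI_LAB configuration


def stream_id(packet: bytes) -> int:
    """Return the first 16 bits of the CCSDS primary header (big-endian)."""
    if len(packet) < 2:
        raise ValueError("packet too short to contain a stream ID")
    (sid,) = struct.unpack_from(">H", packet, 0)
    return sid


def describe(sid: int) -> dict:
    """Split the stream ID into its CCSDS primary header fields."""
    return {
        "version": (sid >> 13) & 0x7,
        "is_command": bool((sid >> 12) & 0x1),
        "has_secondary_header": bool((sid >> 11) & 0x1),
        "apid": sid & 0x7FF,
    }


def aliases_read_uplink(packet: bytes) -> bool:
    return stream_id(packet) == READ_UPLINK_MID


# Illustrative payload only (not from the POC data): stream ID 0x1886 followed by filler.
sample = bytes([0x18, 0x86, 0xC0, 0x00, 0x00, 0x01, 0x00, 0x00])
print(hex(stream_id(sample)), describe(stream_id(sample)), aliases_read_uplink(sample))
```

Running a check like this over the fuzzed sequence would show which packets land on a valid command message ID rather than being rejected.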