[BUG] RP2040 Unstability, garbled NSH

Open keever50 opened this issue 8 months ago • 18 comments

Description / Steps to reproduce the issue

This is a corrected continuation of https://github.com/apache/nuttx/issues/16139. It is also happening on the nuttx-apps side (https://github.com/apache/nuttx-apps/issues/3063), and it is possibly linked to all the RP23xx issues.

As the firmware becomes more complex with more threads and increased memory usage, the NSH shell output gets progressively more garbled. This is most noticeable when running commands like help or ps, but it can also happen when using printf in applications. The issue shows up in both usbnsh and nsh configurations, especially when additional drivers and random applications are added.

In worse cases, this corruption can even cause the system to hang, including during ostest. Increasing the stack size does not solve the issue.

I previously mentioned that USB might be related, but that's not the case. This problem is more widespread.

I’m sharing this to raise awareness, since I haven’t been able to track it down myself. I’ve spent many hours trying, but nothing improved. The issue also has a tendency to hide and come back later, making it even harder to pin down.

I hope we can find the problem in this GitHub issue.

On which OS does this issue occur?

[OS: Other], [OS: Linux]

What is the version of your OS?

Arch

NuttX Version

Master

Issue Architecture

[Arch: arm]

Issue Area

[Area: Kernel]

Host information

No response

Verification

  • [x] I have verified before submitting the report.

keever50 avatar May 02 '25 15:05 keever50

Thank you for reporting this!

Up to now I have only ever used the RP2040 board with NuttX, so I wrongly assumed that the garbled output was a general (cosmetic) issue. Thank you for clarifying this.

In apache/nuttx-apps#3063 I described that the issue disappeared after I significantly slowed down the output (the writes to the interface). I guess this means that the write function of the serial interface somehow does not behave as "blocking": the receiver discards incoming data when its buffer is full, instead of letting the caller wait until the buffer is drained.

At least this is my speculation regarding the cause of the issue.

sumpfralle avatar May 04 '25 02:05 sumpfralle
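
The distinction speculated about in the comment above (a writer that waits for buffer space versus one whose excess data is silently discarded) can be illustrated with a small host-side simulation. This is only a sketch: the function `transmit` and its parameters (`capacity`, `burst`, `drain`) are hypothetical and do not model the actual NuttX serial or CDC/ACM driver.

```python
from collections import deque


def transmit(data: bytes, capacity: int, burst: int, drain: int, blocking: bool) -> bytes:
    """Simulate a producer writing into a bounded FIFO drained by a slower consumer.

    Each tick the producer offers up to `burst` bytes and the consumer then
    removes up to `drain` bytes (assumed to be >= 1). Hypothetical model only.
    """
    fifo = deque()
    received = bytearray()
    pos = 0
    while pos < len(data) or fifo:
        chunk = data[pos:pos + burst]
        room = capacity - len(fifo)
        accepted = chunk[:room]          # only what fits goes into the FIFO
        fifo.extend(accepted)
        if blocking:
            pos += len(accepted)         # the rest is retried on the next tick
        else:
            pos += len(chunk)            # bytes that did not fit are lost
        for _ in range(min(drain, len(fifo))):
            received.append(fifo.popleft())
    return bytes(received)


message = b"nsh> help\n" * 20
intact = transmit(message, capacity=16, burst=8, drain=4, blocking=True)
lossy = transmit(message, capacity=16, burst=8, drain=4, blocking=False)
print(intact == message, len(lossy), len(message))
```

With these numbers the blocking variant delivers the data intact, while the lossy variant delivers only part of it, which is what garbled text on a terminal would look like. It is also consistent with the observation that slowing the writer down makes the symptom disappear.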

I see that @YuuichiNakamura wrote the USB support for the RP2040 in rp2040_usbdev.c, maybe they can suggest some possible reasons for this issue to help us narrow down our search?

linguini1 avatar May 08 '25 02:05 linguini1

I wonder also if there is anyone with a non-RP2040 board that has USB who can test the shell output for garbling. This would narrow it down to the RP2040 driver or a higher level (CDCACM) driver.

linguini1 avatar May 08 '25 02:05 linguini1

> I wonder also if there is anyone with a non-RP2040 board that has USB who can test the shell output for garbling. This would narrow it down to the RP2040 driver or a higher level (CDCACM) driver.

The funny part is that the hardware UART sometimes seems to do the same thing.

"I previously mentioned that USB might be related, but that's not the case. This problem is more widespread."

But yes, USB is way worse. Way worse.

keever50 avatar May 12 '25 16:05 keever50

> the funny part is that hardware UART seems to sometimes do the same thing.

This is good information to know. I should also mention that when using the STM32H743 UART shell, I also observe strange line endings, something like "[K", in my terminal emulator. I haven't noticed any garbled output yet, but that strange character might be a symptom of the higher level UART driver?

I wonder if there are any MCUs similar to the RP2040 (Cortex-M0+ with similar compute power) on which we can try to reproduce the garbling, to see whether this is exclusive to the RP2040 implementation. I have a Seeduino I can try USB and UART on.

linguini1 avatar May 12 '25 17:05 linguini1

This is the character I always see, across multiple boards, when using the miniterm Python module. I haven't seen this when using Minicom.

Image

linguini1 avatar May 13 '25 00:05 linguini1

This is 1b 5b 4b in hex, which is for some reason being sent. Seems to happen specifically after the nsh> prompt and when I press backspace.

linguini1 avatar May 13 '25 01:05 linguini1
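
The three bytes reported above can be decoded on the host to see what they represent. The sketch below is a minimal illustration; `describe_csi` and the small `NAMES` table are hypothetical helpers written for this example and cover only a few common sequences.

```python
import re

# A CSI sequence is ESC '[' followed by parameter bytes (0x30-0x3F),
# intermediate bytes (0x20-0x2F) and exactly one final byte (0x40-0x7E).
CSI = re.compile(rb"\x1b\[([0-?]*)([ -/]*)([@-~])")

# Small, incomplete lookup table of common final bytes (illustrative only).
NAMES = {
    b"K": "Erase in Line (EL)",
    b"J": "Erase in Display (ED)",
    b"m": "Select Graphic Rendition (SGR)",
}


def describe_csi(data: bytes):
    """Return (parameters, final byte, description) for each CSI sequence found."""
    found = []
    for match in CSI.finditer(data):
        params, _intermediate, final = match.groups()
        found.append((params.decode(), final.decode(), NAMES.get(final, "unknown")))
    return found


captured = bytes.fromhex("1b 5b 4b")   # the bytes quoted in the comment above
print(describe_csi(captured))
```

For these bytes the result is an empty parameter string with the final byte `K`, i.e. ESC [ K, the ANSI request to erase from the cursor to the end of the line. A terminal emulator is expected to act on it rather than display it.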

~~I've noticed that nsh_update_prompt() gets called every time I hit enter, which seems incorrect to me since it's only called during initialization and nsh_session(). I don't think a new session should be made every time I hit enter in the terminal but maybe I'm missing something.~~ (I see it's called in the update loop now) Wondering if NSH is responsible for some of this garbling?

@keever50 are you able to confirm if you experience the garbling in console output when not using NSH? I.e. running your application directly and printing to the USB or UART serial console? Can you also share your configuration where you observe this issue, and especially the one where OSTest fails?

linguini1 avatar May 13 '25 02:05 linguini1

It gets much worse when I enable the command line editor mode instead of just readline:

Image

From the research I did online, I think these are escape codes. Maybe the documentation could suggest some terminal software that handles ANSI escape codes? I suppose miniterm does not handle that on its own.

linguini1 avatar May 13 '25 02:05 linguini1
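
When a capture was taken with a terminal that does not interpret escape codes, filtering the sequences out first makes it easier to judge whether the remaining text is actually garbled. A minimal sketch, assuming the capture is available as raw bytes; `strip_csi` is a hypothetical helper (not part of miniterm or NuttX) and it handles only CSI sequences of the form ESC [ ..., not other escape types.

```python
import re

# ESC '[' + parameter bytes + intermediate bytes + one final byte.
CSI = re.compile(rb"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_csi(data: bytes) -> bytes:
    """Remove CSI escape sequences, leaving all other bytes untouched."""
    return CSI.sub(b"", data)


sample = b"nsh> \x1b[Khelp\r\n"    # made-up capture for illustration
print(strip_csi(sample))
```

Anything that still looks wrong after this filtering is not explained by the terminal's handling of escape codes.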

I can confirm it is also an issue when using printf. The help command is often broken in bigger configs. You can see this in the test logs I add to PRs.

keever50 avatar May 13 '25 07:05 keever50

> ~~I've noticed that nsh_update_prompt() gets called every time I hit enter, which seems incorrect to me since it's only called during initialization and nsh_session(). I don't think a new session should be made every time I hit enter in the terminal but maybe I'm missing something.~~ (I see it's called in the update loop now) Wondering if NSH is responsible for some of this garbling?
>
> @keever50 are you able to confirm if you experience the garbling in console output when not using NSH? I.e. running your application directly and printing to the USB or UART serial console? Can you also share your configuration where you observe this issue, and especially the one where OSTest fails?

I'll share the config later. It's easy to create one by enabling random things.

keever50 avatar May 13 '25 07:05 keever50

@linguini1 this is definitely an issue with your terminal, not NuttX. This is probably the reason: https://forum.micropython.org/viewtopic.php?t=2502

I suggest using a proven terminal emulator like minicom :)

raiden00pl avatar May 14 '25 06:05 raiden00pl

> @linguini1 this is definitely an issue with your terminal, not NuttX. This is probably the reason: https://forum.micropython.org/viewtopic.php?t=2502
>
> I suggest using a proven terminal emulator like minicom :)

Yeah, it seems like the serial terminal I was using does not support escape codes. I may write this in the docs somewhere as a note; a few other people I work with were wondering about this.

I will use minicom to continue to diagnose the garbled output!

linguini1 avatar May 14 '25 12:05 linguini1

I've found today that sending syslog output to a USB console causes lots of issues, including some crashes. Enabling buffered syslog output prevents some garbling. Will look into whether or not this may be related.

linguini1 avatar May 25 '25 22:05 linguini1
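
One plausible reason buffered output reduces garbling is that each message reaches the console as a whole line rather than character by character, so output from concurrent tasks cannot interleave mid-line. The toy model below illustrates only that idea. `emit` is a hypothetical function, the strict round-robin scheduling is an assumption, the log messages are made up, and none of this reflects how the NuttX syslog code is implemented.

```python
from itertools import zip_longest


def emit(tasks, line_buffered: bool) -> str:
    """Interleave output from several tasks, one unit per task per turn.

    A unit is a whole line when `line_buffered` is True, otherwise a
    single character (toy model of unbuffered output).
    """
    if line_buffered:
        queues = [text.splitlines(keepends=True) for text in tasks]
    else:
        queues = [list(text) for text in tasks]
    out = []
    for turn in zip_longest(*queues, fillvalue=""):
        out.extend(turn)                 # each task contributes one unit per turn
    return "".join(out)


tasks = ["usb: ep0 setup\n", "kasan: check ok\n"]   # made-up log lines
print(repr(emit(tasks, line_buffered=True)))
print(repr(emit(tasks, line_buffered=False)))
```

In the line-buffered case each message comes out intact, while in the character-level case the two messages are shuffled together even though no byte is lost.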

> ~~I've noticed that nsh_update_prompt() gets called every time I hit enter, which seems incorrect to me since it's only called during initialization and nsh_session(). I don't think a new session should be made every time I hit enter in the terminal but maybe I'm missing something.~~ (I see it's called in the update loop now) Wondering if NSH is responsible for some of this garbling?
>
> @keever50 are you able to confirm if you experience the garbling in console output when not using NSH? I.e. running your application directly and printing to the USB or UART serial console? Can you also share your configuration where you observe this issue, and especially the one where OSTest fails?

I'm unable to replicate this issue over the hardware UART anymore (not USB ACM CDC), but it is still very much a huge problem when using USB ACM CDC. To make the output absolutely unreadable, do the following:

An easy way to test this problem is to enable CPU-heavy features, such as KASAN and the granular allocator, and to enable ALL debug logging. This stresses the serial output a lot. This has been my go-to test for this kind of thing.

I've increased the stack size, which might have helped on the hardware UART side, but not with USB ACM CDC. Here too, just like on most other platforms, the stack size is very low.

So the hardware UART looks fine now. USB ACM CDC is not fine at all.

keever50 avatar Jun 02 '25 09:06 keever50

> An easy way to test this problem is to enable CPU heavy operations, such as KASAN, granular allocator and enable ALL debug logging. This stresses the serial a lot. This has been my go-to test for these kind of things.

Enabling logging and debug features can directly affect the timings in the USBDEV implementation. The USB spec is very restrictive regarding timings, to the point that extending the execution time of some internal functions in the USB stack can easily break the specification and result in bad behavior.

raiden00pl avatar Jun 02 '25 10:06 raiden00pl
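
For a rough sense of scale: a full-speed USB bus, which is what the RP2040 provides, is organised in 1 ms frames. The back-of-the-envelope sketch below compares that with the time a single synchronously written debug line would occupy. The 115200 baud UART sink, the 10 bits per character (8N1 framing) and the 40-character message are assumptions chosen for illustration, not measurements from this issue.

```python
USB_FS_FRAME_S = 1e-3   # full-speed USB frame period: 1 ms


def uart_line_time(chars: int, baud: int = 115200, bits_per_char: int = 10) -> float:
    """Seconds needed to shift `chars` characters out of a UART (8N1 = 10 bits each)."""
    return chars * bits_per_char / baud


def frames_spanned(duration_s: float) -> float:
    """How many full-speed USB frames fit into `duration_s`."""
    return duration_s / USB_FS_FRAME_S


line = uart_line_time(40)   # assumed 40-character debug message
print(f"{line * 1e3:.2f} ms, about {frames_spanned(line):.1f} USB frames")
```

Under these assumptions one blocking log line takes roughly 3.5 ms, i.e. several USB frames, which gives an idea of how quickly verbose debug output inside time-critical code can add up.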

> > An easy way to test this problem is to enable CPU heavy operations, such as KASAN, granular allocator and enable ALL debug logging. This stresses the serial a lot. This has been my go-to test for these kind of things.
>
> enabling logging and debug features can directly affect the timings in the USBDEV implementation. The USB spec is very restrictive regarding timings, to the point that extending the execution time of some internal functions in the USB stack can easily break the specification and result in bad behavior.

That is interesting and makes sense. That could indeed explain the behavior. Is there a way to keep this tight timing? Priority perhaps?

keever50 avatar Jun 02 '25 10:06 keever50

If the execution time of critical USB functions is the issue, then setting priorities won't help. USB stack functions should be executed as quickly as possible. If the problem is USB interrupt response time, then you can try to raise the USB interrupt priority, but that may have negative consequences for other parts of the system.

raiden00pl avatar Jun 02 '25 10:06 raiden00pl