[HELP] TCP/IP stuck on Send(), crash on any new connection NuttX 9.1 STM32H7
Description
Hello Community,
We have had a very interesting issue that we have been struggling with for over a year now. It is intermittent, but because we are in the semiconductor industry, it is not acceptable to our customers.
We are currently running NuttX 9.1.8 with small customization for communication on STM32H7.
Symptom:
- No reply is received from the STM32H7-based board, although an ACK is received for the packet
- Any new connection (telnet, etc.) causes the code to crash
- Attaching a debugger showed the execution thread stuck in Send()
Things we tried:
- Added a timeout on Send: This helped in that we now get into retry mode. However, before this change we had intermittent timeouts; now we have many more, almost daily!
- Added retry on the main controller: The sequence is one timeout, disconnect socket, wait 3.5 s, connect socket; all of these succeed. We then resend the same message to the board, and no response is received. (If the board gets errno 128 due to the client disconnecting the socket, no response is received by the client; if the error was 116, we do get a response.)
- We are in the process of upgrading to 12.6, hoping the changes in the network stack help resolve the issue
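For readers following along, one common way to bound a blocking send is the `SO_SNDTIMEO` socket option. The sketch below assumes that approach; the post does not say how the timeout was actually implemented, and `set_send_timeout` and `send_all` are hypothetical helper names. On NuttX, socket options of this kind are expected to depend on `CONFIG_NET_SOCKOPTS`, which should be checked against the configuration in use.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: limit how long send() may block on this socket.
 * Returns 0 on success or -1 with errno set (as setsockopt does).
 */
static int set_send_timeout(int fd, long seconds)
{
  struct timeval tv;

  tv.tv_sec  = seconds;
  tv.tv_usec = 0;
  return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: send the whole buffer, or report why it failed.
 * Returns 0 on success or a negated errno value so the caller can log
 * exactly which error occurred.
 */
static int send_all(int fd, const void *buf, size_t len)
{
  const char *p = buf;

  while (len > 0)
    {
      ssize_t n = send(fd, p, len, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            {
              continue; /* interrupted by a signal: try again */
            }

          return -errno; /* includes the timeout case */
        }

      p   += n;
      len -= (size_t)n;
    }

  return 0;
}
```

Logging the value returned by `send_all` on every failure would make it easier to tell a send timeout apart from a disconnected peer, which is the distinction the errno 116 and errno 128 cases above seem to hinge on.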
We also send unsolicited messages from the board, which we believe every now and then do not get to the client. Please let us know if you need more info; I can share the code as needed.
A few other things: We use KSZ8051 on some boards and KSZ8895 PHY on other boards. The original issue was reported on boards that are daisy-chained over KSZ8895, but since we made change #1 (the Send timeout) we are seeing the issue on all boards.
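The controller-side retry sequence from the list above (timeout, disconnect, wait, reconnect, resend) could be sketched roughly as follows. This is an illustration under assumptions, not the actual controller code: `connect_board` and `request_with_retry` are hypothetical names, the reply timeout is assumed to be set with `SO_RCVTIMEO`, and for simplicity the sketch opens a fresh connection for every attempt.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: connect to the board with a receive timeout.
 * Returns a connected socket or a negated errno value.
 */
static int connect_board(const char *ip, uint16_t port, long timeout_ms)
{
  struct sockaddr_in addr;
  struct timeval tv;
  int fd = socket(AF_INET, SOCK_STREAM, 0);

  if (fd < 0)
    {
      return -errno;
    }

  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port   = htons(port);
  tv.tv_sec       = timeout_ms / 1000;
  tv.tv_usec      = (timeout_ms % 1000) * 1000;

  errno = EINVAL; /* reported if inet_pton() rejects the address */
  if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
      setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
      connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
    {
      int err = errno;
      close(fd);
      return -err;
    }

  return fd;
}

/* Send one request and wait for one reply. On any failure: disconnect,
 * wait backoff_ms, reconnect, and resend, up to max_attempts times.
 * Returns the reply length or a negated errno from the last attempt.
 */
static ssize_t request_with_retry(const char *ip, uint16_t port,
                                  const void *req, size_t req_len,
                                  void *resp, size_t resp_cap,
                                  long timeout_ms, long backoff_ms,
                                  int max_attempts)
{
  ssize_t result = -EINVAL;
  int attempt;

  for (attempt = 0; attempt < max_attempts && result <= 0; attempt++)
    {
      ssize_t sent;
      int fd;

      if (attempt > 0)
        {
          struct timespec ts;
          ts.tv_sec  = backoff_ms / 1000;
          ts.tv_nsec = (backoff_ms % 1000) * 1000000L;
          nanosleep(&ts, NULL); /* e.g. 3500 ms, as in the post */
        }

      fd = connect_board(ip, port, timeout_ms);
      if (fd < 0)
        {
          result = fd;
          continue;
        }

      sent = send(fd, req, req_len, 0);
      if (sent < 0)
        {
          result = -errno;
        }
      else if ((size_t)sent != req_len)
        {
          result = -EIO; /* short write: treat as a failed attempt */
        }
      else
        {
          ssize_t n = recv(fd, resp, resp_cap, 0);
          result = (n > 0) ? n : (n == 0 ? -ENOTCONN : -errno);
        }

      close(fd); /* disconnect before the next attempt */
    }

  return result;
}
```

A call such as `request_with_retry(ip, port, msg, len, resp, sizeof(resp), 1000, 3500, 2)` would mirror the one-timeout-then-retry behaviour described above; the timeout value of 1000 ms is only a placeholder.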
Thank you in advance.
Verification
- [X] I have verified before submitting the report.
Hi,
NuttX 9.1.8 is an old version, and many bug fixes and improvements have been added since it was released. Please try to replicate the bug on the latest release and, if possible, update your NuttX version to the latest one.
Best regards, Alin
Hi @mraja-brooks, the issue was probably fixed in version 12.x; the network stack has improved a lot since version 9.x.
If the issue is still happening on version 12.x, please create a simple application that demonstrates the issue on an STM32H7 board supported in mainline.
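As a starting point for such a demonstration application, a minimal TCP echo server is sketched below. It is an assumption about what a reproducer might look like, not the reporter's code: `open_listener`, `echo_connection`, and `run_echo_server` are hypothetical names, and the server handles one connection at a time using only standard BSD socket calls.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on the given port.
 * Returns the socket or a negated errno value.
 */
static int open_listener(uint16_t port)
{
  struct sockaddr_in addr;
  int on = 1;
  int fd = socket(AF_INET, SOCK_STREAM, 0);

  if (fd < 0)
    {
      return -errno;
    }

  /* Best effort: allow quick restarts of the test application */
  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

  memset(&addr, 0, sizeof(addr));
  addr.sin_family      = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port        = htons(port);

  if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
      listen(fd, 1) < 0)
    {
      int err = errno;
      close(fd);
      return -err;
    }

  return fd;
}

/* Echo everything received on one connection until the peer closes.
 * Returns the number of bytes echoed or a negated errno value.
 */
static long echo_connection(int conn)
{
  char buf[256];
  long total = 0;

  for (; ; )
    {
      ssize_t off = 0;
      ssize_t n = recv(conn, buf, sizeof(buf), 0);

      if (n == 0)
        {
          return total; /* orderly shutdown by the peer */
        }

      if (n < 0)
        {
          return -errno;
        }

      while (off < n)
        {
          ssize_t s = send(conn, buf + off, (size_t)(n - off), 0);
          if (s < 0)
            {
              return -errno; /* the call the report describes as stuck */
            }

          off += s;
        }

      total += n;
    }
}

/* Serve max_conns connections in sequence, then stop.
 * Returns 0 on success or a negated errno value.
 */
static int run_echo_server(uint16_t port, int max_conns)
{
  int i;
  int ls = open_listener(port);

  if (ls < 0)
    {
      return ls;
    }

  for (i = 0; i < max_conns; i++)
    {
      int conn = accept(ls, NULL, NULL);
      if (conn < 0)
        {
          int err = errno;
          close(ls);
          return -err;
        }

      echo_connection(conn);
      close(conn);
    }

  close(ls);
  return 0;
}
```

Calling `run_echo_server` from the application entry point and driving it from a host with repeated connect, send, and disconnect cycles would exercise the same send path and new-connection handling that the report describes, without any of the custom communication code.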
Thank you @jerpelea and @acassis. We are in the process of upgrading and will test it.
The only concern is that, since the error occurs intermittently, we will have to get the new version to our customer and wait 6-9 months to really know whether the root cause is resolved.
That's not an ideal situation for our customers; they want to have some confidence that 12.6 will resolve the issue.
Please let us know if you think of anything.