ot-br-posix icon indicating copy to clipboard operation
ot-br-posix copied to clipboard

otbr-agent aborted with "radio tx timeout" error

Open pratikpanchal4 opened this issue 3 years ago • 17 comments

Hi,

We have system defined as below, system_diagram

Now, currently when we run otbr-agent on the host processor without any end device connected that works perfectly fine without any issue. But, when we connect our 4 end devices with the host device that after sometime otbr-agent aborts with the error "radio_tx_timeout' error.

Each end device transmits temperature reading at every 1 seconds I have attached the logs here for the reference minicom_logs_2.txt

One thing to notice here is when all the end node starts communicating then somehow we don't get transmit done message and it stops.

We have also increase receive buffer size to 30K so that should be enough.

Thanks, Pratik

pratikpanchal4 avatar Jan 12 '22 10:01 pratikpanchal4

Are you able to try this using one of your nRF52840's as an RCP in place of the CC2538? If so, can you also try using native USB mode?

jwhui avatar Jan 12 '22 21:01 jwhui

Yes, I have tried using nRF52840 as RCP with Raspberry Pi over native USB mode and that works fine. But in my custom hardware design I have CC2538 which is connected to host processor over UART which is causing the problem.

I also tried waptund and CC2538 as NCP which is also working fine.

So, there is some issue with CC2538 as RCP and otbr-agent.

Let me know if you need more information

Thanks, Pratik

pratikpanchal4 avatar Jan 13 '22 05:01 pratikpanchal4

We have found that increasing the baudrate can help. Are you able to increase the baudrate to 460800?

jwhui avatar Jan 13 '22 05:01 jwhui

Sure, Initially we have tried, but I can try again with increased baudrate and share you the logs.

pratikpanchal4 avatar Jan 13 '22 05:01 pratikpanchal4

I have tried increasing the baudrate to 460800. But still the same behavior. Attached is the log minicom_HighBoardRate_13Jan_1.txt

pratikpanchal4 avatar Jan 13 '22 09:01 pratikpanchal4

I have tried increasing the baudrate to 460800. But still the same behavior. Attached is the log minicom_HighBoardRate_13Jan_1.txt

In this particular case, it appears that the UART is dropping bytes. See:

otbr-agent[1641]: 00:08:03.590 [DEBG]-PLAT----: RCP => Framing error 6: [
otbr-agent[1641]: 00:08:03.590 [DEBG]-PLAT----: RCP => 84 03 71 63 00 49 98 41 34 12 FF FF 00 2C 0D 8B 98 1E 00 01 C0 64 43 4A 7C 5A 3F 5B 6F 5E 05 AE C4 E1 12 03 00 00 01 E1 06]

Error code 6 is OT_ERROR_PARSE, which is raised here: https://github.com/openthread/openthread/blob/bb658e270588fcdee9f5cfe3a77d5528c642cc22/src/lib/hdlc/hdlc.cpp#L248-L261

jwhui avatar Jan 13 '22 18:01 jwhui

Host interact RCP via UART using HDLC. I am not too familiar with HDLC, but I think HDLC should be able to tolerant occasional data corruptions ?

Thoughts?

simonlingoogle avatar Jan 19 '22 03:01 simonlingoogle

Host interact RCP via UART using HDLC. I am not too familiar with HDLC, but I think HDLC should be able to tolerant occasional data corruptions ?

HDLC is a framing protocol that can recover from corrupt frames. However, we do not implement a reliable transport protocol with retransmissions. If the dropped frame is something that the host expects, then communication will fail.

jwhui avatar Jan 19 '22 03:01 jwhui

Yes, but with 4-5 devices sending data at every 1 second, otbr-agent die quickly. I am not able to understand why is this happening? From the logs I have also observed that sometimes it receives 4K data constantly upto 28K, Which is beyond my understanding.

I have verified UART separately and it is working perfectly fine.

pratikpanchal4 avatar Jan 19 '22 05:01 pratikpanchal4

Yes, but with 4-5 devices sending data at every 1 second, otbr-agent die quickly. I am not able to understand why is this happening? From the logs I have also observed that sometimes it receives 4K data constantly upto 28K, Which is beyond my understanding.

I have verified UART separately and it is working perfectly fine.

Have you validated that the UART is 100% reliable? Spinel does not tolerate any dropped bytes.

jwhui avatar Jan 19 '22 05:01 jwhui

Yes, we have validated UART with 115200 baudrate and it is working fine.

pratikpanchal4 avatar Jan 19 '22 05:01 pratikpanchal4

Yes, we have validated UART with 115200 baudrate and it is working fine.

What about 460800, which is the log that showed framing errors?

jwhui avatar Jan 19 '22 05:01 jwhui

I haven't run my sample application with the 460800 baud rate. As this requires some hardware changes. I can try with that can update you. But, Are you really sure that the reason for it not working is the baud rate?

pratikpanchal4 avatar Jan 19 '22 05:01 pratikpanchal4

I haven't run my sample application with the 460800 baud rate. As this requires some hardware changes. I can try with that can update you. But, Are you really sure that the reason for it not working is the baud rate?

This log error:

otbr-agent[5821]: 00:01:19.795 [CRIT]-PLAT----: HandleRcpTimeout() at ../third_party/openthread/repo/src/lib/spinel/radio_spinel_impl.hpp:2247: RadioSpinelNoResponse

indicates that a spinel response frame is expected and never received. This could be due to unreliable UART, insufficient buffers to enqueue the message, the RCP is stuck and not able to service the spinel request, etc. If the nRF52840 with native USB is working well for you, I would guess that is something with the CC2538 or your setup.

Have you tried increasing OPENTHREAD_CONFIG_NCP_TX_BUFFER_SIZE to something larger?

jwhui avatar Jan 19 '22 06:01 jwhui

On RCP, OPENTHREAD_CONFIG_NCP_TX_BUFFER_SIZE is set to 8K. and RX buffer of otbr-agent is 30K.

pratikpanchal4 avatar Jan 19 '22 06:01 pratikpanchal4

I was able to stably reproduce this problem in openWRT system. otbr-posix version: 2022-1-5 a965a7535dad33303585d2080437f3d796e54a81 border router: ipq4018(host) + efr32mg21(rcp) CH340G usb2uart connect with hardware flow control child device: 20 nrf52840 usb dongle(as mtd) , send a COAP PUT packet of one byte to FF03 ::1 every one second posix rx buffer : 2048 phenomenon: Posix crashes after a few minutes of receiving child device data

log output:

Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.939 [NOTE]-CLI-----: Input: bufferinfo
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.939 [NOTE]-CLI-----: Output: total: 2048
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.939 [NOTE]-CLI-----: Output: free: 1740
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.940 [NOTE]-CLI-----: Output: 6lo send: 153 306
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.940 [NOTE]-CLI-----: Output: 6lo reas: 0 0
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.940 [NOTE]-CLI-----: Output: ip6: 0 0
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.940 [NOTE]-CLI-----: Output: mpl: 1 2
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.940 [NOTE]-CLI-----: Output: mle: 0 0
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.940 [NOTE]-CLI-----: Output: arp: 0 0
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.941 [NOTE]-CLI-----: Output: coap: 0 0
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.941 [NOTE]-CLI-----: Output: coap secure: 0 0
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.941 [NOTE]-CLI-----: Output: application coap: 0 0
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:46.941 [NOTE]-CLI-----: Output: Done
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.058 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:8ce4:dc46:c30:b57d PUT with payload: 03
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.119 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:6738:b2ee:25a5:182a PUT with payload: 0b
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.129 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:d838:f8b1:10a5:2c71 PUT with payload: 05
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.133 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:c709:59e0:2a1a:d2a8 PUT with payload: 01
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.184 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:b198:efaf:9d81:79c7 PUT with payload: 0e
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.195 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:159b:f725:6fde:dbe5 PUT with payload: 12
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.197 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:be79:8241:e1fb:729d PUT with payload: 09
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.240 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:6600:5182:e6dd:13f6 PUT with payload: 11
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.253 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:8b51:404f:9905:9dc1 PUT with payload: 06
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.272 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:3d1c:2369:589f:8ced PUT with payload: 0f
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.287 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:bdcd:a8ac:3e7a:12ec PUT with payload: 08
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.322 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:4bb6:ebe8:be5e:e701 PUT with payload: 0c
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.393 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:86c7:2b94:828a:791d PUT with payload: 0a
Mon Feb 15 22:53:12 2021 user.notice otbr-agent[6850]: 00:06:47.433 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:9bbe:901f:5823:8e37 PUT with payload: 07
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:47.577 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:50f1:d7ef:43c2:4dd3 PUT with payload: 14
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:47.721 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:aaf0:861e:1153:5821 PUT with payload: 02
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:47.776 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:76b6:3d39:8d2c:dbb9 PUT with payload: 10
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:47.800 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:cd16:189a:2358:ac3e PUT with payload: 04
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:47.816 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:e266:2f95:78a5:83fa PUT with payload: 13
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:47.887 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:6720:7303:3bbd:1147 PUT with payload: 0d
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.052 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:8ce4:dc46:c30:b57d PUT with payload: 03
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.120 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:6738:b2ee:25a5:182a PUT with payload: 0b
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.131 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:d838:f8b1:10a5:2c71 PUT with payload: 05
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.134 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:c709:59e0:2a1a:d2a8 PUT with payload: 01
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.187 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:159b:f725:6fde:dbe5 PUT with payload: 12
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.195 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:be79:8241:e1fb:729d PUT with payload: 09
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.202 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:b198:efaf:9d81:79c7 PUT with payload: 0e
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.240 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:6600:5182:e6dd:13f6 PUT with payload: 11
Mon Feb 15 22:53:13 2021 user.notice otbr-agent[6850]: 00:06:48.258 [NOTE]-CLI-----: Output: coap request from fdde:ad00:beef:0:8b51:404f:9905:9dc1 PUT with payload: 06
Mon Feb 15 22:53:13 2021 user.warn otbr-agent[6850]: 00:06:48.263 [WARN]-PLAT----: radio tx timeout
Mon Feb 15 22:53:13 2021 user.warn otbr-agent[6850]: 00:06:48.263 [WARN]-PLAT----: RCP failure detected
Mon Feb 15 22:53:13 2021 user.warn otbr-agent[6850]: 00:06:48.263 [WARN]-PLAT----: Trying to recover (1/5)
Mon Feb 15 22:53:13 2021 user.warn otbr-agent[6850]: 00:06:48.296 [WARN]-PLAT----: Error processing result: NotImplemented
Mon Feb 15 22:53:13 2021 user.warn otbr-agent[6850]: 00:06:48.296 [WARN]-PLAT----: Error waiting response: NotImplemented
Mon Feb 15 22:53:13 2021 user.crit otbr-agent[6850]: 00:06:48.296 [CRIT]-PLAT----: RestoreProperties() at ot-br-posix-2022-1-5/third_party/openthread/repo/src/lib/spinel/radio_spinel_impl.hpp:2370: Failure
Mon Feb 15 22:53:18 2021 user.info otbr-agent[7165]: [INFO]-UTILS---: Running thread-reference-20200818-1424-ge736d2488
Mon Feb 15 22:53:18 2021 user.info otbr-agent[7165]: [INFO]-UTILS---: Thread version: 1.2.0
Mon Feb 15 22:53:18 2021 user.info otbr-agent[7165]: [INFO]-UTILS---: Thread interface: wpan0
Mon Feb 15 22:53:18 2021 user.info otbr-agent[7165]: [INFO]-UTILS---: Backbone interface: wlan0
Mon Feb 15 22:53:18 2021 user.info otbr-agent[7165]: [INFO]-UTILS---: Radio URL: spinel+hdlc+uart:///dev/ttyUSB0?uart-baudrate=460800&uart-flow-control
Mon Feb 15 22:53:18 2021 user.debug otbr-agent[7165]: 49d.19:26:54.254 [DEBG]-PLAT----: Wait response: tid=0 key=0
Mon Feb 15 22:53:18 2021 user.info otbr-agent[7165]: 49d.19:26:54.258 [INFO]-PLAT----: RCP reset: RESET_POWER_ON
Mon Feb 15 22:53:18 2021 user.debug otbr-agent[7165]: 49d.19:26:54.258 [DEBG]-PLAT----: Wait response: tid=1 key=1
Mon Feb 15 22:53:18 2021 user.debug otbr-agent[7165]: 49d.19:26:54.262 [DEBG]-PLAT----: Wait response: tid=2 key=2
Mon Feb 15 22:53:18 2021 user.debug otbr-agent[7165]: 49d.19:26:54.266 [DEBG]-PLAT----: Wait response: tid=3 key=8
Mon Feb 15 22:53:18 2021 user.debug otbr-agent[7165]: 49d.19:26:54.269 [DEBG]-PLAT----: Wait response: tid=4 key=5

HeFeng1947 avatar Mar 22 '22 10:03 HeFeng1947

@jwhui Jonathan, could you please give me some debugging suggestions or improvement directions? I have tried various methods mentioned in the previous issue, such as improving baudrate to 460800, increasing hardware flow control and increasing rx buffer on the host side, but none of them had any effect.

HeFeng1947 avatar Mar 24 '22 03:03 HeFeng1947

Closing stale issue.

jwhui avatar Jan 26 '23 19:01 jwhui

Is that fixed @jwhui ?

chava33 avatar May 17 '23 19:05 chava33

Is that fixed @jwhui ?

@chava33 , this issue is often due to the RCP failing to respond. Have you tried reaching out to the radio vendor for support?

jwhui avatar May 17 '23 20:05 jwhui

yes, reported to vendor and waiting for the response. Will update you if i get any reply. Thanks @jwhui

chava33 avatar May 17 '23 20:05 chava33