mynewt-mcumgr
mynewt-mcumgr copied to clipboard
SMP notifications corrupted on apollo3
SMP notifications used to deliver SMP responses over BLE transport are corrupted on the apollo3_evb.
Below is a sample log from nRF Connect app attempting an image list
command (as part of the DFU process). You can see the response being fragmented over two notifications, with the second notification corrupted. As a result, the concatenated CBOR is not decodable. The response does not need to be corrupted - the issue exists even for single-packet responses.
This is reproducible with stock bleprph on apollo3_evb. This is NOT reproducible with the same app on nordic_pca10056.
Error can be observed with the nRF Connect app or the nRF Device Manager app for Android/iOS (CBOR Error).
V 17:20:50.573 Connecting to 66:77:88:23:BB:EF...
D 17:20:50.573 gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, preferred PHY = LE 1M)
D 17:20:56.145 [Callback] Connection state changed with status: 133 and new state: DISCONNECTED (0)
E 17:20:56.150 Error 133 (0x85): GATT ERROR
I 17:20:56.150 Disconnected
D 17:20:58.707 gatt.close()
D 17:20:58.709 wait(200)
V 17:20:58.910 Connecting to 66:77:88:23:BB:EF...
D 17:20:58.910 gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, preferred PHY = LE 1M)
D 17:21:02.715 [Callback] Connection state changed with status: 0 and new state: CONNECTED (2)
I 17:21:02.715 Connected to 66:77:88:23:BB:EF
V 17:21:02.715 Requesting new MTU...
D 17:21:02.715 gatt.requestMtu(517)
I 17:21:03.159 Connection parameters updated (interval: 7.5ms, latency: 0, timeout: 5000ms)
I 17:21:03.391 MTU changed to: 256
V 17:21:03.398 Discovering services...
D 17:21:03.398 gatt.discoverServices()
D 17:21:03.404 [Callback] Services discovered with status: 0
I 17:21:03.404 Services discovered
V 17:21:03.406 Generic Access (0x1800)
- Device Name [R] (0x2A00)
- Appearance [R] (0x2A01)
Device Information (0x180A)
- Model Number String [R] (0x2A24)
- Serial Number String [R] (0x2A25)
- Firmware Revision String [R] (0x2A26)
- Manufacturer Name String [R] (0x2A29)
SMP Service (8d53dc1d-1db7-4cd3-868b-8a527460aa84)
- SMP Characteristic [N WNR] (da2e7828-fbce-4e01-ae9e-261174997c48)
Client Characteristic Configuration (0x2902)
I 17:21:03.474 Connection parameters updated (interval: 45.0ms, latency: 0, timeout: 5000ms)
V 17:21:08.902 [McuMgr] Connecting...
D 17:21:08.905 [McuMgr] gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, LE 1M)
D 17:21:08.953 [McuMgr] [Callback] Connection state changed with status: 0 and new state: 2 (CONNECTED)
I 17:21:08.958 [McuMgr] Connected to 66:77:88:23:BB:EF
D 17:21:08.962 [McuMgr] wait(300)
I 17:21:08.966 [McuMgr] MTU changed to: 256
V 17:21:09.266 [McuMgr] Discovering services...
D 17:21:09.271 [McuMgr] gatt.discoverServices()
I 17:21:09.276 [McuMgr] Services discovered
V 17:21:09.286 [McuMgr] Primary service found
V 17:21:09.290 [McuMgr] Requesting new MTU...
D 17:21:09.292 [McuMgr] gatt.requestMtu(498)
I 17:21:09.370 [McuMgr] MTU changed to: 256
D 17:21:09.374 [McuMgr] gatt.setCharacteristicNotification(da2e7828-fbce-4e01-ae9e-261174997c48, true)
V 17:21:09.378 [McuMgr] Enabling notifications for da2e7828-fbce-4e01-ae9e-261174997c48
D 17:21:09.381 [McuMgr] gatt.writeDescriptor(00002902-0000-1000-8000-00805f9b34fb, value=0x01-00)
I 17:21:09.459 [McuMgr] Data written to descr. 00002902-0000-1000-8000-00805f9b34fb, value: (0x) 01-00
I 17:21:09.464 [McuMgr] Notifications enabled
A 17:21:09.470 [McuMgr] Sending (10 bytes) Header (Op: 0, Flags: 0, Len: 2, Group: 1, Seq: 0, Command: 0) CBOR {}
V 17:21:09.480 [McuMgr] Writing characteristic da2e7828-fbce-4e01-ae9e-261174997c48 (WRITE COMMAND)
D 17:21:09.484 [McuMgr] gatt.writeCharacteristic(da2e7828-fbce-4e01-ae9e-261174997c48)
I 17:21:09.489 [McuMgr] Data written to da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 00-00-00-02-00-01-00-00-BF-FF
I 17:21:09.648 [McuMgr] Notification received from da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 01-00-00-FC-00-01-00-00-BF-66-69-6D-61-67-65-73-9F-BF-64-73-6C-6F-74-00-67-76-65-72-73-69-6F-6E-69-31-2E-30-2E-30-2E-34-37-39-64-68-61-73-68-58-20-B2-27-75-77-F3-C1-C3-BF-E5-AA-26-64-7C-DF-77-AD-26-27-C8-EC-58-E3-C5-DC-8D-3B-97-B6-51-9E-73-BC-68-62-6F-6F-74-61-62-6C-65-F5-67-70-65-6E-64-69-6E-67-F4-69-63-6F-6E-66-69-72-6D-65-64-F5-66-61-63-74-69-76-65-F5-69-70-65-72-6D-61-6E-65-6E-74-F4-FF-BF-64-73-6C-6F-74-01-67-76-65-72-73-69-6F-6E-69-31-2E-30-2E-30-2E-34-37-39-64-68-61-73-68-58-20-9F-46-47-02-D7-E9-BA-FD-57-69-BF-73-42-88-86-85-A0-1B-66-C1-B7-84-48-0A-E3-E8-F8-B5-12-95-80-F7-68-62-6F-6F-74-61-62-6C-65-F5-67-70-65-6E-64-69-6E-67-F4-69-63-6F-6E-66-69-72-6D-65-64-F4-66-61-63-74-69-76-65-F4-69-70-65-72-6D-61-6E-65-6E-74-F4-FF-FF-6B-73-70-6C-69-74-53
I 17:21:10.226 [McuMgr] Notification received from da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 74-53-00-FC-00-01-00
V 17:21:10.246 [McuMgr] Disconnecting...
D 17:21:10.253 [McuMgr] gatt.disconnect()
D 17:21:10.289 [McuMgr] [Callback] Connection state changed with status: 0 and new state: 0 (DISCONNECTED)
I 17:21:10.294 [McuMgr] Disconnected
D 17:21:10.299 [McuMgr] gatt.close()
The expected second notification (at t=17:21:10.226) should be:
74-61-74-75-73-00-FF
cc @t3zeng
Hey guys, you can reproduce this by running the bleprph app and then try to perform an image read.
Repro steps:
- Load bleprph app onto apollo3 evb and boot up
- On your phone, go to nRF Connect Device Manager and look for the evb
- Go to the
Image
tab at the bottom and then go to the top right and pressAdvanced
- A new
Images
section will pop up. Press read - At this point there will be BLE comms with the apollo3 evb but the image data will fail to be retrieved
What I observe is that in /apache-mynewt-core/mgmt/smp/transport/ble/src/smp_ble.c
, if you print out the contents of om
in smp_ble_out
that gets put onto the mqueue, it will not match the contents when you get the mbuff back in smp_ble_event_data_in
even though the address of the mbuf is the same.
I set a watchpoint on the address of the corrupted mbuf and was able to get the following backtrace:
#0 smp_tx_rsp (ns=<optimized out>, rsp=0x100025a4 <os_msys_1_data+112>, arg=0x10002f80 <g_smp_ble_transport>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:226
#1 0x0001c418 in smp_process_request_packet (streamer=streamer@entry=0x10002f80 <g_smp_ble_transport>, req=0x10004408 <pool_acl_buf>)
at repos/apache-mynewt-mcumgr/smp/src/smp.c:365
#2 0x00016e50 in smp_process_packet (st=0x10002f80 <g_smp_ble_transport>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:265
#3 0x00016e78 in smp_event_data_in (ev=<optimized out>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:293
#4 0x00015526 in os_eventq_run (evq=<optimized out>) at repos/apache-mynewt-core/kernel/os/src/os_eventq.c:196
#5 0x00014aea in main () at apps/bleprph/src/main.c:356
What seems to happen is that in this line of code the mbuf returned is the same address as the one used to store the notify data so the mbuf gets corrupted. This behavior does not seem to be present on the nrf52840dk pca10056
Adding to the above, I set up some checks and increased the size of the task stack to 8192 words OS_CTX_SW_STACK_CHECK: 1 OS_MEMPOOL_CHECK: 1 OS_MEMPOOL_GUARD: 1 OS_MEMPOOL_POISON: 1 The issue persists despite the stack usage is very far off from maxing out.