esp-nimble-cpp
NimBLE hang just after connection Android 5 (BLE 4.0)
making a new issue... so, i'm trying this on IDF master with the example code https://github.com/espressif/esp-idf/tree/master/examples/bluetooth/nimble/bleprph. i left all nimble settings as default, except i enabled extra runtime asserts and host debugging
with the log level at INFO, i can get many successful connects. it does hang every once in a while, but i do get many successes, way more than with the C++ wrapper (where i don't think i got any with just LOGLEVEL INFO)
We don’t really do anything different in the cpp wrapper, as far as settings and initialization go. So from that perspective there should be no difference. However, there could be timing issues that happen more often with the cpp library than without. The fact that you are seeing the issue without it seems to point in that direction.
Could you post your sdkconfig?
GOT NEW Samsung tablet running Android 9 (BLE 5.0) works perfectly every connect!!!
Great, I've been looking for an old android on BLE 4.0 for cheap around here to test your issue. Not much luck, I tried an old iphone, but I'm not sure what bluetooth version it is, still works perfectly.
i can send you the one i got.... its just a tab A 7, they also on ebay (Samsung SM-T280) eek, you being in canada makes it a little tricky i think to send stuff
android has a history of poor implementation of BLE, there are complaints if you google... if it didn't work at all that's one thing, but when it sometimes works... makes it all odd. we don't have to chase it too much, i would just have to have the app work on newer than android 5 which is sad as the tablet is only 4 years old... and my 4 year old ipad air works perfectly
(oh and bluedroid never had this issue)
i can send you the one i got.... its just a tab A 7, they also on ebay (Samsung SM-T280) eek, you being in canada makes it a little tricky i think to send stuff
Thanks! I would take you up on that if it wasn't for the shipping thing, would cost too much. I'll keep an eye out for a local source.
we don't have to chase it too much, i would just have to have the app work on newer than android 5 which is sad as the tablet is only 4 years old... and my 4 year old ipad air works perfectly
Bugs need to be squashed, there is something somewhere that should not happen, although minor in terms of user base now, it could become worse later. I have a couple suspicions but no way to test them, one thing you could do though if you don't mind, is test your issue against the IDF-master branch, just to rule out the controller as the problem.
i did do most of the testing on IDF-master, same result... maybe sometimes the android BLE doesn't send a response to the request, but i would think the nimble code has a timeout on the HCI command that makes the request
Ok, at least it's probably not a controller issue.
Nimble does have a timeout for this (2 seconds). It starts here and waits for the ack here using a semaphore take with 2 second timer. I think this is working fine up to this point, what happens after this might be where it's hanging up.
does code stay in rc = ble_hs_hci_wait_for_ack() till the ack happens or timeout?
It waits there until ack is received or 2 seconds, whichever comes first.
I think I found something at this line that might cause a hangup, not sure as to why it would happen but worth a test.
If you change that line to ret = xQueueSendToBack(evq->q, &ev, 2000/portTICK_PERIOD_MS);
I'd be curious if you trigger the assert.
i can. remember it's not a total hang: eventually there is a timeout (could be like 30 seconds) and the client disconnects (could be the server disconnecting, not sure how to know). changing that one to a 2 sec timeout had no effect... it may not actually be hung, it's just expecting the response to the feature request, and it never comes in
i added some logging in ble_hs_hci_cmd_tx()
ble_hs_hci_cmd_tx()): opcode=0x2016 ; this is the opcode that is sent out (LE Read Remote Used Features: OGF 8, OCF 22, opcode 0x2016)
ble_hs_hci_wait_for_ack()): rc=0 ; this is logged just after rc = ble_hs_hci_wait_for_ack();
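For reference, the opcode in that log follows from the standard HCI packing of a 6-bit OGF above a 10-bit OCF. A small sketch, with helper names that are made up here and are not NimBLE functions:

```c
#include <stdint.h>

/* An HCI opcode packs a 6-bit OGF (group) above a 10-bit OCF (command). */
static uint16_t hci_opcode(uint8_t ogf, uint16_t ocf) {
    return (uint16_t)(((ogf & 0x3Fu) << 10) | (ocf & 0x03FFu));
}

/* Split an opcode back into its group and command fields. */
static uint8_t hci_opcode_ogf(uint16_t opcode) {
    return (uint8_t)(opcode >> 10);
}

static uint16_t hci_opcode_ocf(uint16_t opcode) {
    return (uint16_t)(opcode & 0x03FFu);
}
```

With OGF 8 (LE controller commands) and OCF 22 (0x16) this gives 0x2016. On the wire the opcode is sent little-endian, which is why the command bytes in the debug output begin with 0x16 0x20.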
it gets out of that function... and not sure where its returning to at this point... my guess in ble_gap.c ble_gap_rd_rem_sup_feat_tx()
it doesnt get into ble_hs_hci_rx_evt() after sending out that opcode...
This sounds very familiar now, I’ve seen this kind of behavior before (the reason I knew to comment the line in connect out previously). It seemed to affect more recent versions of IDF and Arduino. I believe one of my PR’s upstream fixed it for the client side code but perhaps something related is still affecting the server.
I’ll have to switch things around to see if I can repro this again but I have a hunch that the older IDF versions (3.3) do not have this issue.
do you have a debugger? or you debug like i do with log strings :)
i wonder after it gets back a response from the remote used features... what WOULD be the next thing it does in the connection process (i guess i have to watch a working example)
muther bugger! all i did now is set log level to DEBUG, and it worked you can see:
ble_hs_hci_cmd_tx()): opcode=2016
Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8
ble_hs_hci_rx_evt()):hci_ev[0]=0x3e
LE Remote Used Features. FAIL (status=26)
ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=11 data=0x07 0x00 0x04 0x00 0x10 0x01 0x00 0xff 0xff 0x00 0x28
it does fail a few times, but once log is DEBUG it actually works a few times!
if you remember, if i comment out the request for remote used features (the ble_gap_rd_rem_sup_feat_tx call in ble_gap.c), it always works. if it didn't work at all with debug level, then i could say for sure it's the android tablet
FAIL:
D NimBLEServerCallbacks: "onConnect(): Default"
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0016 len=2
0x16 0x20 0x02 0x00 0x00
ble_hs_hci_rx_evt()):hci_ev[0]=0x0f - hci_ev[2]=0x00
ble_hs_hci_cmd_tx()): opcode=2016
Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8
SUCCESS:
D NimBLEServerCallbacks: "onConnect(): Default"
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0016 len=2
0x16 0x20 0x02 0x00 0x00
ble_hs_hci_rx_evt()):hci_ev[0]=0x0f - hci_ev[2]=0x00
ble_hs_hci_cmd_tx()): opcode=2016
Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8
ble_hs_hci_rx_evt()):hci_ev[0]=0x3e - hci_ev[2]=0x04
LE Remote Used Features. FAIL (status=26)
ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=11 data=0x07 0x00 0x04 0x00 0x10 0x01 0x00 0xff 0xff 0x00 0x28
...
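For anyone reading the hex in the SUCCESS trace, here is a rough sketch of decoding the 11-byte ACL payload. It assumes the payload is one complete L2CAP frame carrying an ATT Read By Group Type Request with a 16-bit UUID; the struct and function names are made up for illustration and are not part of NimBLE.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative container; these names are not part of NimBLE. */
typedef struct {
    uint16_t l2cap_len;    /* length of the L2CAP payload */
    uint16_t cid;          /* L2CAP channel id, 0x0004 is ATT */
    uint8_t  att_opcode;   /* 0x10 is Read By Group Type Request */
    uint16_t start_handle;
    uint16_t end_handle;
    uint16_t group_uuid16; /* 0x2800 is Primary Service */
} att_group_req;

/* Read a little-endian 16-bit value. */
static uint16_t le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is not an 11-byte
 * ATT Read By Group Type Request with a 16-bit UUID. */
static int parse_att_group_req(const uint8_t *buf, size_t len,
                               att_group_req *out) {
    if (len != 11) {
        return -1;
    }
    out->l2cap_len  = le16(&buf[0]);
    out->cid        = le16(&buf[2]);
    out->att_opcode = buf[4];
    if (out->l2cap_len != 7 || out->cid != 0x0004 ||
        out->att_opcode != 0x10) {
        return -1;
    }
    out->start_handle = le16(&buf[5]);
    out->end_handle   = le16(&buf[7]);
    out->group_uuid16 = le16(&buf[9]);
    return 0;
}
```

Fed the bytes from the log, this yields handles 0x0001 to 0xffff and UUID 0x2800, which reads as the tablet starting primary service discovery. So in the SUCCESS case the connection carries on even though the feature request came back with status 26 (0x1A, which the Bluetooth error code table lists as Unsupported Remote Feature).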
do you have a debugger? or you debug like i do with log strings :)
I have a black magic probe that I use for debugging other stuff but sadly doesn't work with the esp32 (needs a special GDB) so printing strings and hoping they print where the error occurs is all I have atm.
i wonder after it gets back a response from the remote used features... what WOULD be the next thing it does in the connection process (i guess i have to watch a working example)
It's a state machine, just handling events as they happen and updating the states, in this case after the events happen it goes back to the controller... sadly a closed binary black box on the esp32.
muther bugger! all i did now is set log level to DEBUG, and it worked you can see:
ble_hs_hci_cmd_tx()): opcode=2016
Command Status: status=0 cmd_pkts=5 ocf=0x16 ogf=0x8
ble_hs_hci_rx_evt()):hci_ev[0]=0x3e
LE Remote Used Features. FAIL (status=26)
Lol, it seems like some sort of timing or buffering issue in the controller (nobody knows what's going on in there) that is either waiting for something or overwriting a buffer somewhere, and when debug is enabled it changes the timing enough to avoid it.
I found the issue, it’s the controller.
See my post upstream espressif/esp-nimble#13
@mitchjs are you using git to update your IDF version? I'm wondering if you updated the submodules?
git submodule update --init --recursive
there was a controller update that I believe solves this problem here
@h2zero, wouldn't that be in the fresh clone of master? which is v4.2-dev-1905-g625bd5eb1. i ran
$ git submodule update --init --recursive
Submodule path 'components/bt/host/nimble/nimble': checked out 'fead24e5d5c0f4bd9b8c3d71cf72a87d75631399'
seems there were a few updates to nimble just the other day..
im pretty sure i have the latest... still hangs
If you did a fresh clone, yes. Just thought I'd ask. The problem does not happen on my old iPhone with the latest master but it certainly did with previous versions of the controller. Perhaps when they fixed it, they didn't fix it completely?
The latest idf-master is a couple commits behind the controller, maybe try the latest controller commit and see if anything changes?
@mitchjs
I have recently discovered an issue that happens in Arduino and could be related to this, the fix is this commit
If you're still facing this issue give that patch a try and see if it helps.
hmm, i've kinda been following this... that file is of course in ESP-IDF. i could patch it and see. is this something we need to take up with Espressif? when running NimBLE do you know what core all the code runs on? (can it vary?)
also i never noticed any "delays"
you're thinking it could fix the weird android 5 issue.... hmm
one thing: i haven't been working on any code of late, because i think it's all done.... and it was all working 👍 i did just update to the official ESP-IDF v4.1 release and everything compiled... but i didn't really test... hopefully it has the CCCD with bonded devices nimble patch in there. i get so confused at times with all the different versions
Lots of trial and error with different versions for sure haha.
What I found happening is if a task is running on core 1 and makes a call to NimBLE the command gets sent to the controller task on core 0 and occasionally there was strange behavior.
That might not apply to you but it could, depending on config etc., so I thought I'd mention it. If it does have some effect then yes, it would need to be brought to Espressif's attention, however I'm guessing this bug has been fixed in master by now.
Glad to hear your project is complete and working well! I'm converting my project to mesh once I get the library sorted.