BLE pairing request ... or not (IDFGH-15400)

Open malachib opened this issue 6 months ago • 10 comments

Answers checklist.

  • [x] I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • [x] I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • [x] I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

This is a little bit of a cross-post, but I got no response on esp32.com.

Code on ESP32S3 running as a NimBLE peripheral has a custom service with a protected characteristic.

When I try to connect and then read from that characteristic from a remote central (Android device), it sits for a while then disconnects. No pairing things are ever presented, nor do I get any GAP or GATT access messages indicating something is up.

Here is the project: https://bitbucket.org/malachib/playground.esp/src/master/src/PGESP-67/poc1

I think I am missing something "simple" here. Initially I forgot the ble_store_config_init call and putting that in there fixed things, or so I thought. After a few days the issue re-emerged.

Environment

  • Tested with "BLE Scanner" app on Android Pixel 7
  • Compiled with ESP-IDF v5.4.1

Auxiliary Environment

Secondary use cases

  • Tested also with "LightBlue" app on Android Pixel 7
  • Tested with bluetoothctl on two different Debian laptops

malachib avatar May 30 '25 17:05 malachib

Hi, I've observed some additional/refined behaviors.

It appears it happens in a particular use case:

  1. Boot ESP32, kicks into open ADV
  2. Initial pairing from Android succeeds. ENC characteristic can be read successfully.
  3. Disconnect, clear bonds in ESP32 via ble_store_clear
  4. Re-connect, Android pairing starts over and fails during encryption change gap event 10

Noteworthy is that if one restarts device after step 4, we start from step 1 and step 2 actually works again.

For a while it seemed like bluetoothctl elsewhere was working better, but not now. I am thinking that ble_store_clear isn't entirely the correct call here. Some evidence of this is:

D (839) NIMBLE_NVS: RPA_REC in RAM is filled up from NVS index = 3
D (839) NIMBLE_NVS: ble_store_config_rpa_recs restored 3 bonds

On startup despite ble_store_util_bonded_peers reporting 0 peers

malachib avatar Jun 01 '25 17:06 malachib

Hi @malachib, thanks for sharing the project directory. I tried cloning and running it on my side, but it's currently failing to compile. Could you please share a minimal build file or setup that you're using, so I can try to reproduce the issue more accurately on my end? That would be really helpful. Appreciate your support!

strange969 avatar Jun 09 '25 11:06 strange969

Most definitely. It is expected to "just work" but I will have another look at it. Could you paste some of the errors you experience?

Incidentally, the playground.esp repo has a lot of submodules in it, none of which are needed for this build.

EDIT: I repaired a glitch where fully open-advertising mode inhibited compilation. Please see if the latest update helps

malachib avatar Jun 09 '25 16:06 malachib

Hi @malachib, I’ve attached the screenshots for reference. After cloning the project, I tried running idf.py set-target esp32c2, but I’m encountering the following error:

Image

It seems to be related to the dependency lib mentioned in the idf_component.yml. Just wanted to check if I need to configure any access or update any settings on my side to resolve this.

strange969 avatar Jun 10 '25 07:06 strange969

Likely this is due to lib/idf_component.yml specifying git@github.com:malachi-iot/embr.git

I probably ought to change that to https://github.com/malachi-iot/embr.git since it's open source, but I remember hitting an issue with https back in the day. Still tomorrow I'll give that a shot

Alternatively, if you have your bash SSH credentials set up for github (which I strongly recommend), then I would expect everything to work OK.

malachib avatar Jun 10 '25 08:06 malachib

lib/idf_component.yml now utilizes https as mentioned above. Please pull latest and give it another try

malachib avatar Jun 10 '25 20:06 malachib

Thanks a lot, I’ll pull the latest changes and try building it on my side.

strange969 avatar Jun 11 '25 05:06 strange969

Hi @malachib ,

I built the code and it’s working well. Here's the behavior I observed:

-> On the first connection, after the link is established, pairing occurs followed by an encryption change event. I'm attaching the screenshot below for reference.

Image

-> After disconnecting, when I reconnect to the PGESP-67 device, the encryption change event is triggered again.

Image

This seems to be the expected behavior — pairing followed by encryption on the first connection, and encryption on subsequent reconnections. I'm using the ESP32-S3 chip and testing with the nRF Connect app.

Do you not see this behavior on your side? Kindly try it at your end once and let me know your observations.

strange969 avatar Jun 11 '25 06:06 strange969

That does seem to deviate from what I encountered. Gathering screen shots and will share shortly.

EDIT

I've hit a rabbit hole condition where bonds are now never retained. Sorting that out.

Also, the misbehavior must be stoked by issuing the console command ble bond clear to wipe out existing bonds, then reset the part. I don't see any evidence you did that step (step 3)

EDIT2

My "reset the part" assertion is the opposite of what should be done. As with step 4 indicated in my 01JUN25 post, attempt reconnect immediately after clearing bonds.

Screenshots forthcoming shortly.

malachib avatar Jun 11 '25 18:06 malachib

Image

Image

Noteworthy: today, the first time I ran ble bond clear and reconnected, it actually did work (not pictured). I then tried again, and Android asked to forget the pairing first, then pair again. Subsequent pairing attempts look like what you see in the second picture.

malachib avatar Jun 11 '25 22:06 malachib

Guys, it's been two weeks... any word my friends?

malachib avatar Jun 26 '25 02:06 malachib

Hi @malachib,

As mentioned in this earlier comment, the pairing was working correctly, and on reconnection, the encryption change event was triggered as expected.

I also tested the BLE bond clear scenario: after removing the bonding information using the nRF Connect app, the client initiated repeat pairing, and pairing completed successfully, followed by the encryption change event.

Based on your last update, it seems to be working fine on your end as well.

Please let me know if there’s still anything pending or not working as expected.

strange969 avatar Jun 26 '25 05:06 strange969

There's been a miscommunication then. My last update indicates undesired behavior, and it's definitely not working for me over here. I'll need to remind myself of the details with all this time that has passed, will get you those details in the next 24h

malachib avatar Jun 26 '25 05:06 malachib

Using idf.py monitor console, do the following:

  1. Fire up device and connect your BLE test tool
  2. Read from object_name characteristic
  3. Pairing operates as expected
  4. Disconnect
  5. From REPL console, type ble bond clear
  6. Reconnect and read object_name characteristic. Android device jumps through hoops to forget previous pairing

Scenario 1: With ESP-IDF v5.4.1 this silently fails and disconnects at step 6

Scenario 2: With ESP-IDF v5.4.2 at step 6 we see

I (67366) NimBLE: ogf=0x08, ocf=0x0027, hci_err=0x212 : BLE_ERR_INV_HCI_CMD_PARMS (Invalid HCI Command Parameters)

And operation appears to otherwise succeed.

Both Scenario 1 and Scenario 2 are problematic. Can we get some guidance on what this NimBLE error is telling us? No other noteworthy log messages appear.

malachib avatar Jun 26 '25 22:06 malachib

Hi guys, a week has passed. Any thoughts?

malachib avatar Jul 04 '25 03:07 malachib

Hi @malachib,

Apologies for the delayed response. I was caught up with a critical issue and couldn’t get to this earlier. I’ll be spending some time on it and will try to reproduce the issue. I’ll share an update with you as soon as possible.

strange969 avatar Jul 04 '25 05:07 strange969

Awesome, thank you. I appreciate that you're giving the issue attention, just wanted to make sure it hadn't drifted off into complete limbo !

malachib avatar Jul 04 '25 05:07 malachib

Hi @malachib,

Regarding the disconnect behaviour at tag v5.4.1, this issue has already been addressed in the latest tag. However, if you prefer to remain on v5.4.1, you can apply the patch to $IDF_PATH/components/bt/host/nimble/nimble.

add_type_change_tag_v5_4_1.txt

Now, about the message observed at step 6 in v5.4.2:

I (67366) NimBLE: ogf=0x08, ocf=0x0027, hci_err=0x212 : BLE_ERR_INV_HCI_CMD_PARMS (Invalid HCI Command Parameters)

This occurs when the HCI_LE_Add_Device_To_Resolving_List command is sent during re-pairing. This command is used to add a device to the resolving list in the controller, which helps manage Resolvable Private Addresses (RPAs).
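For readers decoding the log line themselves, here is a minimal sketch of the arithmetic, not NimBLE code. It assumes the standard HCI opcode layout (6-bit OGF, 10-bit OCF) and that NimBLE reports controller status codes offset by 0x200 (its BLE_HS_ERR_HCI_BASE). The helper names hci_opcode and hci_status_from_hs_err are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumption: NimBLE adds this base to raw HCI status codes
 * (BLE_HS_ERR_HCI_BASE), so 0x212 in the log means HCI status 0x12. */
#define HS_ERR_HCI_BASE 0x200

/* Hypothetical helper: pack OGF (upper 6 bits) and OCF (lower 10 bits)
 * into the 16-bit HCI opcode. */
uint16_t hci_opcode(uint8_t ogf, uint16_t ocf)
{
    return (uint16_t)(((ogf & 0x3Fu) << 10) | (ocf & 0x03FFu));
}

/* Hypothetical helper: strip the host-layer base from a logged error.
 * Returns the raw HCI status, or -1 if the value is outside the HCI range. */
int hci_status_from_hs_err(int hs_err)
{
    if (hs_err < HS_ERR_HCI_BASE || hs_err > HS_ERR_HCI_BASE + 0xFF) {
        return -1;
    }
    return hs_err - HS_ERR_HCI_BASE;
}
```

With the values from the log, ogf=0x08 and ocf=0x0027 combine to opcode 0x2027 (LE Add Device To Resolving List), and hci_err=0x212 reduces to HCI status 0x12.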

According to the specification:

If a device with the same Peer_Identity_Address_Type, Peer_Identity_Address, Peer_IRK, and Local_IRK already exists in the resolving list, the controller may either:

  • Reject the command (in which case it returns 0x12 – Invalid HCI Command Parameters), or
  • Silently ignore the request and return success.

In this case, the controller is choosing to reject the duplicate entry, which is expected behaviour per the spec. This is why you are seeing this message on the console.
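To make the duplicate-entry behaviour concrete, the following is a toy model only, not the ESP32 controller's implementation. Every name in it (rl_entry, resolving_list, rl_add, rl_clear, the RL_MAX capacity, and the reject_duplicates switch) is hypothetical. It simply treats an entry with identical address type, address, and IRKs as a duplicate and shows the two outcomes described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RL_MAX               4     /* hypothetical list capacity */
#define HCI_OK               0x00
#define HCI_ERR_MEM_CAPACITY 0x07  /* Memory Capacity Exceeded */
#define HCI_ERR_INV_PARAMS   0x12  /* Invalid HCI Command Parameters */

struct rl_entry {
    uint8_t addr_type;
    uint8_t addr[6];
    uint8_t peer_irk[16];
    uint8_t local_irk[16];
};

struct resolving_list {
    struct rl_entry entries[RL_MAX];
    int count;
    bool reject_duplicates; /* true: return 0x12; false: silently succeed */
};

/* Field-by-field comparison (avoids comparing struct padding). */
bool rl_same(const struct rl_entry *a, const struct rl_entry *b)
{
    return a->addr_type == b->addr_type &&
           memcmp(a->addr, b->addr, sizeof a->addr) == 0 &&
           memcmp(a->peer_irk, b->peer_irk, sizeof a->peer_irk) == 0 &&
           memcmp(a->local_irk, b->local_irk, sizeof a->local_irk) == 0;
}

/* Add an entry and return an HCI-style status code. */
uint8_t rl_add(struct resolving_list *rl, const struct rl_entry *e)
{
    for (int i = 0; i < rl->count; i++) {
        if (rl_same(&rl->entries[i], e)) {
            return rl->reject_duplicates ? HCI_ERR_INV_PARAMS : HCI_OK;
        }
    }
    if (rl->count == RL_MAX) {
        return HCI_ERR_MEM_CAPACITY;
    }
    rl->entries[rl->count++] = *e;
    return HCI_OK;
}

/* Empty the list so the same entry can be added again without a clash. */
void rl_clear(struct resolving_list *rl)
{
    rl->count = 0;
}
```

In this model, adding the same entry twice with reject_duplicates set returns 0x12 on the second call and leaves the list unchanged, which corresponds to the log line seen during re-pairing.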

Also, please note that this is not an error. We recently added debug logs that print these HCI return codes.

Image

strange969 avatar Jul 07 '25 12:07 strange969

Thank you very much. This is completely logical and makes sense to me. At first I thought, hmm is it really an error if a pairing request happens again? Then it clicked. Initiator of pairing really ought to know whether to try pairing over again.

After decades of development, I resist leaping to a conclusion that the vendor API has a bug. Either way you slice it I am relieved.

In a way I'm fortunate it didn't kick back the duplicate entry harder than it does. "Just works" for real.

I'd like to do more testing, but am optimistic. Do we close this issue and can re-open if I find further problems?

malachib avatar Jul 09 '25 00:07 malachib

Thank you for the update. If the issue is now resolved, please feel free to close this ticket. If you run into any other issues in the future, feel free to raise a new ticket; we’ll be happy to assist.

strange969 avatar Jul 09 '25 06:07 strange969

Alright I'll do some more testing and within the next two days either close this out or complain some more one or the other :)

malachib avatar Jul 09 '25 13:07 malachib

Closing issue. Feel free to reopen in case of any further updates.

rahult-github avatar Jul 29 '25 05:07 rahult-github