
Expose transport and phy configuration for connection (Android 10 connection issue)

Open mtomczynski opened this issue 5 years ago • 13 comments

Hey,

Android 10 introduced problems with connecting to multiple BLE devices for me. Linked topics may be related: https://stackoverflow.com/questions/58299507/android-10-ble-connection-issue https://issuetracker.google.com/issues/141188862

I managed to find a solution by forcing this config on connection:

bluetoothDevice.connectGatt(
    context,
    false, // auto connect set to false
    connectCallback,
    BluetoothDevice.TRANSPORT_AUTO,
    BluetoothDevice.PHY_LE_CODED
)

But RxAndroidBle doesn't expose a way to set transport and phy. Have you considered exposing those settings?

mtomczynski avatar Nov 12 '19 11:11 mtomczynski

> But RxAndroidBle doesn't expose a way to set transport and phy. Have you considered exposing those settings?

At some point, yes. Unfortunately I have limited time and it is not a top priority right now.

Could you shed a bit more light on your exact case? Have you tried to investigate this bug? Get HCI logs from your device?

dariuszseweryn avatar Nov 12 '19 18:11 dariuszseweryn

My case is a bit peculiar. I've got two devices on BT 4.2. They have two modes: the first is public undirected advertising for connection with an unknown central; the second is directed advertising with a resolvable private address for reconnection with a bonded central.

After bonding to two devices and doing a few successful connections/disconnections, all future connections get bricked. After this the phone can't connect and times out after 30 s with GATT error 133. It happens only when there are multiple bonded devices and only on Android 10; the only thing that helps is clearing the list of bonded devices.

The solution from the first comment works only until I try to connect with a different config, say the RxAndroidBle default. After that, the connection is bricked even for the LE_CODED and TRANSPORT_AUTO config.

About the solution itself (setting phy to LE_CODED and transport to auto): my devices do not support BT 5, so the phone can't use 2M or LE Coded for the actual connection. In the HCI logs all connections are made with phy set to 1M no matter what was set in the API call (::connectGatt). But for some reason this is the only way I can connect to my devices.
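One possible reading of why the HCI logs show 1M regardless, offered as an assumption and not a confirmed diagnosis: the Android documentation describes the `phy` argument of `connectGatt` as a bitwise OR of the `PHY_LE_*_MASK` constants, and `PHY_LE_CODED` has the value 3, which is the same as `PHY_LE_1M_MASK or PHY_LE_2M_MASK`. The sketch below mirrors those constant values in plain Kotlin so it runs without the Android SDK; `decodePhyMask` is a hypothetical helper, not part of any library.

```kotlin
// Values mirrored from android.bluetooth.BluetoothDevice (copied here so the
// sketch runs without the Android SDK).
const val PHY_LE_1M = 1
const val PHY_LE_2M = 2
const val PHY_LE_CODED = 3
const val PHY_LE_1M_MASK = 1
const val PHY_LE_2M_MASK = 2
const val PHY_LE_CODED_MASK = 4

// Hypothetical helper: names the PHYs selected when `mask` is read as a
// bitwise OR of the PHY_LE_*_MASK constants.
fun decodePhyMask(mask: Int): List<String> {
    val names = mutableListOf<String>()
    if ((mask and PHY_LE_1M_MASK) != 0) names.add("1M")
    if ((mask and PHY_LE_2M_MASK) != 0) names.add("2M")
    if ((mask and PHY_LE_CODED_MASK) != 0) names.add("Coded")
    return names
}

fun main() {
    // The non-mask constant used in the first comment, read as a mask.
    println(decodePhyMask(PHY_LE_CODED))      // [1M, 2M]
    // The mask constant that actually selects LE Coded.
    println(decodePhyMask(PHY_LE_CODED_MASK)) // [Coded]
}
```

If this reading is right, passing `PHY_LE_CODED` would request 1M and 2M rather than LE Coded, which would be consistent with a BT 4.2 peripheral always ending up on 1M. It does not by itself explain why this call connects when the default one does not.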

From the BT device's point of view, when the connection is bricked on a particular phone, the BT device doesn't see any incoming connections when the phone is trying to create one.

@dariuszseweryn a bit of a long description, but the case is complex. Anyway, it looks like a bug deep in the BT layer that was introduced in Android 10.

mtomczynski avatar Nov 13 '19 08:11 mtomczynski

More info on the issue. The actual act of scanning is corrupting future connections to bonded devices.

After bonding I can connect freely to multiple devices, provided that I turned off scanning after the bonding. Then, after disconnecting from the devices, turning on scanning for a moment and turning it off again, I can no longer connect to my devices. It seems that the scanning event corrupts some internal data for bonded devices.

mtomczynski avatar Nov 14 '19 12:11 mtomczynski

😲 Have you checked what is sent through HCI before and after scanning?

dariuszseweryn avatar Nov 14 '19 13:11 dariuszseweryn

I'm not an expert in low-level BLE communication, but I haven't seen anything out of the ordinary in the HCI logs. If you're interested in the topic I'd be more than happy to share the logs.

Also found out that it's not only about the scanning. What actually corrupts the connection is successfully scanning devices from the bonded list. I can't reproduce the error if I turn off the devices during scanning.

mtomczynski avatar Nov 14 '19 13:11 mtomczynski

Please do. Logs covering a successful connection after bonding, a re-scan and then an unsuccessful connection could help find out what is happening there.

dariuszseweryn avatar Nov 14 '19 14:11 dariuszseweryn

Here are the logs, and thank you! The HCI logs are a bit crowded with events, so please use the logcat highlights to get timestamps for easier navigation. Here are the steps from the issue:

  1. Started a scan and discovered two devices
  2. Successfully bonded to both devices
  3. Stopped the scan
  4. Successfully connected and disconnected a few times
  5. Performed a scan for a few moments and turned it off
  6. Tried to connect to one of the devices, which resulted in failure: GATT error 133 (30 s timeout)

Just in case, I also attached the full logcat logs filtered on the Bluetooth stack.

logcat_full.txt BTSNOOP.log logcat_highlights.txt

mtomczynski avatar Nov 14 '19 14:11 mtomczynski

I've prepared a similar log package from Android 9, where this problem doesn't occur. Differences I found after comparison:

  1. For some reason the OnePlus with Android 9 doesn't seem to use the White List at all, whereas the Android 10 Pixel always adds the device to the White List before the connection.

  2. Scanning gives slightly different results on the two phones. The OnePlus reports the device with an address type of Public Identity Address (Corresponds to Resolved Private Address) (0x02), whereas on the Pixel the reported address type is Random Device Address (0x01). The address type is the only difference; the address stays the same.

Additionally, before the scan that corrupts the connection, the Pixel adds the device to the White List with type Public Device Address (0x00) and then connects to the same address with the same type, Public Device Address (0x00), which results in a successful connection. After the scan, the type in the add-to-White-List event changes to Random Device Address (0x01), but the address type on connection stays Public Device Address (0x00), after which the connection fails. The address itself doesn't change, only the type.
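The mismatch described above can be pictured with a toy model. This is a minimal sketch under a stated assumption, not the Android stack's actual logic: it assumes a connection goes through only when the (address type, address) pair used for the connection request equals a pair on the white list. `WhiteList`, `PeerAddress` and the placeholder address are hypothetical names introduced for illustration; the type codes are the ones quoted from the HCI logs.

```kotlin
// HCI LE address types as they appear in the logs quoted above.
enum class AddressType(val code: Int) {
    PUBLIC_DEVICE(0x00),
    RANDOM_DEVICE(0x01),
    PUBLIC_IDENTITY(0x02),
}

// A peer is identified by address type AND address, not by address alone.
data class PeerAddress(val type: AddressType, val address: String)

// Hypothetical, heavily simplified stand-in for the controller white list.
class WhiteList {
    private val entries = mutableSetOf<PeerAddress>()

    fun add(peer: PeerAddress) { entries.add(peer) }

    fun clear() { entries.clear() }

    // Assumption of this model: the target must match an entry exactly,
    // including the address type.
    fun matches(target: PeerAddress): Boolean = target in entries
}

fun main() {
    val address = "AA:BB:CC:DD:EE:FF" // placeholder, not taken from the logs
    val target = PeerAddress(AddressType.PUBLIC_DEVICE, address)
    val whiteList = WhiteList()

    // Before the scan: white list entry and connection request agree.
    whiteList.add(PeerAddress(AddressType.PUBLIC_DEVICE, address))
    println(whiteList.matches(target)) // true

    // After the scan: same address, but re-added with the random type.
    whiteList.clear()
    whiteList.add(PeerAddress(AddressType.RANDOM_DEVICE, address))
    println(whiteList.matches(target)) // false
}
```

Under this model the second attempt never matches, which would fit the peripheral seeing no incoming connection and the phone timing out with error 133.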

@dariuszseweryn Do you think the actual problem might be that Android 10 doesn't correctly recognize the address type when scanning and then caches that address type in the bond information? Or do you think it's just a difference between manufacturers or system versions in how the information is logged?

bt_snoop_android_9.log logcat_full_android_9.txt logcat_highlights_android_9.txt

mtomczynski avatar Nov 15 '19 14:11 mtomczynski

~I assume your peripherals use public address types?~ I have updated Wireshark and now I see that they do. But I do not see any reported scans of the device between the last successful connection and the corrupted request. I do not yet see what could mess up the address type.

dariuszseweryn avatar Nov 15 '19 17:11 dariuszseweryn

You're right, the device won't always pop up in the scans; maybe it's scanned with a true random address and interpreted by something higher up in the stack. But still, after the scanning it's added to the white list with the random address type.

Scanning itself is not the full story: if I successfully scan the devices, then turn off BT for a while and try to connect to them without doing a second scan (BluetoothDevice from the bonded list), the connection is successful.

mtomczynski avatar Nov 18 '19 09:11 mtomczynski

> Scanning itself is not the full story: if I successfully scan the devices, then turn off BT for a while and try to connect to them without doing a second scan (BluetoothDevice from the bonded list), the connection is successful.

This is a well-known Android bug. I have briefly mentioned it on the Wiki.

> You're right, the device won't always pop up in the scans; maybe it's scanned with a true random address and interpreted by something higher up in the stack. But still, after the scanning it's added to the white list with the random address type.

I have been looking at the frames between 4531 and 6017. The first is the last moment the peripheral is added to the white list with the public address type, and the second is the first moment it is added with the random address type. I have searched for the MAC address c5:c1:c4:74:61:88 but have not found it in between. That would suggest that the Android stack has some bug not directly related to scanning this peripheral.

dariuszseweryn avatar Nov 18 '19 11:11 dariuszseweryn

Was doing some tests with a single bonded device and managed to reproduce the issue with only one device, which is new for my case.

Checked the logs after connecting without scanning: the address type added to the white list is actually public, as expected, and the connection is successful in that case. When only one device is bonded, scanning again doesn't affect future connections, as the device is already on the white list with the correct address type and doesn't have to be re-added. It makes sense that it's really easy to reproduce the issue with two devices, because during connections they're added to and removed from the white list.

Here are logs of a successful connection without scanning. You can find the add to white list in frame 186: btsnoop_1811_4.log logcat_highlights.txt

What's interesting is that in the logs with the corrupted single-device connection, scanning returns the correct address type (public, resolved from private), but the device is still added to the white list as random, which results in an unsuccessful connection. White list in frame 1198: btsnoop_1811_6.log logcat_highlights.txt

mtomczynski avatar Nov 18 '19 13:11 mtomczynski

@dariuszseweryn the problem has been fixed on the Bluetooth device side. I think it's a really interesting case. Straight after bonding, the device didn't immediately update its address type to the correct one (random static) but kept the incorrect (public) one. The address itself was correct.

These are the white list operations with the bug:

  1. Add to white list: type PUBLIC, address true private // First connection after bonding
  2. Add to white list: type RANDOM, address true private // All subsequent connections

These are the white list operations with the fixed address type:

  1. Add to white list: type RANDOM, address true private // First connection after bonding
  2. Add to white list: type PUBLIC, address current public random address // All subsequent connections

Here are the logs with the solution if you're interested: btsnoop_working.log logcat_highlights.txt

What is most interesting about it is that this situation is tolerated by all iOS versions and by Android versions before 10. It appears that they dramatically changed how addresses are handled under the hood.

mtomczynski avatar Nov 19 '19 10:11 mtomczynski