RxAndroidBle

API feature to close/release all connections or clean the cached devices

Open kvmw opened this issue 4 years ago • 22 comments

Is your feature request related to a problem? Please describe. I'm using the library to scan and read some characteristics from other phones. Everything works perfectly on relatively new phones running Android 8 or higher, but old phones running Android 7 fail to close their connections and, after a couple of hours, end up with status 133:

com.polidea.rxandroidble2.exceptions.BleDisconnectedException: Disconnected from MAC='XX:XX:XX:XX:XX:XX' with status 133 (GATT_ERROR)

and they are not able to scan until the device is rebooted.

This might be a bug in my code or in the library, but since new phones work flawlessly, I suspect it is related to the Android OS.

Describe the solution you'd like Would it be possible to kill the connections and release them using an API call in the library? Or, if this is a caching-related issue, to clear the cached devices via this library?

kvmw avatar May 12 '20 22:05 kvmw

This is an OS-related problem. There is nothing that can be done, AFAIK. Their BLE stack is probably getting into a bad state and there is no programmatic way to ensure proper functionality. You could try turning the Bluetooth adapter and Wi-Fi (on <=6.0) off and on.

dariuszseweryn avatar May 13 '20 08:05 dariuszseweryn

@dariuszseweryn, unfortunately turning it off and on doesn't help. I have to reboot the device or clear the Bluetooth cache and data. I was hoping for a solution without user interaction.

Is this a known issue? Has there been any investigation on your side to find its cause?

kvmw avatar May 13 '20 13:05 kvmw

I have not encountered the issue you are describing. I successfully tested connections that spanned ~3 days (over a weekend).

There are known issues with the Android BLE stack, which tends to report a lot of different problems as "status 133". I have no idea what the problem may be in your case.

Only the BluetoothGatt cache can be cleared programmatically, but as far as I know that only clears the cache of discovered services, and I doubt it would help in your case.

Then again, you have not described what you are doing with the connection.

Feel free to provide more info, or investigate what may be going on in your case. If that leads to some generic conclusions, I will be more than happy to incorporate them into the library.

dariuszseweryn avatar May 13 '20 14:05 dariuszseweryn

@dariuszseweryn thanks for the reply. Here is my scenario:

  • multiple phones that are advertising and scanning at the same time.
  • advertisers have only one characteristic, which can be read by the scanners.

So for the scanners, here is what I'm doing, at least in theory:

  • I have a foreground Android service that does the following:
  • observes the client state until it is READY
  • scans for devices with a specific service ID
  • for each device in the scan result: connects, reads the characteristic, and disconnects

Here is my sample code to give more context:

rxBle.observeStateChanges()
    .startWith(rxBle.state)
    // Scan only while the client is READY; any other state switches to an empty stream.
    .switchMap { state ->
        when (state) {
            READY -> rxBle.scanBleDevices(settings, filter)
            else -> Observable.empty()
        }
    }
    .filter { result -> isDisconnected(result) }
    .subscribe(
        { result ->
            val compositeDisposable = CompositeDisposable()
            result
                .bleDevice
                .establishConnection(false)
                .flatMapSingle { connection ->
                    // Read the characteristic and the RSSI, then pair the two results.
                    Single.zip(
                        connection.readCharacteristic(UUID.fromString("sample-char-uuid")),
                        connection.readRssi(),
                        BiFunction { value, rssi -> Pair(value, rssi) }
                    )
                }
                .doFinally {
                    compositeDisposable.clear()
                }
                // Complete after the first result, which disposes the connection.
                .take(1)
                .subscribe(
                    { pair -> save(pair) },
                    { error -> log(error) }
                )
                .let {
                    compositeDisposable.add(it)
                }
        },
        { error -> log(error) }
    )
    .let {
        scanDisposable = it
    }

kvmw avatar May 13 '20 17:05 kvmw

old phones with android 7 are failing to close the connections

I haven't noticed this before — how can you tell that they are failing to close connections?

dariuszseweryn avatar May 14 '20 19:05 dariuszseweryn

old phones with android 7 are failing to close the connections

I haven't noticed this before — how can you tell that they are failing to close connections?

I'm not 100% sure, but reading and searching about status 133 always leads to connection leaks and failing to close connections properly (the famous close vs. disconnect conversation).

Also, even though my code filters for disconnected devices, I see BleAlreadyConnectedException in the stack trace too.

kvmw avatar May 15 '20 06:05 kvmw

The library always calls BluetoothGatt.close() when ending the connection.

BleAlreadyConnectedException Javadoc:

 /**
 * An exception being emitted from an {@link io.reactivex.Observable} returned by the function
 * {@link com.polidea.rxandroidble2.RxBleDevice#establishConnection(boolean)} or other establishConnection() overloads when this kind
 * of observable was already subscribed and {@link com.polidea.rxandroidble2.RxBleConnection} is currently being established or active.
 *
 * <p>
 *     To prevent this exception from being emitted one must either:<br>
 *     * always unsubscribe from the above mentioned Observable before subscribing again<br>
 *     * {@link io.reactivex.Observable#share()} or {@link io.reactivex.Observable#publish()} the above mentioned
 *     Observable so it will be subscribed only once
 * </p>
 */

It just means that you are already trying to connect to a given peripheral, probably due to a race condition.
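One way to rule out that race on the application side is to remember which MAC addresses already have a connection attempt in flight and to skip scan results for those. A minimal sketch in plain Kotlin, assuming a hypothetical `InFlightConnections` helper (not part of RxAndroidBle):

```kotlin
import java.util.Collections
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper, not part of RxAndroidBle: remembers which peripherals
// currently have a connection attempt (or an active connection) in flight.
class InFlightConnections {
    // Thread-safe set, since scan results and connection callbacks may
    // arrive on different threads.
    private val macs: MutableSet<String> =
        Collections.newSetFromMap(ConcurrentHashMap<String, Boolean>())

    // Returns true only for the first caller; later callers for the same
    // MAC address get false until release() is called.
    fun tryAcquire(macAddress: String): Boolean = macs.add(macAddress)

    // Call when the connection Observable terminates or is disposed.
    fun release(macAddress: String) {
        macs.remove(macAddress)
    }
}
```

In the scan handler, the idea would be to call `tryAcquire` before `establishConnection` and `release` from `doFinally`, so a second scan result for the same device is ignored while the first attempt is still running.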

dariuszseweryn avatar May 15 '20 09:05 dariuszseweryn

old phones with android 7 are failing to close the connections

I haven't noticed this before — how can you tell that they are failing to close connections?

The main observed behaviour is that these devices eventually, usually after 30 minutes or so, become unable to establish new connections.

We are inferring that it's related to the number of connections. Will try and keep a counter and see if the failures start around a specific number.
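Such a counter could look like the sketch below. It is plain Kotlin, and the `ConnectionStats` class is a hypothetical name for illustration, not something the library provides. It counts attempts and remembers how many had been made when the first failure occurred.

```kotlin
// Hypothetical diagnostic helper: counts connection attempts and records
// how many attempts had been made when the first failure was seen.
class ConnectionStats {
    var attempts = 0
        private set
    var failures = 0
        private set
    var firstFailureAtAttempt: Int? = null
        private set

    // Call once per finished connection attempt.
    fun record(success: Boolean) {
        attempts++
        if (!success) {
            failures++
            if (firstFailureAtAttempt == null) firstFailureAtAttempt = attempts
        }
    }
}
```

Logging `firstFailureAtAttempt` across several runs and devices would show whether the failures really start around a specific number of connections.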

lassebe avatar May 15 '20 16:05 lassebe

Android has limits on how many connections it can handle at any given time. Search for BluetoothGatt limitations on different API levels. It was ~4 at API 18, 7 at API 21, and 15 at API 23, for the whole OS. You have to close connections once they are no longer needed.

dariuszseweryn avatar May 16 '20 13:05 dariuszseweryn

Right, that's why we try to connect to one device at a time, and dispose of the connectionDisposable as soon as we either get a success or an error.

This doesn't seem to be an issue with too many concurrent connections, but rather that we are hitting some limitation in how many connections we can open (and close) before something goes terribly wrong. I don't think it's tied to the library really, it feels more likely that it's an OS level issue. So far, I don't think we've seen it on Android 9+, just 7 and in one case 8.1.

We're really just trying to figure out if there's anything else we can do to:

  1. Reliably detect that the device has entered this state
  2. Mitigate it if possible

lassebe avatar May 16 '20 16:05 lassebe

@dariuszseweryn I've set up a simpler scenario which still reproduces the issue in about 3 hours:

At 1-minute intervals:

  • start a scan for 10 seconds
  • pick one device from the result (if there is any)
  • stop the scan (dispose)
  • connect to the device and read a single characteristic
  • close the connection (dispose)

Since the default connection timeout is 30 seconds, the 1-minute window is more than enough to scan for 10 seconds, find a single device, connect, and read a single characteristic (or time out).

I am using a single device for advertising and another one for scanning, so there is no race condition and no connection leak in the code, as far as I can tell.

I'm almost convinced this is an issue with the Android BLE stack.

kvmw avatar May 21 '20 12:05 kvmw

Ideally, there should be a ~0.5 second delay between scanning and connecting to the device. This was observed to help. There is also a known issue (race condition) on Android in which the timeout happens around the same time as the peripheral gets connected. It then appears to the application and the OS that the peripheral is not connected, yet the peripheral has a connection to the phone. I cannot find a reference right now. Unfortunately, this cannot be mitigated in code. The best way to avoid it is to have a peripheral that advertises often enough.

dariuszseweryn avatar May 21 '20 13:05 dariuszseweryn

Ideally, there should be a ~0.5 second delay between scanning and connecting to the device. This was observed to help. There is also a known issue (race condition) on Android in which the timeout happens around the same time as the peripheral gets connected. It then appears to the application and the OS that the peripheral is not connected, yet the peripheral has a connection to the phone. I cannot find a reference right now. Unfortunately, this cannot be mitigated in code. The best way to avoid it is to have a peripheral that advertises often enough.

Thanks for the suggestion, but even applying a 1-second delay after scanning and before connecting doesn't help. I'm still running into the same problem.

kvmw avatar May 22 '20 21:05 kvmw

Unfortunately, I do not know how I could help you further with this issue. This looks like an Android OS BLE stack problem and I do not know of any further mitigations. If you find any, feel free to add them here and I can think about how to incorporate them into the library.

dariuszseweryn avatar May 25 '20 10:05 dariuszseweryn

It is an Android OS problem - I hit it a lot in the past and the only way to mitigate it is to force Android to clean up as much as you can. My app deals with 500+ sequential bluetooth LE devices and the only reliable way to continue through that many is to turn the bluetooth adapter off, wait until you get the notification that the adapter switched off, then turn it back on again. Obviously you need permission to do this and doing so without informing the user is considered poor behaviour. However, in my application, asking them whether they want to turn it on and off all the time is more obnoxious than just doing it in the background. This is for an industrial application so I don't have to worry too much about re-pairing with headphones or whatever since my app is generally the only thing being run. It does, however, work.

If there was a neat way to automatically cycle the bluetooth adapter on and off using this library that would be super handy, but doing so is only really 10 lines of code. It's more something to be aware of on older versions of android.

In my experience, there is a limited number of connections the bluetooth subsystem will keep track of and that number is 8. Once you have 8 (clientIf 0-7) non-responsive devices in your list you are tanked and you get the 133 error and a lot more besides. A smarter programmer than me might keep track of the errors / timeouts / nonresponding devices and clean up only when necessary but I find it much safer in my application to simply turn the adapter on and off every 5 device interactions regardless of whether they were successful or not. Best day we had was talking to 1000+ devices in a row on the older version of Android without a reboot.
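The "cycle every 5 interactions" policy described above can be kept separate from the Android calls themselves. A minimal sketch in plain Kotlin follows. `AdapterCyclePolicy` and the `cycleAdapter` callback are hypothetical names; the callback is where an app would turn the adapter off, wait for the off notification, and turn it back on.

```kotlin
// Hypothetical policy object: invokes cycleAdapter after every `interval`
// device interactions, regardless of whether they succeeded.
class AdapterCyclePolicy(
    private val interval: Int = 5,
    private val cycleAdapter: () -> Unit
) {
    private var sinceLastCycle = 0

    init {
        require(interval > 0) { "interval must be positive" }
    }

    // Call once after each device interaction (success or failure).
    // Returns true if this call triggered an adapter cycle.
    fun onInteractionFinished(): Boolean {
        sinceLastCycle++
        if (sinceLastCycle < interval) return false
        sinceLastCycle = 0
        cycleAdapter()
        return true
    }
}
```

Keeping the counter outside the Bluetooth code makes it easy to change the interval per device model, or to replace the fixed schedule with an error-driven one later.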

I just replaced my hideously complicated bluetooth code (caching, queueing, threading oh my!) with RxAndroidBle and I am very happy with it - so simple now I was finally able to put in a proper check for read/write characteristic time outs. Getting the adapter to cycle on a set schedule would be the cherry on top but it's easy enough to write that code outside of the library and embed it in your Activity instead (which is probably where that should be)

SmartShepherdUser avatar Jun 07 '20 04:06 SmartShepherdUser

@SmartShepherdUser yeah, you are right, it seems to be an Android issue. The workaround you are using (switching Bluetooth off and on) doesn't work on all devices (for example, on most Samsung devices). On those devices you have to clear the Bluetooth cache/data or reset the device.

kvmw avatar Jun 08 '20 07:06 kvmw

There was an issue with how many Bluetooth Devices cache entries an Android device may handle. This could explain some of the issues @SmartShepherdUser has faced.

dariuszseweryn avatar Jun 08 '20 09:06 dariuszseweryn

It probably depends a lot on the device involved. The technique works on an ALPS-based system I use for a specific industrial purpose. It doesn't cause any harm on a Samsung Galaxy S8, but it may not do any good either. The only advice I have is that if you let errors accumulate, the older Android versions never properly release the connections, and that is what leads to the death of the whole Bluetooth stack. How that is dealt with under the hood in the drivers is probably specific to the device. Regardless, perhaps the feature should be error accumulation warnings, so that the application knows it's hitting the danger zone on earlier versions of Android. That would remove a lot of application logic, in the same way the Rx stuff removed all my queues/caches etc., without prescribing the behaviour, so you could handle the error at a higher level (i.e. make the error accumulation a warning, perhaps by looking at the clientIf number coming back in the devices, and throw a warning when the clientIf number reaches a certain threshold on certain versions of Android).
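A rough sketch of what such a warning might look like in plain Kotlin. Everything here is hypothetical: the `ErrorAccumulationMonitor` name, the callback, and the threshold value, which would have to be found by testing on each Android version. The sketch counts failures since the last cleanup instead of reading clientIf, on the assumption that clientIf is not readily available to application code.

```kotlin
// Hypothetical monitor: warns once when the number of failed device
// interactions since the last cleanup reaches a caller-chosen threshold.
class ErrorAccumulationMonitor(
    private val warnThreshold: Int,
    private val onWarning: (failures: Int) -> Unit
) {
    private var failuresSinceReset = 0

    // Call for every timeout, status 133, or otherwise failed interaction.
    fun onFailure() {
        failuresSinceReset++
        // Fire exactly when the threshold is reached, not on every later failure.
        if (failuresSinceReset == warnThreshold) onWarning(failuresSinceReset)
    }

    // Call after whatever cleanup the app performs (e.g. cycling the adapter).
    fun reset() {
        failuresSinceReset = 0
    }
}
```

The application stays in charge of what to do with the warning: cycle the adapter, prompt the user, or just log it.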

SmartShepherdUser avatar Jun 08 '20 09:06 SmartShepherdUser

There was an issue with how many Bluetooth Devices cache entries an Android device may handle. This could explain some of the issues @SmartShepherdUser has faced.

No, I think this is related to the issue but isn't the underlying issue (max 8 active BLE connections per application). The caching thing looks super bad but I never hit that. I had 1000 BLE devices turned on in a room last year and everything Bluetooth stopped completely, including headphones etc. I don't recommend that, ha ha!

SmartShepherdUser avatar Jun 08 '20 09:06 SmartShepherdUser

From what I know, it is currently max 13 connections per system (since 9.0; prior to that it was 7). Source. There is an additional restriction on the maximum number of BluetoothGatt objects per system as well, but AFAIR it is much higher than that.

dariuszseweryn avatar Jun 08 '20 10:06 dariuszseweryn

So to get back to the point, would it be possible to produce a warning before those version-specific thresholds are reached? I can help with testing and perhaps have some code to contribute. I also need to reiterate that this is not an RxAndroidBle issue, but it could make for a killer feature.

SmartShepherdUser avatar Jun 08 '20 10:06 SmartShepherdUser

Though it seems to be quite far from the original topic, that is an interesting idea. I will extract it as a separate issue. Unfortunately, I have quite a lot on my plate right now and definitely not enough time.

dariuszseweryn avatar Jun 08 '20 19:06 dariuszseweryn