Deprecation of `get_services` without replacement?
When using `await client.get_services()` to access the services, we get a warning that this method is deprecated and that one should use the property `client.services` instead. However, these two are not equivalent; the former performs service discovery if needed, whereas the latter requires service discovery to already be complete at that point. Furthermore, it appears that there is no other API that explicitly allows us to await completion of service discovery. Thus, when faced with errors about undiscovered services, we have to resort to an ugly polling loop that repeatedly tries to access the `services` property.
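For reference, the workaround looks roughly like this (a sketch only; the retry count and interval are arbitrary, and `wait_for_services` is our own helper, not a Bleak API):

```python
import asyncio

from bleak import BleakClient
from bleak.exc import BleakError

async def wait_for_services(client: BleakClient, retries: int = 50):
    """Poll until accessing client.services stops raising."""
    for _ in range(retries):
        try:
            # Raises BleakError("Service Discovery has not been performed yet")
            # while discovery is still incomplete.
            return client.services
        except BleakError:
            await asyncio.sleep(0.1)
    raise TimeoutError("service discovery did not complete")
```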
get_services() is called during connect(). So unless services are changing after connect() (as in https://github.com/hbldh/bleak/issues/1639), you should never have to call get_services().
Can you share some code and some logs to help us better understand your use case?
we have to resort to an ugly polling loop that repeatedly tries to access the `services` property
In particular, I don't see how this could work at all without calling get_services(). There are no background tasks that would be updating the services property.
Hm. I'm not using `connect()` explicitly; I use the async context manager form of `BleakClient`. Though I presume its `__aenter__` effectively does `connect()`? I did definitely get the `BleakError("Service Discovery has not been performed yet")` from `bleak/__init__.py:697` strictly from within that context.
The device in question (our own) does indeed change services, though not while maintaining a connection. The device may in some cases however fail to properly close the connection before it reboots with a different set of services. Any held connection would then eventually hit the supervision timeout, at which point a new connection gets established.
In fact, we do cache active connections behind an abstraction, because different parts of our software access different services. Structured concurrency ensures that no "requester" ever gets to access `.services` before the `BleakClient` context manager is fully entered. It may however be that the cache hasn't noticed that the connection was lost, and the request for the "new" service (after the reboot) will be served by the still-open old `BleakClient`. But again, thanks to strict adherence to structured concurrency, the `BleakClient.__aexit__` wouldn't have started executing at this point. So, unless some unstructured asyncio-style callback inside Bleak removes that `services` attribute while the connection is going down, before the context is exited, I should never be able to see an empty `services` inside the `BleakClient` context, no?
I have observed cases of that exception both on macOS and on Linux (RPi). On macOS, I did also observe issues where the OS would cache some metadata about the device (certainly the "Local Name") and incorrectly report the old value after the device reboots. On Linux, I have relatively little experience.
Though I presume its `__aenter__` effectively does `connect()`?
yes
I did definitely get the `BleakError("Service Discovery has not been performed yet")` from `bleak/__init__.py:697` strictly from within that context.
Which version of Bleak? Which OS? What does your code look like?
The device in question (our own) does indeed change services, though not while maintaining a connection
What mechanism does it use to notify the central? A services changed notification or the GATT database attribute? (seems like maybe there is a 3rd option I am forgetting)
Not sure if this would work for you, but I've made a 3rd-party firmware for some devices where the vendor didn't implement either of these options, so to be able to switch back and forth between firmwares and get all OSes to re-enumerate services, we just assign a new random Bluetooth address to the device. This way, to the central, it appears as a new device and we don't have to deal with any issues of the OS caching the old services.
The device may in some cases however fail to properly close the connection before it reboots with a different set of services
Not sure what you mean. The only "proper" closing of a connection is exiting the context manager in Python. It doesn't matter if the peripheral device is still actually connected or not.
In fact, we do cache active connections behind an abstraction, because different parts of our software access different services.
Can you share the code that does this? I'm not really sure what you mean or why you would need to do this. Is it for performance?
So, unless some unstructured asyncio-style callback inside Bleak removes that `services` attribute while the connection is going down, before the context is exited, I should never be able to see an empty `services` inside the `BleakClient` context, no?
Indeed, this is possible, at least for some backends. It seems this got implemented inconsistently across backends.
I've been thinking a lot lately about how we can make Bleak work better with structured concurrency, so glad to have your feedback here.
I have observed cases of that exception both on macOS and on Linux (RPi)
Yup, these are the two backends that set `services = None` when the peripheral disconnects rather than in the normal disconnect callback.
On macOS, I did also observe issues where the OS would cache some metadata about the device (certainly the "Local Name") and incorrectly report the old value after the device reboots.
Would be interesting to see some Bluetooth packet capture logs to see what could be causing this. I.e. maybe we stop scanning before receiving a SCAN_RSP that contains the new name.
... some callback inside Bleak removes that `services` attribute while the connection is going down...
Indeed, this is possible, at least for some backends.
Ok, so my connection-sharing code probably races to provide a connection to a "requester" while the connection is already going down, triggering the observed exception. As you already explained, there should be no need for a replacement for `get_services`; instead, we could view this as a (non-critical) bug report that the error message is misleading: it suggests that the observed error would eventually go away, when in fact, nowadays, the only possible cause for that error is a connection going down.
With respect to the other topics we touched, in order of appearance:
What mechanism does it use to notify the central? A services changed notification or the GATT database attribute?
Definitely no change notification; the device just resets without notifying the central in any way. Then it advertises again, and on the next connection, it may present different services. Now, to be honest, I'm not entirely sure if the GATT database hash is correctly calculated. I do see a message from my stack passing by saying that the hash calculation is complete, and I kind of assumed that this hash would have to change when any services change.
Not sure what you mean. The only "proper" closing of a connection is exiting the context manager in Python.
Again, I've forgotten the details of the BLE connection process, but I thought there is a packet sent from either side to the other notifying it of the intent to close the connection. When our device reboots into a different service, it doesn't give the stack time to send that packet.
Is it for performance?
Indeed. The Python software in question is a management tool that should handle tens of peripherals. At least in some cases, it will just track those devices as they advertise, but not connect. Every once in a while it will connect to pull a bit of status, after which it may attempt to connect to a different service (depending on the status). The BlueZ backend doesn't allow simultaneously scanning and attempting to establish a connection (which is just scanning with automatic scan responses), so there is a lockout mechanism to synchronise that, roughly as sketched below. With too many devices periodically attempting a connection, there is no time left to scan and keep track of the remaining devices, so the goal of that connection sharing is to cut the potential two connection attempts down to one.
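In rough outline, the lockout looks like this (illustrative names, not our actual code; the lock only guards the connection *attempt*):

```python
import anyio

from bleak import BleakClient, BleakScanner

# BlueZ cannot scan while a connection attempt is in flight,
# so both operations contend for the same lock.
radio_lock = anyio.Lock()

async def scan_window(duration: float = 5.0):
    async with radio_lock:
        return await BleakScanner.discover(timeout=duration)

async def connect_once(address: str) -> BleakClient:
    async with radio_lock:
        client = BleakClient(address)
        await client.connect()
    # Scanning may resume here; only establishing the connection
    # needed exclusive use of the radio.
    return client
```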
I've been thinking a lot lately about how we can make Bleak work better with structured concurrency.
For me, the go-to way of guaranteeing there are no "task leaks" is to write everything in terms of purely structured concurrency primitives, nowadays typically using the anyio API. Then you can run it on the trio loop, which is unable to do anything but structured concurrency. That said, the above "issue" with the `services` attribute dying before the context manager exits is just mildly surprising; after all, a dying connection will have to throw some exception on almost any operation. It might be a bit less surprising if it only did this on async operations. I do combat that to some degree by wrapping my `BleakClient` context in an `anyio.CancelScope` which I cancel as soon as I receive the disconnect callback - perhaps that could be an interesting feature for bleak too.
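Concretely, the pattern looks roughly like this (a sketch assuming the anyio API; the callback wiring is our own, not a Bleak feature):

```python
import anyio

from bleak import BleakClient

async def run_connection(address: str) -> None:
    scope = anyio.CancelScope()

    def on_disconnect(client: BleakClient) -> None:
        # Tear down everything inside the scope as soon as the link
        # drops, instead of waiting for the next operation to raise.
        scope.cancel()

    with scope:
        async with BleakClient(address, disconnected_callback=on_disconnect) as client:
            ...  # all use of `client` happens inside the cancel scope
```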
Other than that, I find that separately registering and unregistering notifications using `start_notify` and `stop_notify` feels unstructured; that should be a context manager too.
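Something along these lines, for example (the name `notifications` is made up; this is not part of Bleak's API):

```python
from contextlib import asynccontextmanager

from bleak import BleakClient

@asynccontextmanager
async def notifications(client: BleakClient, char_specifier, callback):
    # Pair start_notify/stop_notify into a single scoped operation.
    await client.start_notify(char_specifier, callback)
    try:
        yield
    finally:
        await client.stop_notify(char_specifier)
```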
[cached metadata on macOS]
Would be interesting to see some Bluetooth packet capture logs to see what could be causing this. I.e. maybe we stop scanning before receiving a SCAN_RSP that contains the new name.
I don't think this one is an issue in Bleak; the same thing also happens with other macOS BLE tools.
I'm not entirely sure if the GATT database hash is correctly calculated.
Even if it is being calculated, it could be that the actual characteristic isn't included. This is why I keep suggesting to capture logs. They will show things like this.
When our device reboots into a different service, it doesn't give the stack time to send that packet.
It would probably work better if it did, but as it is, this isn't really any different than a device going out of range.
so the goal of that connection sharing is to cut the potential two connection attempts down to one.
Not sure I understand why two connections to the same device would be needed.
I do combat that to some degree by wrapping my `BleakClient` context in an `anyio.CancelScope` which I cancel as soon as I receive the disconnect callback - perhaps that could be an interesting feature for bleak too.
Yeah, I was considering doing something like that.
Other than that, I find that separately registering and unregistering notifications using `start_notify` and `stop_notify` feels unstructured; that should be a context manager too.
Unless you need to stop notifications before a disconnect for some reason, you can do without the `stop_notify()`. And I just tend to use `contextlib.AsyncExitStack` for things like this rather than making so many context managers.
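For example (a sketch; the UUID and callback are placeholders):

```python
from contextlib import AsyncExitStack

from bleak import BleakClient

CHAR_UUID = "0000abcd-0000-1000-8000-00805f9b34fb"  # placeholder

def handle_data(characteristic, data: bytearray) -> None:
    print(characteristic, data)

async def run(client: BleakClient) -> None:
    async with AsyncExitStack() as stack:
        await client.start_notify(CHAR_UUID, handle_data)
        # Register the matching teardown without writing a context manager.
        stack.push_async_callback(client.stop_notify, CHAR_UUID)
        ...  # notifications stay active until the stack unwinds
```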
Going to close this, as I didn't have time to follow up. It may well be that we had a different underlying issue. Anyway, for later reference, here is what we had done in the meantime:
I'm not entirely sure if the GATT database hash is correctly calculated.
We had eventually been able to verify that the GATT database hash does change.
Not sure I understand why two connections to the same device would be needed.
It isn't. But our code has separate "workers" tasked with different, mostly unrelated parts. One periodically checks the Device Information service to see if the firmware has changed. Another one periodically connects to a custom service to do other stuff. Because there are many devices, we can't hold the connection. So when either worker intends to do its work, it needs to be provided a connection. The naive way would be to let each worker just ask Bleak for a new connection object. Instead, we have a separate connection manager that ensures that there is only ever a single connection object per device, and potentially shares it among multiple workers needing access to the device.
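In outline, it looks something like this (heavily simplified; class and method names are illustrative, not our actual code):

```python
from contextlib import asynccontextmanager

import anyio
from bleak import BleakClient

class ConnectionManager:
    """Hands out at most one live BleakClient per device address."""

    def __init__(self) -> None:
        self._clients: dict[str, BleakClient] = {}
        self._locks: dict[str, anyio.Lock] = {}

    @asynccontextmanager
    async def connection(self, address: str):
        lock = self._locks.setdefault(address, anyio.Lock())
        async with lock:
            client = self._clients.get(address)
            if client is None or not client.is_connected:
                client = BleakClient(address)
                await client.connect()  # includes service discovery
                self._clients[address] = client
            yield client  # workers share one client, serialized by the lock
```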
Unless you need to stop notifications before a disconnect for some reason, you can do without the `stop_notify()`.
If there are multiple weakly coupled components that need notifications for a limited duration, then forgetting to call `stop_notify` will prevent the next `start_notify`. That in turn would require the application to separately keep track of the notifications, coupling the components more than necessary. In short, not cleaning up after yourself and mutating state with effects beyond the intended scope is bad design - something that structured concurrency tries to improve upon.
And I just tend to use `contextlib.AsyncExitStack` for things like this rather than making so many context managers.
I do so too. But the point of structured concurrency is to guarantee that resource ownership follows the nesting of the source code. Separate acquire and release operations require the user to keep track of resource ownership, which they can do e.g. by wrapping them with a context manager, such as `AsyncExitStack`. But I argue it should be the other way around: APIs should expose a single acquire/release context manager, and if a user really needs to separate the two, they can always pull out `__aenter__` and `__aexit__` (and they immediately know that they are preparing footguns). Having a `notify_context` API doesn't preclude you from using `AsyncExitStack.enter_async_context` to avoid the need for nesting many context managers.
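To illustrate the composition (reusing the hypothetical `notifications` helper sketched earlier; the UUIDs and callback are placeholders):

```python
from contextlib import AsyncExitStack

CHAR_A = "0000aaaa-0000-1000-8000-00805f9b34fb"  # placeholder UUIDs
CHAR_B = "0000bbbb-0000-1000-8000-00805f9b34fb"

def on_data(characteristic, data: bytearray) -> None:
    print(characteristic, data)

async def run(client) -> None:
    async with AsyncExitStack() as stack:
        # The single-context-manager API still composes flat, without nesting.
        await stack.enter_async_context(notifications(client, CHAR_A, on_data))
        await stack.enter_async_context(notifications(client, CHAR_B, on_data))
        ...  # both subscriptions are released when the stack unwinds
```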
Thanks for the follow up. It is always helpful to see how Bleak gets used in real-world applications.
But I argue it should be the other way around: APIs should expose a single acquire/release context manager, and if a user really needs to separate the two, they can always pull out `__aenter__` and `__aexit__`
I quite agree. Unfortunately, we can't change Bleak to work this way without "breaking the world", as they say. But we will certainly consider this for new APIs.