bleak icon indicating copy to clipboard operation
bleak copied to clipboard

Linux, client.connect() connects then disconnects, using bluetoothctl directly works

Open librick opened this issue 2 years ago • 13 comments

  • bleak version: 0.20.2
  • Python version: 3.11.4
  • Operating System: Linux debian 6.3.0-1-amd64
  • BlueZ version (bluetoothctl -v) in case of Linux: 5.66

Description

Calling client.connect() given a BleakClient instance shows that the device as connected in BlueZ, but then it immediately disconnects again and Bleak throws a TimeoutError. I can connect (and stay connected) just fine using bluetoothctl.

Manually connecting with bluetoothctl works as expected. I can connect with connect <my-mac-address>. Running devices Connected several seconds later shows the connected device. The device does not disconnect and all is well. I see these two lines in the bluetoothctl output:

[CHG] Device 00:14:41:33:28:8D Connected: yes
[CHG] Device 00:14:41:33:28:8D ServicesResolved: yes

Connecting with Bleak does not work. First, inside bluetoothctl, I make sure the device is disconnected: disconnect 00:14:41:33:28:8D (it disconnects as expected). Then I run my Python code. Watching the logs inside bluetoothctl, I see the following four lines;

[CHG] Device 00:14:41:33:28:8D Connected: yes
[CHG] Device 00:14:41:33:28:8D ServicesResolved: yes
[CHG] Device 00:14:41:33:28:8D ServicesResolved: no
[CHG] Device 00:14:41:33:28:8D Connected: no

And the Python code crashes with a TimeoutError. It's possible I'm just missing something obvious with asyncio, but it seems strange to me that the device would connect and then immediately disconnect, and only when using Bleak. I've included my full code and exception logs below.

What I Did

My python code:

import asyncio
from bleak import BleakScanner, BleakClient

DEVICE_NAME = "Razer Stereo"

# I wrapped BleakScanner because I'm only interested
# in finding a specific device (based on local_name for now);
# once a matching device is discovered, _init() resolves
class BleScanner:
    device = None
    async def _init(self):
        self.stop_event = asyncio.Event()
        cb = lambda x, y: self._callback(x, y) 
        async with BleakScanner(detection_callback=cb) as scanner:
            await self.stop_event.wait()

    def _callback(self, device, advertising_data):
        if advertising_data.local_name != None:
            print("[{}] discovered device, local_name: {}".format(device.address, advertising_data.local_name))
        if advertising_data.local_name == DEVICE_NAME:
            self.device = device
            self.stop_event.set()

# the create_ble_scanner() function is a factory
# that creates a BleakScanner. It only resolves
# once the scanner has found the device I want
async def create_ble_scanner():
    sc = BleScanner()
    await sc._init()
    return sc

async def main():
    disconnect_event = asyncio.Event()
    # this implicitly waits for a device to be found before resolving
    scanner = await create_ble_scanner()
    # the create_ble_scanner() coroutine resolved,
    # the device we want is defined under scanner.device
    print(scanner.device)

    print("attempting to connect to device")
    client = BleakClient(scanner.device)
    # THIS IS WHAT FAILS (eventually I get a TimeoutError).
    # When bluetoothctl is open in a separate terminal, it shows the following:
    # > [CHG] Device 00:14:41:33:28:8D Connected: yes
    # > [CHG] Device 00:14:41:33:28:8D ServicesResolved: yes
    # > [CHG] Device 00:14:41:33:28:8D ServicesResolved: no
    # > [CHG] Device 00:14:41:33:28:8D Connected: no
    # the client disconnects almost immediately after it connects
    await client.connect()
    # This print statement is never hit
    print("connected, now sleeping")
    await asyncio.sleep(30)

asyncio.run(main())

Output:

[00:14:41:33:28:8D] discovered device, local_name: Razer Stereo
00:14:41:33:28:8D: Razer Stereo
attempting to connect to device
Traceback (most recent call last):
  File "/home/user/projects/py-kitty/venv/lib/python3.11/site-packages/bleak/backends/bluezdbus/client.py", line 204, in connect
    reply = await self._bus.call(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/projects/py-kitty/venv/lib/python3.11/site-packages/dbus_fast/aio/message_bus.py", line 371, in call
    await future
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/projects/py-kitty/main.py", line 54, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/user/projects/py-kitty/main.py", line 49, in main
    await client.connect()
  File "/home/user/projects/py-kitty/venv/lib/python3.11/site-packages/bleak/__init__.py", line 531, in connect
    return await self._backend.connect(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/projects/py-kitty/venv/lib/python3.11/site-packages/bleak/backends/bluezdbus/client.py", line 141, in connect
    async with async_timeout(timeout):
  File "/usr/lib/python3.11/asyncio/timeouts.py", line 111, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

Logs

I ran it again with logging enabled, using BLEAK_LOGGING=1 python3 main.py. Here's the most relevant section (just before I attempt to connect). I've omitted the TimeoutError output for brevity.

…
00:14:41:33:28:8D: Razer Stereo
attempting to connect to device
2023-07-14 13:45:28,232 bleak.backends.bluezdbus.client MainThread DEBUG: Connecting to device @ 00:14:41:33:28:8D
DEBUG:bleak.backends.bluezdbus.client:Connecting to device @ 00:14:41:33:28:8D
2023-07-14 13:45:28,236 bleak.backends.bluezdbus.client MainThread DEBUG: Connecting to BlueZ path /org/bluez/hci0/dev_00_14_41_33_28_8D
DEBUG:bleak.backends.bluezdbus.client:Connecting to BlueZ path /org/bluez/hci0/dev_00_14_41_33_28_8D
2023-07-14 13:45:33,640 bleak.backends.bluezdbus.manager MainThread DEBUG: received D-Bus signal: org.freedesktop.DBus.Properties.PropertiesChanged (/org/bluez/hci0/dev_00_14_41_33_28_8D): ['org.bluez.Device1', {'Connected': <dbus_fast.signature.Variant ('b', True)>}, []]
DEBUG:bleak.backends.bluezdbus.manager:received D-Bus signal: org.freedesktop.DBus.Properties.PropertiesChanged (/org/bluez/hci0/dev_00_14_41_33_28_8D): ['org.bluez.Device1', {'Connected': <dbus_fast.signature.Variant ('b', True)>}, []]
2023-07-14 13:45:34,383 bleak.backends.bluezdbus.manager MainThread DEBUG: received D-Bus signal: org.freedesktop.DBus.Properties.PropertiesChanged (/org/bluez/hci0/dev_00_14_41_33_28_8D): ['org.bluez.Device1', {'ServicesResolved': <dbus_fast.signature.Variant ('b', True)>}, []]
DEBUG:bleak.backends.bluezdbus.manager:received D-Bus signal: org.freedesktop.DBus.Properties.PropertiesChanged (/org/bluez/hci0/dev_00_14_41_33_28_8D): ['org.bluez.Device1', {'ServicesResolved': <dbus_fast.signature.Variant ('b', True)>}, []]
2023-07-14 13:45:40,393 bleak.backends.bluezdbus.manager MainThread DEBUG: received D-Bus signal: org.freedesktop.DBus.Properties.PropertiesChanged (/org/bluez/hci0/dev_00_14_41_33_28_8D): ['org.bluez.Device1', {'ServicesResolved': <dbus_fast.signature.Variant ('b', False)>}, []]
DEBUG:bleak.backends.bluezdbus.manager:received D-Bus signal: org.freedesktop.DBus.Properties.PropertiesChanged (/org/bluez/hci0/dev_00_14_41_33_28_8D): ['org.bluez.Device1', {'ServicesResolved': <dbus_fast.signature.Variant ('b', False)>}, []]
2023-07-14 13:45:40,393 bleak.backends.bluezdbus.client MainThread DEBUG: _cleanup_all(/org/bluez/hci0/dev_00_14_41_33_28_8D)
DEBUG:bleak.backends.bluezdbus.client:_cleanup_all(/org/bluez/hci0/dev_00_14_41_33_28_8D)
Traceback (most recent call last):
…

librick avatar Jul 14 '23 17:07 librick

Usually when BlueZ disconnects unexpectedly, it is because the device does not follow the Bluetooth spec and does something unexpected. You would need to log Bluetooth packets to see if this is the case.

dlech avatar Jul 14 '23 18:07 dlech

scanner = await create_ble_scanner()

FYI, this can be replaced with scanner = await BleakScanner.find_device_by_name(DEVICE_NAME) instead of subclassing.

dlech avatar Jul 14 '23 18:07 dlech

Thanks for the reply and work on the project :) It's entirely possible that the device is outside the spec; it's a proprietary Bluetooth headset. But it's still strange to me that connecting using bluetoothctl works when used directly.

Looking at the debug output, the only thing that stands out are the two PropertiesChanged lines where ServicesResolved is initially broadcast as True, and is then broadcast as False. Is it possible that that second ServicesResolved signal is somehow causing the connection to timeout? I'll try and get a Wireshark log.

As for subclassing, thanks for the tip. Eventually I plan on implementing more aggressive device/service filtering, thus the non-conventional subclassing code.

librick avatar Jul 14 '23 18:07 librick

If the device is dual-mode, bluetoothctl could be connecting with Bluetooth classic instead of BLE. Again, logging Bluetooth packets is the best way to see the difference between the two connection methods.

Is it possible that that second ServicesResolved signal is somehow causing the connection to timeout?

ServicesResolved going to false is most likely because the device has already disconnected rather than something that triggers it.

Eventually I plan on implementing more aggressive device/service filtering

There is also BleakScanner.find_device_by_filter() if you need custom logic.

dlech avatar Jul 14 '23 19:07 dlech

2023-07-14-razer-bleak-capture.pcapng.gz Here's a Wireshark capture. I started capturing just before running the Python script, and stopped just after the Python script crashed.

Packets no. 7-10 can likely be ignored, they contain advertising data for other nearby devices. Packets 11 and 12 contain advertising data for the device I'm interested in (00:14:41:33:28:8D). The device connects, and shortly after, the host sends a disconnect command.

As for BleakScanner.find_device_by_filter(), I may end up using it. For now I wanted to use the async with syntax. I'll probably end up using the same syntax with BleakClient (rather than using await client.connect()) once I figure out the connection issue.

librick avatar Jul 14 '23 20:07 librick

I don't see anything that jumps out at me. Since the D-bus method call is timing out, you could try making the timeout longer. It could be waiting to pair or something like that. Beyond that, you would need to debug BlueZ to see what it is waiting on.

dlech avatar Jul 14 '23 22:07 dlech

I found a discrepancy between Bleak and bluetoothctl that explains this. I've explained the underlying issue and proposed a change below. (Thanks again for your time, I spent a good chunk of Saturday hacking on this, it's really appreciated)

For my device specifically, I've observed the following non-standard behavior: bluetoothctl sends a dbus message of type method_call to bluetoothd with member "Connect". bluetoothd never sends a method_return message back.

I don't know why. I would expect bluetoothd to send a method_return message back but it doesn't. I assume my device just has a buggy implementation of the BLE spec. Even though the dbus spec says that every method_call message that doesn't have the NO_REPLY_EXPECTED flag set is expected to be followed by a corresponding method_return message, that isn't the case for my device.

bluetoothctl doesn't care. bluetoothctl sends the method_call message with member "Connect". bluetoothd then sends a PropertiesChanged signal with "Connected=True" on the relevant path (/org/bluez/hci0/dev_00_14_41_33_28_8D), and that's enough for bluetoothctl to consider the device connected. The method_return is never received, but it doesn't matter.

Bleak does care. Calling bleak/backends/bluezdbus/client.py async def connect(...) eventually fails with that TimeoutError because client.py is calling into dbus-fast and the dbus-fast/aio/message_bus.py async def call(...) method awaits a future that never resolves. It never resolves because Bleak tells dbus-fast to send the "Connect" method-call and dbus-fast waits for a reply before resolving. That reply never arrives, even though my device enters into a connected state (i.e., even though bluetoothd sends a PropertiesChanged signal with "Connected=True").

This also explains why increasing the timeout doesn't fix this. If I use await client.connect(timeout=600), the device connects (i.e., BlueZ reports the device as connected, and GNOME, also listening to dbus, shows me a connection notification), but then dbus-fast hangs while waiting for a method_return, and ~590 seconds later I get the TimeoutError and the connection is torn down (disconnected). dbus-fast is working as expected, Bleak told it to wait for a reply that never comes.

Proposed Change

I propose that Bleak's connect() method be modified to allow users to specify the NO_REPLY_EXPECTED flag. If the user specifies the NO_REPLY_EXPECTED flag and a PropertiesChanged signal is received for the device with "Connected=True", the awaited connect() coroutine should resolve and Bleak should consider the device as connected (because bluetoothd considers it connected) even if a method_return for the "Connection" method_call is never received.

Other Thoughts

I'm not sure which implementation (bluetoothctl or Bleak) should be considered "correct" in handling this edge case.

Although it works for me, I don't like bluetoothctl's existing approach. Even in bluetoothctl, I think bluetoothctl should fail the connection if it doesn't receive the expected method_return. Because NO_REPLY_EXPECTED is not set and because there's no reply, I think that bluetoothctl is operating outside of the spec. That's why my proposal for Bleak is that the user explicitly opts in to ignoring the reply by setting the NO_REPLY_EXPECTED flag as an argument to connect().

Use Case

I've been hacking on this because I'm trying to add support for my headphones to OpenRazer. Many yet-to-be-supported devices for that project use Bluetooth and probably have similar (non-standard) behavior when establishing connections. If I'm able to reliably use Bleak to establish connections with my device, I can work on an integrating Bleak into the OpenRazer ecosystem. In other words, this issue isn't just pedantic/academic :)

Thanks again for any help!

librick avatar Jul 16 '23 00:07 librick

Also, for anyone debugging connection issues in the future, I used sudo busctl monitor org.bluez to view all BlueZ-related dbus traffic. Comparing the output of that command while running my script vs while running bluetoothctl was enlightening

librick avatar Jul 16 '23 00:07 librick

I propose that Bleak's connect() method be modified to allow users to specify the NO_REPLY_EXPECTED flag.

I consider this a BlueZ bug for having a code path that never sends the return for the "Connect" D-Bus method. I think we should fully understand what is going on in bluetoothd before considering making a hack in Bleak to work around it.

dlech avatar Jul 17 '23 14:07 dlech

That's reasonable. I'll look into what's going on within bluetoothd and follow up.

librick avatar Jul 17 '23 14:07 librick

I looked at bleak/backends/bluezdbus/client.py and I believe the connection signal is expected in the code lines 204-244. I am not sure though.

Maybe if temporarily commenting it out and seeing if connections to your head phones work, could further support the NO REPLY FLAG. I will try for my device which has similar connection issues (Samsung gearVRC).

I see that the code section also includes code to handle if a device is already connected, which does not work with my device on Raspian and I call bluetoothctl before I start scanning with bleak for my device as work around (I need to have my device disconnected before I can connect with bleak):

output = subprocess.check_output(["bluetoothctl", "info", address], timeout=5, text=True)
lines  = output.strip().split("\n")
for line in lines:
    if "Connected: yes" in line:
        return True
   elif "Connected: no" in line:
        return False
   return False

and

output = subprocess.check_output(["bluetoothctl", "disconnect", address], timeout=5, text=True)

to disconnect it. Unfortunately Windows does not have bluetoothctl command.

uutzinger avatar Jul 21 '23 20:07 uutzinger

I changed line 214 to if reply is not None: from assert reply is not None and indented the code until line 245. I need some further testing to make sure it solves my issue. This might also help your issue.

uutzinger avatar Jul 21 '23 20:07 uutzinger

This change had the following positive effect: I no longer have the disconnection call back triggered before client connection is completed.

Before this change the client sometimes reported multiple disconnections before my system was up and running. Now it disconnects when the device goes into sleep mode, calls the disconnection call back and reconnects as soon as I activate the device again.

I would like to support the request for the NO REPLY EXPECTED option or to simply not assert that the reply is not None but to execute current code if there was a reply and to continue as connected otherwise.

uutzinger avatar Jul 21 '23 20:07 uutzinger