bluer
bluer copied to clipboard
Bluer causes bluetoothd to crash on armv7
On a Raspberry Pi 4 running 32-bit Pi OS, the following program will cause bluetoothd
to crash when the program exits:
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
{
let session = bluer::Session::new().await?;
let adapter = session.default_adapter().await?;
let _scan = adapter.discover_devices().await?;
}
// This delay is just to demonstrate that the crash occurs on program exit, not dropping of bluer objects
// The crash still occurs with no delay
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
Ok(())
}
Running this program causes bluetoothd
to crash with an error similar to the following:
Sep 17 17:54:53 raspberrypi bluetoothd[2535]: realloc(): invalid next size
I believe this is the same issue that was previously reported in #9. However, the bluetoothctl
program does not cause bluetoothd
when performing scans, so there must be something different with bluer's interactions. It is also interesting to note that the crash does not occur until the program exits, even though the bluer objects are already dropped. Also, unlike #9, this crash is occurring on normal program exit, not Ctrl+C.
One thing I note in the DBus logs is that on initialization, a number of DBus matches are added:
method call time=1663451688.404791 sender=:1.206 -> destination=org.freedesktop.DBus serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
string "type='signal',sender='org.bluez',interface='org.freedesktop.DBus.ObjectManager',member='InterfacesAdded'"
method return time=1663451688.404874 sender=org.freedesktop.DBus -> destination=:1.206 serial=3 reply_serial=2
method call time=1663451688.406506 sender=:1.206 -> destination=org.freedesktop.DBus serial=3 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
string "type='signal',sender='org.bluez',interface='org.freedesktop.DBus.ObjectManager',member='InterfacesRemoved'"
method return time=1663451688.406605 sender=org.freedesktop.DBus -> destination=:1.206 serial=4 reply_serial=3
method call time=1663451688.408146 sender=:1.206 -> destination=org.freedesktop.DBus serial=4 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
string "type='signal',sender='org.bluez',interface='org.freedesktop.DBus.Properties',member='PropertiesChanged'"
method return time=1663451688.408211 sender=org.freedesktop.DBus -> destination=:1.206 serial=5 reply_serial=4
method call time=1663451688.409329 sender=:1.206 -> destination=org.freedesktop.DBus serial=5 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
string "type='method_call'"
method return time=1663451688.409430 sender=org.freedesktop.DBus -> destination=:1.206 serial=6 reply_serial=5
However, when the objects are dropped, only one of those matches is removed:
method call time=1663451688.577775 sender=:1.206 -> destination=org.freedesktop.DBus serial=11 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=RemoveMatch
string "type='signal',sender='org.bluez',interface='org.freedesktop.DBus.ObjectManager',member='InterfacesAdded'"
method return time=1663451688.577808 sender=org.freedesktop.DBus -> destination=:1.206 serial=7 reply_serial=11
I don't know enough about DBus or bluez to know if that could cause the problem or if it's just a red herring, but it's the only unusual thing in the DBus logs that jumped out to me.
Versions:
- bluer: 0.15.0
- bluez: 5.55-3.1+rpt1
- bluez-firmware: 1.2-4+rpt8
A couple of additional notes.
- The crash still occurs with bluez 5.64.
- Commenting out the call to "SetDiscoveryFilter" in src/adapter.rs prevents the crash.
Please open a bug with BlueZ, i.e. bluetoothd, since it should not crash under any circumstances.
I believe the upstream issue is https://github.com/bluez/bluez/issues/196. I agree that the root problem should be fixed there, but there doesn't seem to have been much movement on that issue since it was opened a year ago. And other clients seem to work fine on armv7, so I thought it might be worth investigating what it is about bluer's messages that cause the daemon to crash. If not, it might be worth documenting that armv7 targets are not supported by bluer until the upstream issue is resolved.
I would suggest running bluetoothd under Memcheck and try to track down the bug that way. It should be easy to find that way. Working around the symptoms is usually not a good idea, especially in open source projects when the underlying cause can be fixed by submitting a patch.
It is crashing inside libdbus though and I don't see how we could avoid that since it appears to be something internal when dealing with a DBusMessage.
Might this be a similar type mismatch to #49 ?
I can confirm that this is still an issue.
After some digging, it seems like blueR does not exit cleanly in any circumstances. Consider this example program:
fn main() {
simple_logger::SimpleLogger::new()
.with_level(log::LevelFilter::Trace)
.init()
.unwrap();
let rt = tokio::runtime::Builder::new_current_thread()
.enable_time()
.enable_io()
.build()
.expect("failed to build tokio runtime");
log::info!("Starting async tokio task...");
let _ = rt.block_on(async {
let session = bluer::Session::new().await?;
let adapter = session.default_adapter().await?;
adapter.set_powered(true).await?;
log::info!("Starting discovery");
let event_stream = adapter.discover_devices().await?;
tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
log::info!("Stopping discovery");
bluer::Result::Ok(())
});
//explicit drop and flush to make sure messages are not stuck somewhere
drop(rt);
log::logger().flush();
std::io::stdout().flush().unwrap();
std::io::stderr().flush().unwrap();
}
For me, this procudes the following log output:
2023-08-23T10:37:28.942Z INFO [us_service] Starting async tokio task... 2023-08-23T10:37:28.943Z TRACE [bluer::session] Connected to D-Bus with unique name :1.28 2023-08-23T10:37:28.944Z TRACE [bluer::session] Starting event loop for :1.28 2023-08-23T10:37:28.945Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.Powered := true 2023-08-23T10:37:28.945Z INFO [us_service] Starting discovery 2023-08-23T10:37:28.945Z TRACE [bluer::session] Starting new single session for /org/bluez/hci0 2023-08-23T10:37:28.945Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.SetDiscoveryFilter ({"UUIDs": Variant([]), "DuplicateData": Variant(true), "Transport": Variant("auto"), "Discoverable": Variant(false)},) 2023-08-23T10:37:28.946Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.SetDiscoveryFilter (...) -> Ok(()) 2023-08-23T10:37:28.946Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.StartDiscovery () 2023-08-23T10:37:28.954Z TRACE [bluer::adapter] /org/bluez/hci0: org.bluez.Adapter1.StartDiscovery (...) -> Ok(()) 2023-08-23T10:37:28.954Z TRACE [bluer::session] Adding event subscription for /org/bluez/hci0 with child_objects=true 2023-08-23T10:37:28.988Z TRACE [bluer::session] Event: ObjectAdded { object: Path("/org/bluez/hci0/dev_A8_51_AB_12_52_9C\0"), interfaces: {"org.freedesktop.DBus.Properties", "org.freedesktop.DBus.Introspectable", "org.bluez.Device1"} } 2023-08-23T10:37:29.070Z TRACE [bluer::session] Event: ObjectAdded { object: Path("/org/bluez/hci0/dev_B8_BC_5B_71_CA_07\0"), interfaces: {"org.bluez.Device1", "org.freedesktop.DBus.Introspectable", "org.freedesktop.DBus.Properties"} } 2023-08-23T10:37:29.747Z TRACE [bluer::session] Event: ObjectRemoved { object: Path("/org/bluez/hci0/dev_38_18_4C_BB_F1_E0\0"), interfaces: {"org.freedesktop.DBus.Introspectable", "org.bluez.Device1", "org.freedesktop.DBus.Properties"} } 2023-08-23T10:37:29.747Z TRACE [bluer::session] Event: ObjectAdded { object: Path("/org/bluez/hci0/dev_54_15_89_48_F1_45\0"), interfaces: {"org.freedesktop.DBus.Introspectable", "org.bluez.Device1", "org.freedesktop.DBus.Properties"} } 2023-08-23T10:37:29.956Z INFO [us_service] Stopping discovery
When setting the log level to trace, one would expect at least a "StopDiscovery" message on the dbus, which doesnt happen. This seems to be a new issue because earlier posts of @surban showed log output including cleanup messages.
I tried to dig even deeper into it, and it seems like the whole cleanup routine of "SessionInner" is never executed. From my understanding, at least this code should be executed on program exit:
tokio::spawn(async move {
let _ = term_rx.await;
log::trace!("Terminating single session for {}", &path);
stop_fn.await;
let _ = termed_tx.send(());
log::trace!("Terminated single session for {}", &path);
});
However, the drop trait of "SessionInner" only calls ".abort" on the tokio Task.
I also noticed that by not assigning the event stream, the cleanup DOES happen. Most likely because in this case the compiler removes the "start_discovery" call alltogether. Maybe this calls causes some circular referencing issue, or the event stream uses a reference for too long. I also tried dropping the event stream manually via "drop" before destroying the adapter/session, without effect.
This seems like quite an important issue. It could also explain other posts, e.g. the "busy" issue in BlueZ after termination.
Edit: When dropping the "event_stream" and waiting for some time (e.g. 1 second) before leaving the scope, the cleanup does happen. Seems like it needs some time to send those messages. Seems weird. What would be the proper way of handling the wait time? Hardcoding a certain time period is not a good solution.
However, even with successfull clean up, the crash on armv7 still happens. This is strange, because the bluetoothctl tool works without issues. For me it still seems like maybe something isnt cleaned up properly by blueR.
On my Raspi, bluetoothd started crashing in the last few days, probably some os package updates introduced that, don't know. After updating all dependencies, the issue was still there. I've now added a DiscoveryFilter
which keeps bluetoothd happy so far.
let filter = DiscoveryFilter {
transport: DiscoveryTransport::Le,
..Default::default()
};
adapter.set_discovery_filter(filter).await?;
https://github.com/fujexo/rs-terrarium/commit/f00721cf4b33dc28ef8b9e1f1fac051319668647#diff-ce3ebd3348cca1aec72ae2cb137e89396eb56453317c7c8df472f9bec70f7d3bR67-R72
I can confirm that this is still an issue.
After some digging, it seems like blueR does not exit cleanly in any circumstances. Consider this example program:
fn main() { simple_logger::SimpleLogger::new() .with_level(log::LevelFilter::Trace) .init() .unwrap(); let rt = tokio::runtime::Builder::new_current_thread() .enable_time() .enable_io() .build() .expect("failed to build tokio runtime"); log::info!("Starting async tokio task..."); let _ = rt.block_on(async { let session = bluer::Session::new().await?; let adapter = session.default_adapter().await?; adapter.set_powered(true).await?; log::info!("Starting discovery"); let event_stream = adapter.discover_devices().await?; tokio::time::sleep(tokio::time::Duration::from_secs(1)).await; log::info!("Stopping discovery"); bluer::Result::Ok(()) }); //explicit drop and flush to make sure messages are not stuck somewhere drop(rt); log::logger().flush(); std::io::stdout().flush().unwrap(); std::io::stderr().flush().unwrap(); }
From the Tokio docs:
Shutting down the runtime is done by dropping the value, or calling shutdown_background or shutdown_timeout.
Tasks spawned through Runtime::spawn keep running until they yield. Then they are dropped. They are not guaranteed to run to completion, but might do so if they do not yield until completion.
Dropping the Tokio runtime does not guarantee that all background tasks are run to completion, thus StopDiscovery
is never called. Until Rust gets async drop there is no proper way to fix this. A possible workaround would be to add an explicit stop()
method on the discovery event stream.
Nevertheless, bluetoothd
as a system service should not crash, even if an application exits with a discovery session in progress. Imagine what would happen if your OS kernel crashes, just because a program exited without closing a file descriptor or TCP socket. Thus the bug lies within bluetoothd
or libdbus
and should be fixed there.
Hey everyone and @surban ,
I have an update on the issue. I spend some days investigating further and found the cause of the problem. It is indeed an error in the official blueZ implementation.
In blueZ, the scan filter gets parsed when receiving the dbus message. During deserialization, blueZ has to deserialize two booleans (the flags for discoverable and duplicate data in the scan filter). The error here was that dbus always requires booleans to be 4 bytes, but blueZ uses stdbool which on many platforms only has one byte. This leads to a heap corruption of the blueZ service, causing a random crash shortly later.
I have submitted a patch for blueZ here: https://patchwork.kernel.org/project/bluetooth/patch/[email protected]/ With the patch, the issue is fixed for me and i can use blueR with blueZ on ARMv7 just fine.
So why does this only happen on ARM? My best guess here is that on x86 the filter struct has more padding bytes after the two boolean flags (which come last in the C struct), so its actually pure luck that it doesnt crash on x86. The code is still wrong on x86, too.
Here is another interesting find:
The crash does not happen when using bluetoothctl. So why is that? After some more digging it turns out that bluetoothctl behaves different from blueR when it comes to setting the scan filter. It does NOT set the two boolean flags in the scan filter if they are set to false, which is the default. The behavior of bluetoothctl seems to be that, if a flag has its default value, its not send at all. However, blueR sends the two flags with a "false" value, causing the deserialization and heap corruption in blueZ.
@surban Even though this issue is theoretically fixed now, I think it might still be worth investigating a "hack fix" for blueR here. Almost all users of blueZ will experience this crash otherwise and I can not tell when the fix will be live. Maybe it would be a possibility to not send the discoverable and duplicate flags at all when they are set to false (just like bluetoothctl does)? Even though this is not blueRs fault, it would at least improve compatbility with blueZ vastly for allmost all users (think e.g. Raspberry Pi projects). If thats not possible, maybe we can include a big warning for these two flags in regards to ARM / the blueZ verison crahses in the documentation + sources?
Thanks for figuring that out @Mille2525!
Could you try with version 0.16.2-pre1 and check if it fixes the issue with an unpatched bluetoothd?
@surban I can check, but currently I only see changes to the release notes and cargo.toml file in that branch. Or am I mistaken? What changed in the code specifically?
https://github.com/bluez/bluer/commit/51a3c307df1547453c6c9f13d11079c6d2f038b3
Hi @surban ,
thanks for the clarification! How did you determine the default values? I see that you assume true and false as defaults for duplicateData and discoverable, respectively.
However, to me it seems like both are "false" by default, at least if I interpret the source code of bluetoothctl correctly (See: https://github.com/bluez/bluez/blob/8b035b70f379e45a32c94579ec8d00e48070c21e/client/main.c#L1251 )
Also: I just noticed that the bluez heap crash patch was accepted into the master! :) So this issue should hopefully be gone in newer bluez versions.
I have taken the default values from the D-Bus API documentation. However, from the code it seems that you are right and both default to false.
I've changed the default assumption in version 0.17.0-pre1. Please test.
Fixed in version 0.17.0.