QtUsb
QtUsb copied to clipboard
Linux crash: assertion failed in ~QUsbDevice()
Describe the bug
When a QUsbDevice is destroyed, a locked mutex remains behind which causes assertion failure that leads to a fatal crash.
Platform:
- OS: Arch Linux x86_64
- Version 5.12.3-arch1-1, latest QtUsb, libusb 1.0.24-2
To Reproduce
- Compile any example (e.g. SimpleBulkTransfer).
- Run the compiled binary on a Linux OS with pthread-enabled libusb (with an actual USB device attached).
- Observe a fatal crash due to assertion failure in
usbi_mutex_destroy.
Expected behavior The example's binary should terminate without error.
Additional context I believe that this has been happening all along on Linux and any other POSIX platforms. According to this mailing list, version 1.0.24 of libusb (particularly this commit) introduced additional assertions which only revealed this behavior. In the same thread, the author claims to have solved the issue, providing the following explanation:
The issue is in my teardown code, which, by preventing usage of the libusb context during its teardown, is accidentally preventing the event thread from freeing the event lock.
With that in mind, I tried recompiling QtUsb with this line commented out, hoping that this would allow the internal event thread to finish handling all in-flight events before libusb_exit is called. While this modification indeed caused the thread in question to exit on its own (specifically here, it unfortunately had no effect on the bug.
Here is a log of the crash that was taken with QUsb::logDebugAll: crash.log
Thanks for reporting this @petrmanek :) Can you try the fixes in my PR #78?
@fpoussin Thanks for a quick response and a proposed fix. I will try running it later today, and let you know how it goes. 😉
@fpoussin I have news. I just rebased your branch onto my local build and tried running it with my program, and it still seems to crash in the same way.
I am attaching one more log (this time taken with not only debug logs from QtUsb, but also the LIBUSB_DEBUG=4 environment variable). Hope it helps.
One more bit: I re-ran the program in lldb and got a different error. This time libusb is complaining about uncanceled transfers, which may hint at whatever is wrong (perhaps a race?). Attaching another log file.
This one might be the culprit: https://github.com/libusb/libusb/blob/v1.0.24/libusb/io.c#L1232
Can you get a traceback?
@fpoussin Looks plausible (or possibly also the events_lock).
Hold on, I am currently using the libusb my distro came with. I will rebuild it from source using the on-board one and try to confirm.
Interesting. So in hope of symbolifying my (otherwise address-only) backtrace, I have rebuilt QtUsb linking against the on-board libusb and hidapi. This alone was sufficient to get rid of the crash (note that I am still using the patch from your PR).
I am unclear about why there is a difference in behavior. As far as I can tell, the libusb source version QtUsb is using matches that of my distro and I am using the same build configuration for the build.
In case I have missed anything, I am attaching a debug log of the non-crash: noncrash.log
After further testing I conclude that this apparent improvement is still unreliable, unfortunately. Sometimes I get a crash, other times I don't, and the only time it works reasonably well is when I run the program with a debugger attached. This leads me to believe that we're dealing with a textbook race condition.
I've added some debugging output to the branch, however i'm not sure what's causing this...
@fpoussin Thank you for your continued support! I will recompile and try again tomorrow. Hopefully more information will help shed some light on this problem. Also, since I am now this is a non-deterministic issue, I will make it a point to repeat each test from this point on to ensure accurate results.
Apart from logs with LIBUSB_DEBUG=4, are there any other materials I can share to help you put a finger on the culprit?
I got more news. So currently the situation is as follows: If I link QtUsb against the libusb binary shipped with my distro, I always get a crash with the following backtrace:
demo_program: os/threads_posix.h:58: usbi_mutex_destroy: Assertion `pthread_mutex_destroy(mutex) == 0' failed.
Process 33890 stopped
* thread #22, name = 'QThread', stop reason = hit program assert
frame #4: 0x00007fffe484c054 libusb-1.0.so.0`___lldb_unnamed_symbol31$$libusb-1.0.so.0 + 36
libusb-1.0.so.0`___lldb_unnamed_symbol31$$libusb-1.0.so.0:
-> 0x7fffe484c054 <+36>: nopw %cs:(%rax,%rax)
0x7fffe484c05f <+47>: nop
libusb-1.0.so.0`___lldb_unnamed_symbol32$$libusb-1.0.so.0:
0x7fffe484c060 <+0>: subq $0x8, %rsp
0x7fffe484c064 <+4>: leaq 0xafe5(%rip), %rcx
* thread #22, name = 'QThread', stop reason = hit program assert
frame #0: 0x00007ffff5041d22 libc.so.6`raise + 322
frame #1: 0x00007ffff502b862 libc.so.6`abort + 278
frame #2: 0x00007ffff502b747 libc.so.6`__assert_fail_base.cold + 15
frame #3: 0x00007ffff503a616 libc.so.6`__assert_fail + 70
* frame #4: 0x00007fffe484c054 libusb-1.0.so.0`___lldb_unnamed_symbol31$$libusb-1.0.so.0 + 36
frame #5: 0x00007fffe484e42b libusb-1.0.so.0`___lldb_unnamed_symbol48$$libusb-1.0.so.0 + 267
frame #6: 0x00007fffe4849970 libusb-1.0.so.0`libusb_exit + 336
frame #7: 0x00007fffe487bd18 libExtUsb.so`QUsbDevicePrivate::~QUsbDevicePrivate() + 152
frame #8: 0x00007fffe487bf09 libExtUsb.so`QUsbDevicePrivate::~QUsbDevicePrivate() + 9
frame #9: 0x00007fffe487c129 libExtUsb.so`QUsbDevice::~QUsbDevice() + 9
frame #10: 0x00007ffff56f45da libQt6Core.so.6`QObjectPrivate::deleteChildren() + 74
frame #11: 0x00007ffff56f8d94 libQt6Core.so.6`QObject::~QObject() + 1268
frame #12: 0x00007fffe4738661 libdemo.so`Control::~Control(this=0x00007fff8f7fd678) at control.cpp:35:1
[...]
Unfortunately you can see that all the interesting parts are unnamed symbols. :confused:
If I link QtUsb against its on-board libusb, the crash disappears. Even though I tried my best to minimize the differences between QtUsb's version of libusb and the one shipped by my distro, they are not perfectly identical:
- The versions should match, namely both QtUsb and my distro use this commit:
c6a35c56016ea2ab2f19115d2ea1e85e0edae155. However my distro applies (an additional patch)[https://github.com/archlinux/svntogit-packages/blob/packages/libusb/trunk/libusb-1.0.24-descriptor_parsing.patch] to libusb's source. I do not believe this would make any difference as it has to do with descriptor parsing, however if you want I can try applying that locally as well. - The build configuration matches too (both QtUsb's and my distro's libusb should internally rely on
libudev). However, I can never be sure if I included the files correctly, since I was forced to reverse engineer what files should be included from the autotools configuration.
Hope it helps!
One more bit of information. I have been testing the version of QtUsb with on-board libusb extensively, and while there were no crashes due to the aforementioned assertion failure, my demo program deadlocks every now and then. There is definitely some issue in the teardown stage QUsbDevice.
Here is a backtrace I took during one of the deadlocks. It shows that a background thread that uses QtUsb (#22) is blocked indefinitely on QUsbDevice::close. Curiously, libusb spins up an event thread of its own (#21), which appears completely unaffected by the termination. Could that be possibly relevant?
[...]
thread #21, name = 'libusb_event'
frame #0: 0x00007ffff50f8b2f libc.so.6`__poll + 79
frame #1: 0x00007fffe4878187 libExtUsb.so`linux_udev_event_thread_main + 151
frame #2: 0x00007ffff4f28259 libpthread.so.0`start_thread + 233
frame #3: 0x00007ffff51035e3 libc.so.6`__clone + 67
thread #22, name = 'QThread'
frame #0: 0x00007ffff4f348ca libpthread.so.0`__futex_abstimed_wait_common64 + 202
frame #1: 0x00007ffff4f2e270 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 496
frame #2: 0x00007ffff58262fc libQt6Core.so.6`QWaitCondition::wait(QMutex*, QDeadlineTimer) + 284
frame #3: 0x00007ffff581d332 libQt6Core.so.6`QThread::wait(QDeadlineTimer) + 130
frame #4: 0x00007fffe486430f libExtUsb.so`QUsbDevice::close() + 159
frame #5: 0x00007fffe4744717 libdemo.so`Control::close(this=0x00007fff8f7fd678) at control.cpp:149:19
frame #6: 0x00007fffe4764b72 libdemo.so`MyDiscoveryWorker::enumerateDevice(this=0x0000555555958a90, id=0x00007fff88007bc0) at discovery_worker.cpp:57:11
frame #7: 0x00007fffe48a075a libdemo.so`UsbDiscoveryWorker::searchDevices(this=0x0000555555958a90) at discovery_worker.cpp:35:5
frame #8: 0x00007fffe48a00be libdemo.so`UsbDiscoveryWorker::qt_static_metacall(_o=0x0000555555958a90, _c=InvokeMetaMethod, _id=4, _a=0x0000555555d768e8) at moc_discovery_worker.cpp:102:21
[...]
Just wanted to share one more backtrace I just encountered. This time I got a segfault instead of an assertion failure. Perhaps that may help in some way.
[...]
thread #25, name = 'QThread'
frame #0: 0x00007ffff5280b2f libc.so.6`__poll + 79
frame #1: 0x00007fffe48b774c libExtUsb.so`usbi_wait_for_events + 92
frame #2: 0x00007fffe48b59f1 libExtUsb.so`handle_events + 321
frame #3: 0x00007fffe48b69d7 libExtUsb.so`libusb_handle_events_timeout_completed + 503
frame #4: 0x00007fffe48a6f0b libExtUsb.so`QUsbEventsThread::run() + 75
frame #5: 0x00007ffff58616b3 libQt6Core.so.6`___lldb_unnamed_symbol4394$$libQt6Core.so.6 + 355
frame #6: 0x00007ffff5031259 libpthread.so.0`start_thread + 233
frame #7: 0x00007ffff528b5e3 libc.so.6`__clone + 67
* thread #26, name = 'QThread', stop reason = signal SIGSEGV: address access protected (fault address: 0x7fff900008d0)
* frame #0: 0x00007fff900008d0
frame #1: 0x00007fffe48b0b59 libExtUsb.so`log_v + 441
frame #2: 0x00007fffe48b12da libExtUsb.so`usbi_log + 138
frame #3: 0x00007fffe48b773e libExtUsb.so`usbi_wait_for_events + 78
frame #4: 0x00007fffe48b59f1 libExtUsb.so`handle_events + 321
frame #5: 0x00007fffe48b69d7 libExtUsb.so`libusb_handle_events_timeout_completed + 503
frame #6: 0x00007fffe48a6f0b libExtUsb.so`QUsbEventsThread::run() + 75
frame #7: 0x00007ffff58616b3 libQt6Core.so.6`___lldb_unnamed_symbol4394$$libQt6Core.so.6 + 355
frame #8: 0x00007ffff5031259 libpthread.so.0`start_thread + 233
frame #9: 0x00007ffff528b5e3 libc.so.6`__clone + 67
[...]
@fpoussin Have you had any luck reproducing the issue? Can I provide any more information?
No, haven't seen this behavior so far. Did you try with Qt 5.x as well?
@fpoussin I have not. I will try tomorrow and let you know how it goes.
Ever since #92 was merged I have not encountered any symptoms of this bug for over a year. Closing this as resolved. :tada: