QtUsb icon indicating copy to clipboard operation
QtUsb copied to clipboard

Linux crash: assertion failed in ~QUsbDevice()

Open petrmanek opened this issue 4 years ago • 16 comments

Describe the bug When a QUsbDevice is destroyed, a locked mutex remains behind which causes assertion failure that leads to a fatal crash.

Platform:

  • OS: Arch Linux x86_64
  • Version 5.12.3-arch1-1, latest QtUsb, libusb 1.0.24-2

To Reproduce

  1. Compile any example (e.g. SimpleBulkTransfer).
  2. Run the compiled binary on a Linux OS with pthread-enabled libusb (with an actual USB device attached).
  3. Observe a fatal crash due to assertion failure in usbi_mutex_destroy.

Expected behavior The example's binary should terminate without error.

Additional context I believe that this has been happening all along on Linux and any other POSIX platforms. According to this mailing list, version 1.0.24 of libusb (particularly this commit) introduced additional assertions which only revealed this behavior. In the same thread, the author claims to have solved the issue, providing the following explanation:

The issue is in my teardown code, which, by preventing usage of the libusb context during its teardown, is accidentally preventing the event thread from freeing the event lock.

With that in mind, I tried recompiling QtUsb with this line commented out, hoping that this would allow the internal event thread to finish handling all in-flight events before libusb_exit is called. While this modification indeed caused the thread in question to exit on its own (specifically here, it unfortunately had no effect on the bug.

Here is a log of the crash that was taken with QUsb::logDebugAll: crash.log

petrmanek avatar May 23 '21 22:05 petrmanek

Thanks for reporting this @petrmanek :) Can you try the fixes in my PR #78?

fpoussin avatar May 26 '21 21:05 fpoussin

@fpoussin Thanks for a quick response and a proposed fix. I will try running it later today, and let you know how it goes. 😉

petrmanek avatar May 27 '21 10:05 petrmanek

@fpoussin I have news. I just rebased your branch onto my local build and tried running it with my program, and it still seems to crash in the same way.

I am attaching one more log (this time taken with not only debug logs from QtUsb, but also the LIBUSB_DEBUG=4 environment variable). Hope it helps.

petrmanek avatar May 27 '21 10:05 petrmanek

One more bit: I re-ran the program in lldb and got a different error. This time libusb is complaining about uncanceled transfers, which may hint at whatever is wrong (perhaps a race?). Attaching another log file.

petrmanek avatar May 27 '21 11:05 petrmanek

This one might be the culprit: https://github.com/libusb/libusb/blob/v1.0.24/libusb/io.c#L1232

Can you get a traceback?

fpoussin avatar May 27 '21 12:05 fpoussin

@fpoussin Looks plausible (or possibly also the events_lock).

Hold on, I am currently using the libusb my distro came with. I will rebuild it from source using the on-board one and try to confirm.

petrmanek avatar May 27 '21 12:05 petrmanek

Interesting. So in hope of symbolifying my (otherwise address-only) backtrace, I have rebuilt QtUsb linking against the on-board libusb and hidapi. This alone was sufficient to get rid of the crash (note that I am still using the patch from your PR).

I am unclear about why there is a difference in behavior. As far as I can tell, the libusb source version QtUsb is using matches that of my distro and I am using the same build configuration for the build.

In case I have missed anything, I am attaching a debug log of the non-crash: noncrash.log

petrmanek avatar May 27 '21 15:05 petrmanek

After further testing I conclude that this apparent improvement is still unreliable, unfortunately. Sometimes I get a crash, other times I don't, and the only time it works reasonably well is when I run the program with a debugger attached. This leads me to believe that we're dealing with a textbook race condition.

petrmanek avatar May 28 '21 09:05 petrmanek

I've added some debugging output to the branch, however i'm not sure what's causing this...

fpoussin avatar May 30 '21 17:05 fpoussin

@fpoussin Thank you for your continued support! I will recompile and try again tomorrow. Hopefully more information will help shed some light on this problem. Also, since I am now this is a non-deterministic issue, I will make it a point to repeat each test from this point on to ensure accurate results.

Apart from logs with LIBUSB_DEBUG=4, are there any other materials I can share to help you put a finger on the culprit?

petrmanek avatar May 30 '21 21:05 petrmanek

I got more news. So currently the situation is as follows: If I link QtUsb against the libusb binary shipped with my distro, I always get a crash with the following backtrace:

demo_program: os/threads_posix.h:58: usbi_mutex_destroy: Assertion `pthread_mutex_destroy(mutex) == 0' failed.
Process 33890 stopped
* thread #22, name = 'QThread', stop reason = hit program assert
    frame #4: 0x00007fffe484c054 libusb-1.0.so.0`___lldb_unnamed_symbol31$$libusb-1.0.so.0 + 36
libusb-1.0.so.0`___lldb_unnamed_symbol31$$libusb-1.0.so.0:
->  0x7fffe484c054 <+36>: nopw   %cs:(%rax,%rax)
    0x7fffe484c05f <+47>: nop    

libusb-1.0.so.0`___lldb_unnamed_symbol32$$libusb-1.0.so.0:
    0x7fffe484c060 <+0>:  subq   $0x8, %rsp
    0x7fffe484c064 <+4>:  leaq   0xafe5(%rip), %rcx

* thread #22, name = 'QThread', stop reason = hit program assert
    frame #0: 0x00007ffff5041d22 libc.so.6`raise + 322
    frame #1: 0x00007ffff502b862 libc.so.6`abort + 278
    frame #2: 0x00007ffff502b747 libc.so.6`__assert_fail_base.cold + 15
    frame #3: 0x00007ffff503a616 libc.so.6`__assert_fail + 70
  * frame #4: 0x00007fffe484c054 libusb-1.0.so.0`___lldb_unnamed_symbol31$$libusb-1.0.so.0 + 36
    frame #5: 0x00007fffe484e42b libusb-1.0.so.0`___lldb_unnamed_symbol48$$libusb-1.0.so.0 + 267
    frame #6: 0x00007fffe4849970 libusb-1.0.so.0`libusb_exit + 336
    frame #7: 0x00007fffe487bd18 libExtUsb.so`QUsbDevicePrivate::~QUsbDevicePrivate() + 152
    frame #8: 0x00007fffe487bf09 libExtUsb.so`QUsbDevicePrivate::~QUsbDevicePrivate() + 9
    frame #9: 0x00007fffe487c129 libExtUsb.so`QUsbDevice::~QUsbDevice() + 9
    frame #10: 0x00007ffff56f45da libQt6Core.so.6`QObjectPrivate::deleteChildren() + 74
    frame #11: 0x00007ffff56f8d94 libQt6Core.so.6`QObject::~QObject() + 1268
    frame #12: 0x00007fffe4738661 libdemo.so`Control::~Control(this=0x00007fff8f7fd678) at control.cpp:35:1

[...]

Unfortunately you can see that all the interesting parts are unnamed symbols. :confused:

If I link QtUsb against its on-board libusb, the crash disappears. Even though I tried my best to minimize the differences between QtUsb's version of libusb and the one shipped by my distro, they are not perfectly identical:

  1. The versions should match, namely both QtUsb and my distro use this commit: c6a35c56016ea2ab2f19115d2ea1e85e0edae155. However my distro applies (an additional patch)[https://github.com/archlinux/svntogit-packages/blob/packages/libusb/trunk/libusb-1.0.24-descriptor_parsing.patch] to libusb's source. I do not believe this would make any difference as it has to do with descriptor parsing, however if you want I can try applying that locally as well.
  2. The build configuration matches too (both QtUsb's and my distro's libusb should internally rely on libudev). However, I can never be sure if I included the files correctly, since I was forced to reverse engineer what files should be included from the autotools configuration.

Hope it helps!

petrmanek avatar May 31 '21 11:05 petrmanek

One more bit of information. I have been testing the version of QtUsb with on-board libusb extensively, and while there were no crashes due to the aforementioned assertion failure, my demo program deadlocks every now and then. There is definitely some issue in the teardown stage QUsbDevice.

Here is a backtrace I took during one of the deadlocks. It shows that a background thread that uses QtUsb (#22) is blocked indefinitely on QUsbDevice::close. Curiously, libusb spins up an event thread of its own (#21), which appears completely unaffected by the termination. Could that be possibly relevant?

[...]
  thread #21, name = 'libusb_event'
    frame #0: 0x00007ffff50f8b2f libc.so.6`__poll + 79
    frame #1: 0x00007fffe4878187 libExtUsb.so`linux_udev_event_thread_main + 151
    frame #2: 0x00007ffff4f28259 libpthread.so.0`start_thread + 233
    frame #3: 0x00007ffff51035e3 libc.so.6`__clone + 67
  thread #22, name = 'QThread'
    frame #0: 0x00007ffff4f348ca libpthread.so.0`__futex_abstimed_wait_common64 + 202
    frame #1: 0x00007ffff4f2e270 libpthread.so.0`pthread_cond_wait@@GLIBC_2.3.2 + 496
    frame #2: 0x00007ffff58262fc libQt6Core.so.6`QWaitCondition::wait(QMutex*, QDeadlineTimer) + 284
    frame #3: 0x00007ffff581d332 libQt6Core.so.6`QThread::wait(QDeadlineTimer) + 130
    frame #4: 0x00007fffe486430f libExtUsb.so`QUsbDevice::close() + 159
    frame #5: 0x00007fffe4744717 libdemo.so`Control::close(this=0x00007fff8f7fd678) at control.cpp:149:19
    frame #6: 0x00007fffe4764b72 libdemo.so`MyDiscoveryWorker::enumerateDevice(this=0x0000555555958a90, id=0x00007fff88007bc0) at discovery_worker.cpp:57:11
    frame #7: 0x00007fffe48a075a libdemo.so`UsbDiscoveryWorker::searchDevices(this=0x0000555555958a90) at discovery_worker.cpp:35:5
    frame #8: 0x00007fffe48a00be libdemo.so`UsbDiscoveryWorker::qt_static_metacall(_o=0x0000555555958a90, _c=InvokeMetaMethod, _id=4, _a=0x0000555555d768e8) at moc_discovery_worker.cpp:102:21
[...]

petrmanek avatar May 31 '21 12:05 petrmanek

Just wanted to share one more backtrace I just encountered. This time I got a segfault instead of an assertion failure. Perhaps that may help in some way.

[...]
  thread #25, name = 'QThread'
    frame #0: 0x00007ffff5280b2f libc.so.6`__poll + 79
    frame #1: 0x00007fffe48b774c libExtUsb.so`usbi_wait_for_events + 92
    frame #2: 0x00007fffe48b59f1 libExtUsb.so`handle_events + 321
    frame #3: 0x00007fffe48b69d7 libExtUsb.so`libusb_handle_events_timeout_completed + 503
    frame #4: 0x00007fffe48a6f0b libExtUsb.so`QUsbEventsThread::run() + 75
    frame #5: 0x00007ffff58616b3 libQt6Core.so.6`___lldb_unnamed_symbol4394$$libQt6Core.so.6 + 355
    frame #6: 0x00007ffff5031259 libpthread.so.0`start_thread + 233
    frame #7: 0x00007ffff528b5e3 libc.so.6`__clone + 67
* thread #26, name = 'QThread', stop reason = signal SIGSEGV: address access protected (fault address: 0x7fff900008d0)
  * frame #0: 0x00007fff900008d0
    frame #1: 0x00007fffe48b0b59 libExtUsb.so`log_v + 441
    frame #2: 0x00007fffe48b12da libExtUsb.so`usbi_log + 138
    frame #3: 0x00007fffe48b773e libExtUsb.so`usbi_wait_for_events + 78
    frame #4: 0x00007fffe48b59f1 libExtUsb.so`handle_events + 321
    frame #5: 0x00007fffe48b69d7 libExtUsb.so`libusb_handle_events_timeout_completed + 503
    frame #6: 0x00007fffe48a6f0b libExtUsb.so`QUsbEventsThread::run() + 75
    frame #7: 0x00007ffff58616b3 libQt6Core.so.6`___lldb_unnamed_symbol4394$$libQt6Core.so.6 + 355
    frame #8: 0x00007ffff5031259 libpthread.so.0`start_thread + 233
    frame #9: 0x00007ffff528b5e3 libc.so.6`__clone + 67
[...]

petrmanek avatar Jun 03 '21 09:06 petrmanek

@fpoussin Have you had any luck reproducing the issue? Can I provide any more information?

petrmanek avatar Jun 09 '21 21:06 petrmanek

No, haven't seen this behavior so far. Did you try with Qt 5.x as well?

fpoussin avatar Jun 10 '21 08:06 fpoussin

@fpoussin I have not. I will try tomorrow and let you know how it goes.

petrmanek avatar Jun 10 '21 18:06 petrmanek

Ever since #92 was merged I have not encountered any symptoms of this bug for over a year. Closing this as resolved. :tada:

petrmanek avatar Oct 17 '23 12:10 petrmanek