[bug] Deadlock when sending events with tracing enabled (v2 beta)
Describe the bug
[ moving this here from Discord ]
I'm getting a deadlock when I try to resize the window (or re-focus it, or any other action that sends a window event from the main thread to itself).
See the video for some more details, but this is the gist:
It seems like the deadlock is caused when one of my (not main) threads runs a window.emit('whatever', ...), which acquires a lock on the webview and holds it until the script has been run (I think - not 100% sure on what it waits for). Then when I resize the window, that tries to acquire a lock on the webview too, which is already locked by the emitted event. Then it just hangs forever because it can't get the lock until the main thread runs the emitted JS thing, which will never happen because it's paused waiting for the lock 🙃 That's my theory anyway.
I noticed it seemed to start happening when I added the tauri-devtools plugin. With Fabian's suggestion in Discord, I was able to create a minimal reproduction with just the tauri tracing feature enabled. I can't get it to happen when tracing is not enabled.
Fabian thought this might be related: https://github.com/tauri-apps/tauri/pull/9429
https://github.com/tauri-apps/tauri/assets/505704/4f4f5b9d-ad1d-4dce-a030-93c341e90798
Reproduction
https://github.com/dceddia/tauri-tracing-freeze
Expected behavior
Ideally no deadlock 😄
Full tauri info output
This is from the minimal repro:
Projects/tauri-tracing-freeze % pnpm tauri info
> [email protected] tauri /Users/dceddia/Projects/tauri-tracing-freeze
> tauri "info"
[✔] Environment
- OS: Mac OS 14.2.1 X64
✔ Xcode Command Line Tools: installed
✔ rustc: 1.77.1 (7cf61ebde 2024-03-27)
✔ cargo: 1.77.1 (e52e36006 2024-03-26)
✔ rustup: 1.27.0 (bbb9276d2 2024-03-08)
✔ Rust toolchain: stable-aarch64-apple-darwin (default)
- node: 20.8.0
- pnpm: 8.11.0
- yarn: 1.22.21
- npm: 10.1.0
[-] Packages
- tauri [RUST]: 2.0.0-beta.14
- tauri-build [RUST]: 2.0.0-beta.11
- wry [RUST]: 0.39.0
- tao [RUST]: 0.27.0
- tauri-cli [RUST]: 1.4.0
- @tauri-apps/api [NPM]: 2.0.0-beta.7
- @tauri-apps/cli [NPM]: 2.0.0-beta.12
[-] App
- build-type: bundle
- CSP: unset
- frontendDist: ../dist
- devUrl: http://localhost:1420/
- framework: Svelte
- bundler: Vite
Stack trace
There's no crash, but check out the video for some relevant stack traces while execution was paused.
Additional context
No response
Minimal reproduction: https://github.com/Ovenoboyo/tauri-tracing-deadlock
^ The code spawns a thread that loops and keeps emitting events to the webview. On receiving each event, the webview calls an invoke command.
With the tracing feature active, the main thread is stuck on acquiring the lock since the thread with tracing is still waiting for a message from the webview.
I'm experiencing something very similar: when resizing while events are being emitted at a high frequency (around once every 10 ms), the app deadlocks. I'm able to work around it by using a channel instead.
I have similar symptoms just from emitting on app within a non-async command. I'm running an older version of Tauri, though. I don't have tracing enabled explicitly, only via the CrabNebula devtools plugin.
I encountered the same issue after adding CrabNebula devtools to my app, which turned on tracing. It then deadlocks when creating a new window and sending it an event. Removing CrabNebula devtools fixes the deadlock.
I'm seeing this as well in version 2.5.1. I don't have the devtools plugin installed, but I believe tracing was enabled.
I'm on OSX and the offending stack traces are:
semaphore_wait_trap 0x000000018a7c7bb0
_dispatch_sema4_wait 0x000000018a653960
_dispatch_semaphore_wait_slow 0x000000018a653f10
[Inlined] std::sys::sync::thread_parking::darwin::Parker::park darwin.rs:74
std::thread::Thread::park mod.rs:1446
std::sync::mpmc::context::Context::wait_until context.rs:143
std::sync::mpmc::list::Channel<T>::recv::{{closure}} list.rs:448
[Inlined] std::sync::mpmc::context::Context::with::{{closure}} context.rs:49
std::sync::mpmc::context::Context::with::{{closure}} context.rs:57
std::thread::local::LocalKey<T>::try_with local.rs:310
std::sync::mpmc::context::Context::with context.rs:52
std::sync::mpmc::list::Channel<T>::recv list.rs:437
std::sync::mpmc::Receiver<T>::recv mod.rs:984
std::sync::mpsc::Receiver<T>::recv mpsc.rs:845
<tauri_runtime_wry::WryWebviewDispatcher<T> as tauri_runtime::WebviewDispatch<T>>::eval_script lib.rs:185
tauri::webview::Webview<R>::eval mod.rs:1648
tauri::webview::Webview<R>::emit_js mod.rs:1695
tauri::event::listener::Listeners::emit_js_filter::{{closure}} listener.rs:289
core::iter::traits::iterator::Iterator::try_for_each::call::{{closure}} iterator.rs:2428
core::iter::traits::iterator::Iterator::try_fold iterator.rs:2370
core::iter::traits::iterator::Iterator::try_for_each iterator.rs:2431
tauri::event::listener::Listeners::emit_js_filter listener.rs:282
tauri::event::listener::Listeners::emit_js listener.rs:301
tauri::manager::AppManager<R>::emit mod.rs:559
tauri::Emitter::emit lib.rs:956
and
_psynch_mutexwait 0x000000018a7ca89c
_pthread_mutex_firstfit_lock_wait 0x000000018a806e58
_pthread_mutex_firstfit_lock_slow 0x000000018a804840
std::sys::pal::unix::sync::mutex::Mutex::lock mutex.rs:72
[Inlined] std::sys::sync::mutex::pthread::Mutex::lock pthread.rs:34
std::sync::poison::mutex::Mutex<T>::lock mutex.rs:437
tauri::manager::webview::WebviewManager<R>::webviews_lock webview.rs:113
tauri::manager::AppManager<R>::get_webview mod.rs:683
tauri::Manager::get_webview_window lib.rs:585
(from the main thread)
With these we can see how the deadlock can occur. Here is the exact scenario I ran into, but any time tracing is enabled, a thread is waiting on the main thread, and both sides depend on the webviews, we could deadlock.
Emit an event while simultaneously attempting to focus a window by handling a tray icon click.
- Thread A: locks the webviews mutex to emit an event (crates/tauri/src/manager/mod.rs:559)
- Thread A: sends a message to the main thread (crates/tauri-runtime-wry/src/lib.rs:1694)
- Thread A: waits for the main thread to respond
- The main thread processes the tray icon click event, which calls get_webview_window
- The main thread attempts to get the webviews mutex, which has already been locked.
- The main thread never responds to Thread A
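The sequence above can be modeled with nothing but the standard library. The sketch below is not Tauri code: `run_model`, `EvalRequest`, and `hold_lock_while_waiting` are hypothetical names, a plain `Mutex<Vec<&str>>` stands in for the webviews mutex, and `try_lock` and `recv_timeout` replace the real blocking calls so the program terminates instead of hanging. The second variant, which releases the lock before waiting, only illustrates where the cycle breaks. It is not a claim about how Tauri does or should fix it.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the "evaluate this script" message a worker
/// sends to the main thread; `done` is the acknowledgement channel.
struct EvalRequest {
    done: Sender<()>,
}

/// Runs one round of the model. Returns `true` if the worker never received
/// its acknowledgement, i.e. the real program would have deadlocked.
fn run_model(hold_lock_while_waiting: bool) -> bool {
    let webviews = Arc::new(Mutex::new(vec!["main"]));
    let (to_main, main_inbox) = channel::<EvalRequest>();
    let (locked_tx, locked_rx) = channel::<()>();

    let worker_webviews = Arc::clone(&webviews);
    let worker = thread::spawn(move || {
        // Thread A: lock the webviews mutex to emit an event.
        let guard = worker_webviews.lock().unwrap();
        let _webview_count = guard.len();
        let held = if hold_lock_while_waiting {
            Some(guard) // keep the lock across the round trip
        } else {
            drop(guard); // release before talking to the main thread
            None
        };
        locked_tx.send(()).unwrap();

        // Thread A: send a message to the main thread, then wait for a reply.
        // The real code waits forever; the model gives up after 300 ms.
        let (done_tx, done_rx) = channel();
        to_main.send(EvalRequest { done: done_tx }).unwrap();
        let stalled = done_rx.recv_timeout(Duration::from_millis(300)).is_err();
        drop(held);
        stalled
    });

    // Main thread: a window event arrives first and needs the webviews lock.
    locked_rx.recv().unwrap();
    let main_got_lock = webviews.try_lock().is_ok();
    if main_got_lock {
        // Only once the window event is handled can the main thread get
        // around to servicing the worker's request.
        let request = main_inbox.recv().unwrap();
        let _ = request.done.send(());
    }
    // If the lock was unavailable, the real main thread would block here
    // and never reply, so the model simply does not reply.
    worker.join().unwrap()
}

fn main() {
    println!("lock held across the round trip: stalled = {}", run_model(true));
    println!("lock released before waiting:    stalled = {}", run_model(false));
}
```

In the first call the worker holds the mutex while it waits, so the main thread cannot take the lock and the worker's wait times out. In the second call the lock is free by the time the window event is handled, so the round trip completes.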
After a few user reports of the app freezing, we managed to reproduce it locally with tauri 2.8.5 and tracing enabled. I suspect it's the same root cause described in this issue, because it's triggered by fast resizing/focusing of the window. When the app freezes, we have two events that never resolve: one is the window-resize/focus global event and the other is a custom event emitted from our app.