damus
Bug: RUNNINGBOARD 0xdead10cc crash
Crash name in organizer: RunningBoardServices: [RBSConnection _connection] + 88
(You can find detailed crash logs by looking up the name above)
Versions (based on data from the past 2 weeks):
- Damus version: 1.10.x and 1.11.x
- Operating system version: seems to affect iOS 17.x and 18.x
- Device: iOS devices
Steps To Reproduce: TBD
Additional context
This piece is interesting:
Exception Type: EXC_CRASH (SIGKILL)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Termination Reason: RUNNINGBOARD 0xdead10cc
Triggered by Thread: 0
From Apple's documentation https://developer.apple.com/documentation/xcode/sigkill:
0xdead10cc (3735883980) — pronounced “dead lock” The operating system terminated the app because it held on to a file lock or SQLite database lock during suspension. Request additional background execution time on the main thread with beginBackgroundTask(withName:expirationHandler:). Make this request well before starting to write to the file in order to complete those operations and relinquish the lock before the app suspends. In an app extension, use beginActivity(options:reason:) to manage this work.
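Apple's suggested pattern can be sketched like this. This is a hedged illustration, not Damus code: `writeWithBackgroundTime`, `flushDatabase`, and the task name are all hypothetical stand-ins.

```swift
import UIKit

// Hedged sketch of Apple's guidance above: request background execution time
// *before* starting the lock-holding write, then end the task as soon as the
// lock is relinquished. `flushDatabase` is a hypothetical stand-in.
func writeWithBackgroundTime(_ flushDatabase: @escaping () -> Void) {
    var task: UIBackgroundTaskIdentifier = .invalid
    task = UIApplication.shared.beginBackgroundTask(withName: "db-flush") {
        // Expiration handler: time is up; end the task so iOS can suspend us
        // cleanly instead of killing the process with 0xdead10cc.
        UIApplication.shared.endBackgroundTask(task)
        task = .invalid
    }
    DispatchQueue.global(qos: .utility).async {
        flushDatabase()                               // lock is held only in here
        UIApplication.shared.endBackgroundTask(task)  // relinquish before suspension
        task = .invalid
    }
}

// The termination code from the report, for reference: 0xdead10cc == 3735883980.
let deadlockTerminationCode: UInt32 = 0xdead10cc
```

The key point is that the request happens before the write begins, so the lock is never held without an active background task assertion.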
is this the swap damus to background bug?
I think the two might be related, but not exactly the same crash
They both seem related to NostrDB and foreground/background state, but the characteristics and signature of the crash differ. The other crash happens on the ingester thread while trying to lock a mutex; this one is caused by RunningBoard (the iOS component that manages app lifecycles), which kills our app when it finds a file lock still held after the user leaves the app.
Check if this is happening on extension
Is this the swapping app crash, or another one @danieldaquino ?
Check if this is happening on extension
It is also happening on extension.
I ran one of the RunningBoard crashes through ChatGPT. Here are the findings:
ChatGPT RunningBoard crash output
What the log says
Exception / signal: EXC_CRASH (SIGKILL)
Termination: namespace: RUNNINGBOARD, code: 3735883980, which is 0xDEAD10CC → Apple's "held a file lock during suspension" code. → Translation: iOS found the app still holding a file or SQLite database lock when it tried to suspend it (e.g., during a lifecycle transition like backgrounding), and killed it.
OS / device: iOS 18.0 (26.0, build 23A341) on iPhone 13 mini (iPhone14,4).
App session: launched at 09:02:06, killed at 09:04:25 (~139s runtime).
Main thread backtrace: sitting in the run loop (mach_msg → CFRunLoopRun → UIApplicationMain), which is typical for a system-initiated kill; you won't see a crashing frame because the system terminated the process from outside.
What typically triggers 0xDEAD10CC
Main-thread stalls: long sync work (JSON parsing, disk I/O, crypto, large SwiftUI layout passes) blocking user events or lifecycle callbacks.
Lifecycle timeouts: doing heavy work in sceneDidBecomeActive/WillResignActive, applicationDidEnterBackground, or not ending a background task in time.
Deadlocks: waiting on a serial queue or lock from the main thread.
Runaway UI work: very large SwiftUI view recompositions or infinite state update loops.
Clues in this report
No out-of-memory (no jetsam namespace), so it’s not memory pressure.
It died after ~2 minutes, which often aligns with backgrounding/suspension misbehavior or a long-running main-thread task started after launch (not an immediate launch hang).
What to check / fix (actionable)
Reproduce with timestamps
Note what you did ~10–30s before the kill (backgrounding? returning to foreground? opening a heavy screen?). Try repeating that sequence.
Instrument main-thread responsiveness
Add os_signpost and Logger around lifecycle callbacks and heavy actions (open screen, start network/disk work).
Turn on Main Thread Checker and Performance instruments (Time Profiler + Points of Interest) and watch for >200–500ms blocks on the main thread.
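The signpost setup can be sketched as follows. The subsystem name, the `traced` helper, and the 250 ms cutoff are placeholders, not Damus's real ones.

```swift
import os

// Hedged sketch: wrap lifecycle callbacks and heavy actions in signpost
// intervals so the Points of Interest track in Instruments shows where the
// main thread blocks. The subsystem string is a placeholder.
let poiLog = OSLog(subsystem: "com.example.damus", category: .pointsOfInterest)

func traced<T>(_ name: StaticString, _ work: () -> T) -> T {
    let id = OSSignpostID(log: poiLog)
    os_signpost(.begin, log: poiLog, name: name, signpostID: id)
    defer { os_signpost(.end, log: poiLog, name: name, signpostID: id) }
    return work()
}

// Rough flag for the 200-500 ms main-thread blocks mentioned above; the exact
// cutoff is an assumption.
func exceedsStallThreshold(_ seconds: Double) -> Bool { seconds > 0.25 }
```

Usage would look like `traced("openProfile") { loadProfile() }`, so each heavy action shows up as a named interval in the Time Profiler trace.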
Audit background tasks
Search for beginBackgroundTask and ensure every path calls endBackgroundTask.
Avoid starting heavy work inside applicationDidEnterBackground/sceneWillResignActive; offload to a background queue and respect the expiration handler.
Move heavy work off the main thread
JSON decode, signature/crypto, database reads, and image decoding: do them on background queues.
For SwiftUI, break large views into smaller Equatable subviews and isolate state so small changes don’t trigger full recompositions.
Look for lock inversions / deadlocks
Grep for .sync on dispatch queues used from the main thread, or any locks taken on the main thread that might also be awaited elsewhere.
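A minimal illustration of the shape to grep for; the queue label and function names are hypothetical.

```swift
import Dispatch

let serialQueue = DispatchQueue(label: "com.example.serial")  // hypothetical queue

// Anti-pattern: if this ever runs on `serialQueue` itself, or on the main
// thread while `serialQueue` is blocked waiting on main, .sync never returns
// and the process stays frozen until the system kills it.
func deadlockProne() {
    serialQueue.sync {
        // "quick" work that turns out not to be quick
    }
}

// Safer shape: hop onto the queue asynchronously and let the caller continue.
func deadlockFree(_ work: @escaping () -> Void) {
    serialQueue.async(execute: work)
}
```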
Collect system diagnostics
On the device right after reproducing: capture a sysdiagnose (it will contain the termination reason and recent logs).
Check Xcode Organizer → Crashes and MetricKit in-app (MXAppHangDiagnostic/MXCrashDiagnostic) for “watchdog” or “app hang” entries and their callouts.
iOS 18 specifics
If you recently adopted APIs or privacy changes that moved work into lifecycle hooks, re-check timing. Some limits are stricter on foreground responsiveness.
If you want me to narrow it down further
Send one of:
The device syslog (around the termination time), or
An Instruments trace (Time Profiler for ~2–3 minutes including background/foreground), or
Any MetricKit JSON payload for this incident.
With any of those, I can usually pinpoint the exact stall (e.g., a specific function, queue, or SwiftUI view causing long recompositions).
LLM debugging mentions a potential root cause:
Runaway UI work: very large SwiftUI view recompositions or infinite state update loops.
Therefore, this might be a motivation to investigate #3246
RunningBoard error findings
• Key Findings
- damus/ContentView.swift:510 only queues damus_state.ndb.close() on a Task, with no UIApplication.beginBackgroundTask guard, and while still on the main actor. nostrdb/Ndb.swift:208 shows close() calls ndb_destroy, which joins the ingester threadpool and writer before mdb_env_close releases lock.mdb. On devices that work the socket hard, this routinely exceeds the ≈200 ms suspension window, so iOS suspends us with the LMDB file lock still held: exactly the RUNNINGBOARD 0xdead10cc reason from the crash report. Wrap the teardown in a background task (or otherwise block synchronously until close() returns) and end the task explicitly once the environment is down, so the OS doesn't freeze the process mid-shutdown.
- In the same background path we leave the relay pool alive. RelayPool.add_relay (damus/Core/Nostr/RelayPool.swift:122-135) keeps dispatching websocket frames straight into ndb.process_event. When .background fires, those closures can still arrive while the shutdown is trying to tear down LMDB, keeping the worker threads busy and extending the lock lifetime. Instead of closing just the database, call damus_state.close() (damus/Core/Storage/DamusState.swift:164), or at least nostrNetwork.pool.close() before closing ndb, so the sockets are drained and the writer/ingester stop before LMDB teardown begins.
- The notification service extension follows the same pattern: NotificationService.didReceive (DamusNotificationService/NotificationService.swift:31-118) instantiates NotificationExtensionState (which opens LMDB) and then hands it to an async Task. If the extension is culled while that task is still running (common when formatting hits the time limit), the process dies with the LMDB lock still open, matching the "also happening on extension" comment. Add a defer { state.ndb.close() } right after creating the state (and mirror it in serviceExtensionTimeWillExpire), or wrap the async work in withTaskCancellationHandler so it closes the database and the extension never exits with an outstanding LMDB writer.
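The cancellation-safe close might be sketched like this. The `NdbHandle` stub stands in for the database handle inside `NotificationExtensionState`; only the shape is the point, and `close()` is assumed to be idempotent (it may run on both paths if cancellation races completion).

```swift
import Dispatch

// Stub standing in for the real Ndb handle; the real type lives in the
// Damus codebase.
final class NdbHandle {
    private(set) var closed = false
    func close() { closed = true }  // the real close() releases the LMDB lock
}

// Close the database on the normal exit path *and* if the task is cancelled
// (e.g. when serviceExtensionTimeWillExpire fires). Assumes close() is safe
// to call more than once.
func withGuaranteedClose(_ db: NdbHandle, _ body: () async -> Void) async {
    await withTaskCancellationHandler {
        defer { db.close() }   // normal completion: release the lock
        await body()
    } onCancel: {
        db.close()             // culled mid-task: still release the lock
    }
}
```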
Open questions: Do we have any instrumentation on actual close durations or threadpool drain times? Capturing a sysdiagnose after reproducing would confirm whether the kill happens immediately after the scenePhase change or during extension shutdown.
Natural next steps: 1) add the background-task-wrapped teardown (and pool shutdown) in the main app, 2) apply the same explicit close to the notification/share extensions, 3) instrument close duration with os_signpost so you can verify on-device that LMDB is released before suspension.
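Steps 1 and 2 might look roughly like the sketch below. The `Closable` protocol and `AppHandles` struct are stand-ins for the real `RelayPool` and `Ndb` types; the exact signatures and the background-task plumbing are assumptions, not the current implementation.

```swift
import UIKit

// Minimal stand-ins so the sketch is self-contained; the real types live in
// the Damus codebase.
protocol Closable { func close() }
struct AppHandles {
    let pool: Closable   // stands in for nostrNetwork.pool
    let ndb: Closable    // stands in for damus_state.ndb
}

// Hedged sketch of the background-task-wrapped teardown: stop feeding LMDB
// first, then close it, all under a background task so iOS waits for the
// lock.mdb release instead of suspending the process mid-shutdown.
func shutdownForBackground(_ handles: AppHandles) {
    var task: UIBackgroundTaskIdentifier = .invalid
    task = UIApplication.shared.beginBackgroundTask(withName: "ndb-teardown") {
        UIApplication.shared.endBackgroundTask(task)  // out of time: give up cleanly
        task = .invalid
    }
    DispatchQueue.global(qos: .userInitiated).async {
        handles.pool.close()  // drain sockets so no new events reach process_event
        handles.ndb.close()   // joins ingester/writer, then mdb_env_close frees lock.mdb
        UIApplication.shared.endBackgroundTask(task)  // lock released; safe to suspend
        task = .invalid
    }
}
```

The ordering is the substance of the fix: pool first, database second, and the background task ended only after the lock is gone.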