Ndb ingester queue `pthread_mutex_lock` crash related to switching in/out of the app

Open danieldaquino opened this issue 1 year ago • 6 comments

Information so far:

Affected preconditions

  • Damus: 1.10, 1.9 and others
  • Devices: iPhones and iPads
  • iOS: 16.x, 17.x, 18.x

How to reproduce

Still unknown. Comments attached to crash reports suggest the crash correlates with switching Damus between foreground and background.

Other technical info

  • Crash stack traces available in Xcode Organizer. See entries under damus: threadpool_dispatch + 48

danieldaquino avatar Sep 13 '24 19:09 danieldaquino

From the macOS 14.6 Library Functions manual ($ man pthread_mutex_lock)

ERRORS
     The pthread_mutex_lock() function will fail if:

     [EINVAL]           The value specified by mutex is invalid.

     [EDEADLK]          A deadlock would occur if the thread blocked waiting for mutex.

It is still unclear which of these we are hitting, as I haven't been able to repro. I am setting up some instrumentation to capture the error code if I can repro.

danieldaquino avatar Sep 13 '24 19:09 danieldaquino

I couldn't repro this yet

Repro attempt 1

Result: No repro

  • Device: iPhone 13 mini
  • iOS: 17.6.1
  • Damus: 1.10 (3902fe7b30f38ec104c13087948799e38e26fa91), damus build scheme, with the following local patch:

diff --git a/nostrdb/protected_queue.h b/nostrdb/protected_queue.h
index c2212b69..d4240709 100644
--- a/nostrdb/protected_queue.h
+++ b/nostrdb/protected_queue.h
@@ -88,7 +88,11 @@ static int prot_queue_push(struct prot_queue* q, void *data)
 {
 	int cap;
 
-	pthread_mutex_lock(&q->mutex);
+    int error_code = pthread_mutex_lock(&q->mutex);
+    
+    if (error_code != 0) {
+        printf("pthread_mutex_lock error code: %d", error_code);
+    }
 
 	cap = prot_queue_capacity(q);
 	if (q->count == cap) {

Steps:

  1. Collect a bunch of damus.io links to specific notes that are likely not already in NostrDB (to induce note ingestion) and paste them into a note in the Notes app.
  2. Switch back and forth between Damus and the Notes app, clicking on one of those damus.io links each time.

@jb55, @alltheseas, do you have any tips on how to repro this?

danieldaquino avatar Sep 13 '24 19:09 danieldaquino

I haven't been able to repro successfully. I just submit reports upon crash.

alltheseas avatar Sep 13 '24 20:09 alltheseas

https://damus.io/nevent1qqspnktwehgl744j66xw48jmsa0drwmfrd3y5l8ap2m4tptzdy9y5uspzpmhxue69uhnzdps9enrw73wd9hsz8rhwden5te0d36kucmgvfhhstnnv9hxgamfvd5zuenpwfksz9nhwden5te0wfjkccte9ehx7uewwdhkx6tpdsq3yamnwvaz7tmhda6zuat50phjummwv5lhhhyt

alltheseas avatar Sep 13 '24 20:09 alltheseas

Just happened to me again. I was copying the share link (damus.io/nevent) for external sharing a few times in a row.

Could not consistently recreate.

alltheseas avatar Sep 18 '24 01:09 alltheseas

I intermittently get Damus crashes as I switch out of the app on 1.10.1. Still don't know the exact steps to reproduce.

alltheseas avatar Sep 24 '24 17:09 alltheseas

From a conversation with @alltheseas:

(...) it may be related to the sudden reconnection of several relays.

The stack trace indicates the crash happens on the ndb ingester, while trying to lock a mutex. Maybe the sudden reconnection causes a sudden surge in notes being ingested, and that surge can cause issues with our mutex/multithreading logic?

Perhaps the incidence of this crash is positively correlated with the volume of incoming notes? (which in turn is loosely correlated with the number of relays and contact list size)

Notes for reproduction of this issue: Try performing the test while logged into @alltheseas's npub.
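
One way the surge hypothesis could be probed off-device is a small stress harness in which several producer threads push into a mutex-protected counter at once and every non-zero `pthread_mutex_lock` result is tallied. This is only a sketch: `toy_queue`, `producer_args`, `producer`, and `run_push_surge` are hypothetical names, the queue is a stand-in rather than nostrdb's `prot_queue`, and the harness does not model foreground/background transitions.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_PRODUCERS 64

/* Hypothetical stand-in for a mutex-protected queue: it only counts pushes. */
struct toy_queue {
	pthread_mutex_t mutex;
	long count;
};

struct producer_args {
	struct toy_queue *queue;
	int pushes;
	long lock_errors; /* written only by the owning thread */
};

static void *producer(void *p)
{
	struct producer_args *args = p;

	for (int i = 0; i < args->pushes; i++) {
		if (pthread_mutex_lock(&args->queue->mutex) != 0) {
			args->lock_errors++; /* tally the failure, skip the push */
			continue;
		}
		args->queue->count++;
		pthread_mutex_unlock(&args->queue->mutex);
	}
	return NULL;
}

/* Runs `producers` threads doing `pushes_each` pushes each. Returns the total
 * number of failed lock calls, or -1 if setup failed; *pushed receives the
 * number of successful pushes. */
static long run_push_surge(int producers, int pushes_each, long *pushed)
{
	struct toy_queue queue = { .count = 0 };
	struct producer_args args[MAX_PRODUCERS];
	pthread_t threads[MAX_PRODUCERS];
	long errors = 0;
	int started = 0;

	if (producers < 1 || producers > MAX_PRODUCERS)
		return -1;
	if (pthread_mutex_init(&queue.mutex, NULL) != 0)
		return -1;

	for (int i = 0; i < producers; i++) {
		args[i] = (struct producer_args){ &queue, pushes_each, 0 };
		if (pthread_create(&threads[i], NULL, producer, &args[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(threads[i], NULL);
		errors += args[i].lock_errors;
	}

	*pushed = queue.count;
	pthread_mutex_destroy(&queue.mutex);
	return started == producers ? errors : -1;
}
```

For example, `run_push_surge(8, 10000, &pushed)` runs eight producers with ten thousand pushes each. A correctly initialized mutex should yield zero lock errors under this load, so a clean run would point away from raw volume alone and toward something else, such as the state of the mutex itself.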

danieldaquino avatar Oct 18 '24 18:10 danieldaquino

added to 1.11 milestone and current sprint

alltheseas avatar Nov 11 '24 17:11 alltheseas