realm-core Realm notification listener crash

Realm version: 10.1.1 and some previous versions (it was reproduced on 10.0.0 too).

It happens unpredictably for 20-50 users a day (we have around 1k users a day). I can provide only part of the stack trace, from Firebase:

Crashed: Realm notification listener
0  Realm                          0x101671e04 long long realm::Array::get<64ul>(unsigned long) const + 4
1  Realm                          0x1013922fc realm::ArrayKeyBase<0>::get(unsigned long) const + 36
2  Realm                          0x101392324 realm::util::FunctionRef<void (realm::BPlusTreeNode*, unsigned long)>::FunctionRef<realm::BPlusTree<realm::ObjKey>::get(unsigned long) const::'lambda'(realm::BPlusTreeNode*, unsigned long)&>(realm::ObjKey&&)::'lambda'(void*, realm::BPlusTreeNode*, unsigned long)::__invoke(void*, realm::BPlusTreeNode*, unsigned long) + 28
3  Realm                          0x10173c638 realm::ConstLstIf<realm::ObjKey>::get(unsigned long) const + 264
4  Realm                          0x1013a3b84 realm::_impl::ListNotifier::run() + 244
5  Realm                          0x1013b9f8c realm::_impl::RealmCoordinator::run_async_notifiers() + 1788
6  Realm                          0x1013b9834 realm::_impl::RealmCoordinator::on_change() + 24
7  Realm                          0x10139345c realm::_impl::ExternalCommitHelper::listen() + 204
8  Realm                          0x1013938b4 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, realm::_impl::ExternalCommitHelper::ExternalCommitHelper(realm::_impl::RealmCoordinator&)::$_0> >(void*) + 52
9  libsystem_pthread.dylib        0x1df23dca8 _pthread_start + 320
10 libsystem_pthread.dylib        0x1df246788 thread_start + 8

Steps to reproduce: unknown, sorry

Nov 09 '20 11:11 Viktorianec

There's a good chance this is the same issue as described in https://github.com/realm/realm-core/pull/4175. The reason is the combination of the notifier being run and ConstLstIf<ObjKey>::get() being triggered. The known (and since fixed) issue here is that the iterator at the ConstLstIf level provides access to invalidated links which are not actually stored there, so it could be that Array::get() asserts if this link was out of bounds at the storage level.

Dec 07 '20 18:12 ironage

@Viktorianec can you tell if your app has been running in the background at the time the crashes happened? A possible cause might be trying to access the realm file after the device is locked and iOS revokes the access to it. Because realm files are memory-mapped we unfortunately do not get useful errors from the operating system but instead hard crashes like that.

Dec 07 '20 21:12 fealebenpae
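
To illustrate the point above about memory-mapped files, here is a minimal, generic POSIX sketch. It is not Realm code and does not model iOS data protection; the helper names `write_file` and `first_byte_via_mmap` are hypothetical. It only shows that once a mapping is established, reading the file is a plain memory load with no return value to check, so a failure to supply the page surfaces as a signal rather than an error code.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <fstream>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper for the demo: writes `contents` to `path`.
bool write_file(const std::string& path, const std::string& contents)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << contents;
    out.close();
    return !out.fail();
}

// Maps `path` read-only and returns its first byte, or -1 if the file
// cannot be opened, is empty, or cannot be mapped.
int first_byte_via_mmap(const std::string& path)
{
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0)
        return -1; // open() still reports errors in the usual way

    struct stat st;
    if (::fstat(fd, &st) != 0 || st.st_size == 0) {
        ::close(fd);
        return -1;
    }

    std::size_t len = static_cast<std::size_t>(st.st_size);
    void* addr = ::mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    ::close(fd); // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED)
        return -1; // mmap() itself also reports errors

    // From here on, reading the file is an ordinary memory load. There is
    // no error code to inspect: if the OS cannot supply the page, the
    // process receives a signal (SIGBUS or SIGSEGV) at this line.
    int value = static_cast<const unsigned char*>(addr)[0];

    ::munmap(addr, len);
    return value;
}
```

In this sketch only `open()` and `mmap()` can fail gracefully. A crash report for a failed load would show the faulting read (here the array access) at the top of the stack, much like `Array::get` in the traces in this thread.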

@fealebenpae I'm seeing the same kind of crash with an almost identical stack trace. Is there anything that could be done to avoid it? Most of the crashes happen when the app is in the background (based on Instabug reports), so most likely your assumption about iOS file rights is correct.

Aug 18 '21 13:08 r-rebacz

@r-rebacz Could you please add the actual stack trace you are seeing and also tell us which version of Realm you are using?

Sep 07 '21 07:09 jedelbo

Thank you @jedelbo for your interest in the topic. We're using v10.7.4.

Crashed: Realm notification listener
SIGSEGV 0x0000000116089ed0
----
Crashed: Realm notification listener
0  Realm                          0x103209124 long long realm::Array::get<64ul>(unsigned long) const + 4
1  Realm                          0x1030b2d64 realm::ArrayKeyBase<0>::get(unsigned long) const + 36
2  Realm                          0x1030b481c realm::util::FunctionRef<void (realm::BPlusTreeNode*, unsigned long)>::FunctionRef<realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const::'lambda'(realm::BPlusTreeNode*, unsigned long)&>(realm::ObjKey&&)::'lambda'(void*, realm::BPlusTreeNode*, unsigned long)::__invoke(void*, realm::BPlusTreeNode*, unsigned long) + 28
3  Realm                          0x1030b47f0 realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const + 64
4  Realm                          0x1030b5798 realm::LnkLst::get_any(unsigned long) const + 80
5  Realm                          0x10345ff30 realm::_impl::ListNotifier::run() + 260
6  Realm                          0x103468bdc realm::_impl::RealmCoordinator::run_async_notifiers() + 3124
7  Realm                          0x103467f2c realm::_impl::RealmCoordinator::on_change() + 24
8  Realm                          0x10344ec4c realm::_impl::ExternalCommitHelper::listen() + 204
9  Realm                          0x10344ede4 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, realm::_impl::ExternalCommitHelper::ExternalCommitHelper(realm::_impl::RealmCoordinator&)::$_0> >(void*) + 52
10 libsystem_pthread.dylib        0x1dc620bfc _pthread_start + 320
11 libsystem_pthread.dylib        0x1dc629758 thread_start + 8

Sep 14 '21 08:09 r-rebacz

@r-rebacz I am sorry for the long delay in responding to this. I am a bit confused about the version of Realm you are using, perhaps because I am not sure if this is happening on Android or on iOS. Anyway, we have made some fixes in this area that, however, are not yet released. I hope they will also fix the issues you are experiencing.

Oct 26 '21 09:10 jedelbo

@jedelbo thank you for the feedback. It's an iOS app. I mentioned "iOS file rights" in one of my previous comments, but I should probably have been clearer :) Could you please point me to the pull requests with the fixes, so I can track when they get released? Thank you in advance.

Nov 04 '21 12:11 r-rebacz

@r-rebacz The confusion comes from the fact that the stack trace does not match v10.7.4. It seems to be a newer version. You can follow https://github.com/realm/realm-cocoa/pull/7488.

Nov 04 '21 14:11 jedelbo

➤ Jørgen Edelbo commented:

We are waiting to see if the new release improves the situation

Nov 15 '21 10:11 sync-by-unito[bot]

We updated to 10.20.0 but we are still able to see the crash. It's an iOS app, still using the objc version of Realm.

In addition to what was already said above: this crash happens for only one RLMObject (out of a total of 88 Realm objects that we have), and only when it is added to Realm (not when it is updated or deleted). The object owns 3 other objects (of which 2 are RLMArrays), each with its own children (but the entire structure is not complicated and the maximum array size is 15). Neither this object nor its children own any RLMEmbeddedObjects. The bottom line is that we have other objects with a much more complex structure than this one.

The creation rate is also low, usually 1 per user session. I don't see any issue with the object or its types and we're not doing anything fancy with it, but it's intriguing that it happens to only this object.

It happens randomly. We weren't able to catch the crash with the debugger; we only know about it from what we see in Crashlytics.

This is the stack from the thread that is crashing.

0    Realm                                    0x102da1778     long long realm::Array::get<64ul>(unsigned long) const + 4
1    Realm                                    0x102c4e5c4     realm::ArrayKeyBase<0>::get(unsigned long) const (array_key.hpp:90)
2    Realm                                    0x102c507bc     realm::util::FunctionRef<void (realm::BPlusTreeNode*, unsigned long)>::FunctionRef<realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const::'lambda'(realm::BPlusTreeNode*, unsigned long)&>(realm::ObjKey&&)::'lambda'(void*, realm::BPlusTreeNode*, unsigned long)::__invoke(void*, realm::BPlusTreeNode*, unsigned long) (function_ref.hpp:103)
3    Realm                                    0x102c50790     realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const (bplustree.hpp:379)
4    Realm                                    0x102c9d7c8     realm::LnkLst::get_any(unsigned long) const (list.hpp:904)
5    Realm                                    0x102fec4fc     realm::_impl::ListNotifier::run() + 259
6    Realm                                    0x102ff5228     realm::_impl::RealmCoordinator::run_async_notifiers() + 3207
7    Realm                                    0x102ff4524     realm::_impl::RealmCoordinator::on_change() + 23
8    Realm                                    0x102fd83c4     realm::_impl::ExternalCommitHelper::listen() + 203
9    Realm                                    0x102fd84e4     void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, realm::_impl::ExternalCommitHelper::ExternalCommitHelper(realm::_impl::RealmCoordinator&)::$_0> >(void*) + 51
10   libsystem_pthread.dylib                  0x1f20bb9a4     _pthread_start + 147
11   libsystem_pthread.dylib                  0x1f20baea0     thread_start + 7

Or similar ones

0    Realm                                    0x102361700     long long realm::Array::get<16ul>(unsigned long) const + 4
1    Realm                                    0x10220e5c4     realm::ArrayKeyBase<0>::get(unsigned long) const (array_key.hpp:90)
...

And this is what's always happening on the thread that is triggering the save (not sure if this is relevant in any way).

0    libsystem_kernel.dylib                   0x1b8945f90     _psynch_cvwait + 8
1    libc++.1.dylib                           0x199ce6ddc     $std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 27
2    Realm                                    0x1025a5420     realm::_impl::NotifierPackage::package_and_wait(realm::util::Optional<unsigned long long>) + 235
3    Realm                                    0x1025c1094     realm::_impl::transaction::begin(std::__1::shared_ptr<realm::Transaction> const&, realm::BindingContext*, realm::_impl::NotifierPackage&) + 1363
4    Realm                                    0x1025b6cac     realm::_impl::RealmCoordinator::promote_to_write(realm::Realm&) + 291
5    Realm                                    0x102644dd0     realm::Realm::begin_transaction() + 147
6    Realm                                    0x10232ef04     -[RLMRealm beginWriteTransactionWithError:] (RLMRealm.mm:644)
7    Realm                                    0x10232f204     -[RLMRealm transactionWithoutNotifying:block:error:] (RLMRealm.mm:692)
8    Realm                                    0x10232f194     -[RLMRealm transactionWithBlock:error:] (RLMRealm.mm:684)
...

It's a bit frustrating, as it happens to quite a few users per day and it completely breaks their app experience.

Jan 07 '22 16:01 bodnar-dan

@tgoyne Based on your experience with notifiers, does this ring a bell? This seems to be happening when an object containing a list is created. Could it be that the list object is not properly transferred to the notification transaction?

Jan 10 '22 14:01 jedelbo

The crashing line here is https://github.com/realm/realm-core/blob/master/src/realm/object-store/impl/list_notifier.cpp#L100

We check if the List is valid at the start of that function (!m_list || !m_list->is_attached()), and any sort of bug in the handover process should result in it just taking the list-was-deleted code path. There's also a call to size() before this which had to have returned a non-zero value to hit this location.

Jan 10 '22 19:01 tgoyne

Are the apps that are crashing using Realm sync? I'm just realizing that the indices reported to the replication for notifications are the full set including unresolved links, and this section of code is using LnkLst which would then translate an index incorrectly to something out of bounds.

Jan 10 '22 19:01 ironage
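
A minimal sketch of the hypothesis in the comment above, assuming a simplified model in which a link list hides unresolved entries from its public view. This is not realm-core code: `Entry`, `visible_size` and `virtual_to_real` are hypothetical names, and the real LnkLst translation is implemented differently. The sketch only shows how a raw storage index, if mistaken for a public-view index, gets translated to a position past the end of the storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one stored link: a key plus a flag marking it as
// unresolved (hidden from the public view of the list).
struct Entry {
    long key;
    bool unresolved;
};

// Number of entries visible through the public (filtered) view.
std::size_t visible_size(const std::vector<Entry>& storage)
{
    std::size_t n = 0;
    for (const Entry& e : storage)
        if (!e.unresolved)
            ++n;
    return n;
}

// Translates an index in the public view into an index in the raw storage
// by skipping hidden entries. Returns storage.size() when the public index
// is out of range, so callers can detect the problem instead of reading
// past the end.
std::size_t virtual_to_real(const std::vector<Entry>& storage, std::size_t virtual_ndx)
{
    std::size_t visible = 0;
    for (std::size_t real = 0; real < storage.size(); ++real) {
        if (storage[real].unresolved)
            continue; // hidden entries do not consume a public index
        if (visible == virtual_ndx)
            return real;
        ++visible;
    }
    return storage.size();
}
```

With storage `{10, 20 (unresolved), 30}`, the public view has two elements, and public index 1 maps to raw index 2. If a change is reported with the raw index 2 and that value is then fed through `virtual_to_real` as if it were a public index, the result is 3, one past the end. An unchecked read at that position would fail in the same way as the `Array::get` frames in the traces.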

Oh, notification bugs related to unresolved links would be pretty unsurprising, and also would explain why it's only happening on one object (if that object is just the only one with a list with an unresolved link). This hopefully just requires fixing the index in Replication::list_set() etc. then? I assume we need to pass the raw index to sync replication so we can't adjust it earlier.

Jan 10 '22 20:01 tgoyne

It should be noted that the client cannot insert unresolved links, so this is probably not the problem here. But I agree that there are problems in handling replication of unresolved links. Created #5164.

Jan 12 '22 10:01 jedelbo

Are the apps that are crashing using Realm sync?

No. We're using Realm just for persisting data locally.

Jan 14 '22 15:01 bodnar-dan

I see that issue #5164 was merged, but it didn't make it into 11.9.0. Is there a workaround for this, or anything we can double-check? It's getting critical for us, as we are facing repeated crashes for some users. Internally, we are still unable to reproduce the crash, with or without the debugger attached.

We have simplified the object structure as much as we could and removed properties that were not important. It now has only one RLMArray property (with a max size of 15 RLMObjects) and an NSData property (which is fairly small, around 5k bytes). The crash still happens.

Feb 21 '22 16:02 bodnar-dan

If you are not using sync, #5164 is not relevant for you. We will try to find out what we can do to find the root cause of this problem.

Feb 22 '22 16:02 jedelbo

I saw in the Realm documentation and in this thread https://github.com/realm/realm-swift/issues/7164 that it's recommended to use GCD rather than Threads for doing background work. We do have one Thread that performs a specific task whenever an object of the type that is causing the crash is added to Realm. We use this approach: https://academy.realm.io/posts/realm-notifications-on-background-threads-with-swift/ to add the notification block to an RLMResults on the background thread (we also lower the threadPriority to 0.2 and set qualityOfService = .utility). Could this be a possible cause of the issue?

Feb 28 '22 08:02 bodnar-dan

@tgoyne can you comment on the above.

Feb 28 '22 12:02 jedelbo

There isn't any obvious reason why that would cause problems.

Mar 01 '22 16:03 tgoyne

I can confirm the background thread execution is not at fault. We turned off the background task via a flag and the crash is still happening. We really need a bit of help understanding the crash better, because I think the key here is that this is happening on only one object. Could this happen because of a poorly implemented notification block? Or does the crash happen before those are even called? We added more logs in our latest build, but it doesn't seem the notification blocks are getting called. Could a Realm migration on the client cause this? Maybe something we didn't handle properly? We did see an increase in the crash rate when a new feature was released. This feature added a bunch of new properties (RLMObjects) to the owner of the object that crashes when added. Could this be an issue? Maybe the owner has too many child objects? The owner is the account's main object, so basically it's the owner of anything the user does in the app. But that doesn't explain why the crash happens only when one particular child is added.

Mar 04 '22 10:03 bodnar-dan

One thing I saw in the latest release: in certain situations we get a different stack trace. In this scenario the crash happens inside a notification block, when updating the same object. Specifically, it happens when we iterate the modifications and create a list with the changed objects (we are aware that the modifications reflect the changes in the old results and we do map them to the new results). So in this case, the object was saved, but the crash happened when a field was updated on it.

Crashed: com.apple.main-thread
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000122142ff8
0  Realm                          0x154d68 long long realm::Array::get<64ul>(unsigned long) const + 4
1  Realm                          0xdb4c realm::ArrayKeyBase<0>::get(unsigned long) const + 36
2  Realm                          0xfd44 realm::util::FunctionRef<void (realm::BPlusTreeNode*, unsigned long)>::FunctionRef<realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const::'lambda'(realm::BPlusTreeNode*, unsigned long)&>(realm::ObjKey&&)::'lambda'(void*, realm::BPlusTreeNode*, unsigned long)::__invoke(void*, realm::BPlusTreeNode*, unsigned long) + 28
3  Realm                          0xfd18 realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const + 64
4  Realm                          0x60e24 realm::LnkLst::get_object(unsigned long) const + 32
5  Realm                          0x61128 realm::ObjList::try_get_object(unsigned long) const + 100
6  Realm                          0x23fe38 realm::Query::do_find_all(realm::TableView&, unsigned long) const + 156
7  Realm                          0x2cee24 realm::TableView::do_sync() + 652
8  Realm                          0x3f0c98 realm::Results::ensure_up_to_date(realm::Results::EvaluateMode) + 456
9  Realm                          0x3f0590 realm::util::Optional<realm::Obj> realm::Results::try_get<realm::Obj>(unsigned long) + 48
10 Realm                          0x3f0498 realm::Obj realm::Results::get<realm::Obj>(unsigned long) + 80
11 Realm                          0x2ced8 RLMAccessorContext realm::Results::dispatch<auto realm::Results::get<RLMAccessorContext>(RLMAccessorContext&, unsigned long)::'lambda'(RLMAccessorContext&)>(RLMAccessorContext&) const + 380
12 Realm                          0x28cb8 auto realm::Results::get<RLMAccessorContext>(RLMAccessorContext&, unsigned long) + 36
13 Realm                          0x12b3b8 -[RLMResults objectAtIndex:] + 52

Mar 06 '22 14:03 bodnar-dan

➤ Jørgen Edelbo commented:

We cannot find an explanation for why you see these crashes, so there is clearly something we don't know about your use case. Without a minimal reproduction case, I don't think there is a way we can proceed with this.

Jun 14 '22 11:06 sync-by-unito[bot]

Hello, this is quite an old thread. Do we have some way to reproduce this? Do we know if there was a migration that could have changed things? With some vague idea of how this could have happened, I could try to reproduce it in my environment. @Bodnar-Dan ...

Sep 06 '22 12:09 nicola-cab

➤ Nicola Cabiddu commented:

Closing this issue, because it is more than 1y old, and we have no clear way to reproduce it. It seems a migration could have been responsible for it, but without any further information, there is no way for us to tackle and fix the problem.

Sep 07 '22 09:09 sync-by-unito[bot]
