realm-core
Realm notification listener crash
Realm version: 10.1.1 and some previous versions (it was reproduced on 10.0.0 too).
It happens suddenly for 20-50 users a day (we have around 1k users a day). I can provide only part of the stack trace from Firebase:
Crashed: Realm notification listener
0 Realm 0x101671e04 long long realm::Array::get<64ul>(unsigned long) const + 4
1 Realm 0x1013922fc realm::ArrayKeyBase<0>::get(unsigned long) const + 36
2 Realm 0x101392324 realm::util::FunctionRef<void (realm::BPlusTreeNode*, unsigned long)>::FunctionRef<realm::BPlusTree<realm::ObjKey>::get(unsigned long) const::'lambda'(realm::BPlusTreeNode*, unsigned long)&>(realm::ObjKey&&)::'lambda'(void*, realm::BPlusTreeNode*, unsigned long)::__invoke(void*, realm::BPlusTreeNode*, unsigned long) + 28
3 Realm 0x10173c638 realm::ConstLstIf<realm::ObjKey>::get(unsigned long) const + 264
4 Realm 0x1013a3b84 realm::_impl::ListNotifier::run() + 244
5 Realm 0x1013b9f8c realm::_impl::RealmCoordinator::run_async_notifiers() + 1788
6 Realm 0x1013b9834 realm::_impl::RealmCoordinator::on_change() + 24
7 Realm 0x10139345c realm::_impl::ExternalCommitHelper::listen() + 204
8 Realm 0x1013938b4 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, realm::_impl::ExternalCommitHelper::ExternalCommitHelper(realm::_impl::RealmCoordinator&)::$_0> >(void*) + 52
9 libsystem_pthread.dylib 0x1df23dca8 _pthread_start + 320
10 libsystem_pthread.dylib 0x1df246788 thread_start + 8
Steps to reproduce: unknown, sorry
There's a good chance this is the same issue as described in https://github.com/realm/realm-core/pull/4175
The reason is the combination of the notifier being run and ConstLstIf<ObjKey>::get() being triggered. The known (and since fixed) issue here is that the iterator at the ConstLstIf level provides access to invalidated links which are not actually stored there, so it could be that Array::get() asserts if this link was out of bounds at the storage level.
@Viktorianec can you tell if your app has been running in the background at the time the crashes happened? A possible cause might be trying to access the realm file after the device is locked and iOS revokes the access to it. Because realm files are memory-mapped we unfortunately do not get useful errors from the operating system but instead hard crashes like that.
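To illustrate why memory-mapped access produces a crash rather than an error, here is a minimal POSIX sketch. It is not Realm code: `read_mapped_int64` and `demo_mapped_read` are hypothetical helpers, and the temporary path is an assumption. The point is that `open` and `mmap` can report failures, but once the mapping exists a read is a plain memory load with nothing to check.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not a Realm API): read the index-th 64-bit value
// from a file through a read-only memory mapping.
std::int64_t read_mapped_int64(const char* path, std::size_t index)
{
    int fd = ::open(path, O_RDONLY);
    if (fd < 0)
        throw std::runtime_error("open failed"); // ordinary, catchable error

    struct stat st;
    if (::fstat(fd, &st) != 0) {
        ::close(fd);
        throw std::runtime_error("fstat failed");
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if ((index + 1) * sizeof(std::int64_t) > size) {
        ::close(fd);
        throw std::out_of_range("index past end of file");
    }

    void* addr = ::mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    ::close(fd); // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED)
        throw std::runtime_error("mmap failed");

    // From here on there is no error channel: the read below is a memory load.
    // If the backing pages become unreadable (for example, the file is
    // truncated by another process), the load raises a signal such as SIGBUS
    // or SIGSEGV instead of returning an error code.
    std::int64_t value;
    std::memcpy(&value,
                static_cast<const char*>(addr) + index * sizeof(std::int64_t),
                sizeof(value));
    ::munmap(addr, size);
    return value;
}

// Writes two values to a temporary file and reads the second one back
// through the mapping.
std::int64_t demo_mapped_read()
{
    char path[] = "/tmp/mmap_sketch_XXXXXX"; // assumed writable location
    int fd = ::mkstemp(path);
    if (fd < 0)
        throw std::runtime_error("mkstemp failed");

    const std::int64_t values[2] = {7, 42};
    const ssize_t written = ::write(fd, values, sizeof(values));
    ::close(fd);
    if (written != static_cast<ssize_t>(sizeof(values))) {
        ::unlink(path);
        throw std::runtime_error("write failed");
    }

    const std::int64_t result = read_mapped_int64(path, 1);
    ::unlink(path);
    return result;
}
```

Calling `demo_mapped_read()` exercises only the happy path. The failure mode described above cannot be caught with a try/catch around the read, which would be consistent with the crash reports showing a bare SIGSEGV inside `Array::get`.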
@fealebenpae I'm seeing the same kind of crash with an almost identical stack trace. Is there anything that could be done to avoid it? Most of the crashes happen when the app is in the background (based on Instabug reports), so most likely your assumption about iOS file rights is correct.
@r-rebacz Could you please add the actual stack trace you are seeing and also tell us which version of Realm you are using?
Thank you @jedelbo for your interest in the topic. We're using v10.7.4.
Crashed: Realm notification listener
SIGSEGV 0x0000000116089ed0
----
Crashed: Realm notification listener
0 Realm 0x103209124 long long realm::Array::get<64ul>(unsigned long) const + 4
1 Realm 0x1030b2d64 realm::ArrayKeyBase<0>::get(unsigned long) const + 36
2 Realm 0x1030b481c realm::util::FunctionRef<void (realm::BPlusTreeNode*, unsigned long)>::FunctionRef<realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const::'lambda'(realm::BPlusTreeNode*, unsigned long)&>(realm::ObjKey&&)::'lambda'(void*, realm::BPlusTreeNode*, unsigned long)::__invoke(void*, realm::BPlusTreeNode*, unsigned long) + 28
3 Realm 0x1030b47f0 realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const + 64
4 Realm 0x1030b5798 realm::LnkLst::get_any(unsigned long) const + 80
5 Realm 0x10345ff30 realm::_impl::ListNotifier::run() + 260
6 Realm 0x103468bdc realm::_impl::RealmCoordinator::run_async_notifiers() + 3124
7 Realm 0x103467f2c realm::_impl::RealmCoordinator::on_change() + 24
8 Realm 0x10344ec4c realm::_impl::ExternalCommitHelper::listen() + 204
9 Realm 0x10344ede4 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, realm::_impl::ExternalCommitHelper::ExternalCommitHelper(realm::_impl::RealmCoordinator&)::$_0> >(void*) + 52
10 libsystem_pthread.dylib 0x1dc620bfc _pthread_start + 320
11 libsystem_pthread.dylib 0x1dc629758 thread_start + 8
@r-rebacz I am sorry for the long delay in responding to this. I am a bit confused about the version of Realm you are using, perhaps because I am not sure if this is happening on Android or on iOS. Anyway, we have made some fixes in this area that, however, are not yet released. I hope they will also fix the issues you are experiencing.
@jedelbo thank you for the feedback. It's an iOS app. I mentioned "iOS file rights" in a previous comment, but I should probably have been clearer :) Could you please point me to the pull requests with the fixes, so I can track when they get released? Thank you in advance.
@r-rebacz The confusion comes from the fact that the stack trace does not match v10.7.4. It seems to be a newer version. You can follow https://github.com/realm/realm-cocoa/pull/7488.
➤ Jørgen Edelbo commented:
We are waiting to see if the new release improves the situation
We updated to 10.20.0 but we are still able to see the crash. It's an iOS app, still using the objc version of Realm.
In addition to what was already said above: this crash happens for only one RLMObject (out of a total of 88 Realm objects that we have), and only when it is added to Realm (not updated/deleted). The object is the owner of 3 other objects (of which 2 are RLMArrays), each with its own children (but the entire structure is not complicated and the size of the arrays is max 15). This particular object doesn't own any RLMEmbeddedObjects, and neither do its children. Bottom line is that we have other objects with a structure much more complex than this one.
The creation rate is also low, usually 1 per user session. I don't see any issue with the object or its types and we're not doing anything fancy with it, but it's intriguing that it happens to only this object.
It happens randomly. We weren't able to catch the crash with the debugger; we only know it from what we see in Crashlytics.
This is the stack from the thread that is crashing.
0 Realm 0x102da1778 long long realm::Array::get<64ul>(unsigned long) const + 4
1 Realm 0x102c4e5c4 realm::ArrayKeyBase<0>::get(unsigned long) const (array_key.hpp:90)
2 Realm 0x102c507bc realm::util::FunctionRef<void (realm::BPlusTreeNode*, unsigned long)>::FunctionRef<realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const::'lambda'(realm::BPlusTreeNode*, unsigned long)&>(realm::ObjKey&&)::'lambda'(void*, realm::BPlusTreeNode*, unsigned long)::__invoke(void*, realm::BPlusTreeNode*, unsigned long) (function_ref.hpp:103)
3 Realm 0x102c50790 realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const (bplustree.hpp:379)
4 Realm 0x102c9d7c8 realm::LnkLst::get_any(unsigned long) const (list.hpp:904)
5 Realm 0x102fec4fc realm::_impl::ListNotifier::run() + 259
6 Realm 0x102ff5228 realm::_impl::RealmCoordinator::run_async_notifiers() + 3207
7 Realm 0x102ff4524 realm::_impl::RealmCoordinator::on_change() + 23
8 Realm 0x102fd83c4 realm::_impl::ExternalCommitHelper::listen() + 203
9 Realm 0x102fd84e4 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, realm::_impl::ExternalCommitHelper::ExternalCommitHelper(realm::_impl::RealmCoordinator&)::$_0> >(void*) + 51
10 libsystem_pthread.dylib 0x1f20bb9a4 _pthread_start + 147
11 libsystem_pthread.dylib 0x1f20baea0 thread_start + 7
Or similar ones
0 Realm 0x102361700 long long realm::Array::get<16ul>(unsigned long) const + 4
1 Realm 0x10220e5c4 realm::ArrayKeyBase<0>::get(unsigned long) const (array_key.hpp:90)
...
And this is what's always happening on the thread that is triggering the save (not sure if this is relevant in any way).
0 libsystem_kernel.dylib 0x1b8945f90 _psynch_cvwait + 8
1 libc++.1.dylib 0x199ce6ddc $std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 27
2 Realm 0x1025a5420 realm::_impl::NotifierPackage::package_and_wait(realm::util::Optional<unsigned long long>) + 235
3 Realm 0x1025c1094 realm::_impl::transaction::begin(std::__1::shared_ptr<realm::Transaction> const&, realm::BindingContext*, realm::_impl::NotifierPackage&) + 1363
4 Realm 0x1025b6cac realm::_impl::RealmCoordinator::promote_to_write(realm::Realm&) + 291
5 Realm 0x102644dd0 realm::Realm::begin_transaction() + 147
6 Realm 0x10232ef04 -[RLMRealm beginWriteTransactionWithError:] (RLMRealm.mm:644)
7 Realm 0x10232f204 -[RLMRealm transactionWithoutNotifying:block:error:] (RLMRealm.mm:692)
8 Realm 0x10232f194 -[RLMRealm transactionWithBlock:error:] (RLMRealm.mm:684)
...
It's a bit frustrating, as it happens to quite a few users per day and it completely breaks their app experience.
@tgoyne Based on your experience with notifiers, does this ring a bell? This seems to be happening when an object containing a list is created. Could it be that the list object is not properly transferred to the notification transaction?
The crashing line here is https://github.com/realm/realm-core/blob/master/src/realm/object-store/impl/list_notifier.cpp#L100
We check if the List is valid at the start of that function (!m_list || !m_list->is_attached()), and any sort of bug in the handover process should result in it just taking the list-was-deleted code path. There's also a call to size() before this which had to have returned a non-zero value to hit this location.
Are the apps that are crashing using Realm sync? I'm just realizing that the indices reported to the replication for notifications are the full set including unresolved links, and this section of code is using LnkLst which would then translate an index incorrectly to something out of bounds.
Oh, notification bugs related to unresolved links would be pretty unsurprising, and also would explain why it's only happening on one object (if that object is just the only one with a list with an unresolved link). This hopefully just requires fixing the index in Replication::list_set() etc. then? I assume we need to pass the raw index to sync replication so we can't adjust it earlier.
It should be noted that the client cannot insert unresolved links, so this is probably not the problem here. But I agree that there are problems in handling replication of unresolved links. Created #5164.
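The index mismatch hypothesized above can be illustrated with a toy model. This is only a sketch under stated assumptions, not realm-core's implementation: `StoredLink` and `ToyLinkList` are hypothetical types standing in for the raw storage and for a `LnkLst`-style view that hides unresolved links. The raw storage has more entries than the visible list, so a raw index used as if it were a visible index can point past the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy stand-in for a stored link. `unresolved` marks an entry that exists in
// storage but is hidden from the user-facing list.
struct StoredLink {
    std::int64_t key;
    bool unresolved;
};

// Hypothetical list view that skips unresolved entries.
class ToyLinkList {
public:
    explicit ToyLinkList(std::vector<StoredLink> raw) : m_raw(std::move(raw)) {}

    // Number of entries in storage, including unresolved ones.
    std::size_t raw_size() const { return m_raw.size(); }

    // Number of entries the user can see.
    std::size_t size() const
    {
        std::size_t count = 0;
        for (const StoredLink& link : m_raw)
            if (!link.unresolved)
                ++count;
        return count;
    }

    // Translate a visible index into the matching raw storage index.
    std::size_t visible_to_raw(std::size_t visible) const
    {
        for (std::size_t raw = 0; raw < m_raw.size(); ++raw) {
            if (m_raw[raw].unresolved)
                continue;
            if (visible == 0)
                return raw;
            --visible;
        }
        // This toy throws; unchecked storage code would read out of bounds.
        throw std::out_of_range("visible index past end of list");
    }

    // Access by visible index, as a user-facing getter would.
    std::int64_t get(std::size_t visible) const
    {
        return m_raw[visible_to_raw(visible)].key;
    }

private:
    std::vector<StoredLink> m_raw;
};
```

With raw contents `{10, unresolved 20, 30, unresolved 40, 50}`, `raw_size()` is 5 and `size()` is 3. The entry with key 50 sits at raw index 4 but visible index 2, so `get(2)` returns it while `get(4)` is out of range. If a change were reported with the raw index 4 and then passed to a getter expecting a visible index, that would be the kind of out-of-bounds access suspected here.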
Are the apps that are crashing using Realm sync?
No. We're using Realm just for persisting data locally.
I see that #5164 was merged, but it didn't make it into 11.9.0. Is there a workaround for this, or a check that we can do? It's getting critical for us, as we are facing repeated crashes for some users. Internally, we are still unable to reproduce the crash, with or without the debugger attached.
We have simplified the object structure as much as we could and removed properties that were not important. It now has only one RLMArray property (with a max size of 15 RLMObjects) and an NSData property (which is fairly small, around 5k bytes). The crash still happens.
If you are not using sync, #5164 is not relevant for you. We will try to find out what we can do to find the root cause of this problem.
I saw in the Realm documentation and in this thread https://github.com/realm/realm-swift/issues/7164 that it's recommended to use GCD rather than Threads for doing background work.
We do have one Thread that performs a specific task whenever an object of the type that is causing the crash is added to Realm. We use this approach: https://academy.realm.io/posts/realm-notifications-on-background-threads-with-swift/ to add the notification block to an RLMResults on the background thread (we also lower the threadPriority to 0.2 and set qualityOfService = .utility).
Could this be a possible cause of the issue?
@tgoyne can you comment on the above?
There isn't any obvious reason why that would cause problems.
I can confirm the background thread execution is not at fault. We disabled the background task with a flag and the crash is still happening. We really need some help understanding the crash better, because I think the key here is that it happens with only one object. Could this happen because of a poorly implemented notification block? Or does the crash happen before those are even called? We added more logs in our latest build, but it doesn't seem the notification blocks are getting called. Could a Realm migration on the client cause this? Maybe something we didn't handle properly? We did see an increase in the crash rate when a new feature was released. This feature added a bunch of new properties (RLMObjects) to the owner of the object that crashes when added. Could this be an issue? Maybe the owner has too many child objects? The owner is the account's main object, so basically it's the owner of anything the user does in the app. But that doesn't explain why the crash happens only when one child is added.
One thing I saw in the latest release: there are certain situations where we get a different stack trace. In this scenario the crash happens inside a notification block, when updating the same object. Specifically, it happens when we iterate over the modifications and create a list of the changed objects (we are aware that the modifications reflect the changes in the old results and we do map them to the new results). So in this case, the object was saved, but the crash occurred when a field was updated on it.
Crashed: com.apple.main-thread
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000122142ff8
0 Realm 0x154d68 long long realm::Array::get<64ul>(unsigned long) const + 4
1 Realm 0xdb4c realm::ArrayKeyBase<0>::get(unsigned long) const + 36
2 Realm 0xfd44 realm::util::FunctionRef<void (realm::BPlusTreeNode*, unsigned long)>::FunctionRef<realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const::'lambda'(realm::BPlusTreeNode*, unsigned long)&>(realm::ObjKey&&)::'lambda'(void*, realm::BPlusTreeNode*, unsigned long)::__invoke(void*, realm::BPlusTreeNode*, unsigned long) + 28
3 Realm 0xfd18 realm::BPlusTree<realm::ObjKey>::get_uncached(unsigned long) const + 64
4 Realm 0x60e24 realm::LnkLst::get_object(unsigned long) const + 32
5 Realm 0x61128 realm::ObjList::try_get_object(unsigned long) const + 100
6 Realm 0x23fe38 realm::Query::do_find_all(realm::TableView&, unsigned long) const + 156
7 Realm 0x2cee24 realm::TableView::do_sync() + 652
8 Realm 0x3f0c98 realm::Results::ensure_up_to_date(realm::Results::EvaluateMode) + 456
9 Realm 0x3f0590 realm::util::Optional<realm::Obj> realm::Results::try_get<realm::Obj>(unsigned long) + 48
10 Realm 0x3f0498 realm::Obj realm::Results::get<realm::Obj>(unsigned long) + 80
11 Realm 0x2ced8 RLMAccessorContext realm::Results::dispatch<auto realm::Results::get<RLMAccessorContext>(RLMAccessorContext&, unsigned long)::'lambda'(RLMAccessorContext&)>(RLMAccessorContext&) const + 380
12 Realm 0x28cb8 auto realm::Results::get<RLMAccessorContext>(RLMAccessorContext&, unsigned long) + 36
13 Realm 0x12b3b8 -[RLMResults objectAtIndex:] + 52
➤ Jørgen Edelbo commented:
We cannot find an explanation for why you see these crashes, so there is clearly something we don't know about your use case. Without a minimal reproduction case, I don't think there is a way we can proceed with this.
Hello, this is quite an old thread. Do we have some way to reproduce this? Do we know if there was a migration that could have changed things? With some vague idea of how this could have happened, I could try to reproduce it in my environment. @Bodnar-Dan ...
➤ Nicola Cabiddu commented:
Closing this issue, because it is more than 1y old, and we have no clear way to reproduce it. It seems a migration could have been responsible for it, but without any further information, there is no way for us to tackle and fix the problem.