realm-core icon indicating copy to clipboard operation
realm-core copied to clipboard

Realm::get_shared_realm should close the Realm in case of an error

Open nirinchev opened this issue 2 years ago • 20 comments

When an error occurs during Realm::get_shared_realm, the error gets propagated to the developer, but the Realm instance remains open, which then prevents deleting it. There's no way for the developer to then close it because they never get a reference to it. To repro, open a Realm with initialization_function set to something that throws, then try to delete the Realm file.

nirinchev avatar Mar 29 '22 21:03 nirinchev

Technically, you get an instance of realm in the initialization_function passed as argument, thus you could close the realm at that stage. However, failures can occur even before the initialization_function runs, and the realm could remain open generating issues as you reported. Going to open a PR for fixing this issue.

nicola-cab avatar Apr 13 '22 13:04 nicola-cab

The SDK doesn't own the Realm it gets in the initialization callback function, so it should not close it at this point.

nirinchev avatar Apr 13 '22 13:04 nirinchev

Yes indeed, it can also happen before that, so realm must be closed. And probably this is the reason behind why certain object store tests are not running correctly on windows. We could probably solve multiple problems fixing this issue, along with this one https://github.com/realm/realm-core/issues/5364

nicola-cab avatar Apr 13 '22 13:04 nicola-cab

If an exception being thrown is resulting in a Realm which is never returned from get_shared_realm() remaining open, calling close() on it does not make sense as a solution, as that's just partially working around a memory leak. Do you have an example of code which results in the Realm being open when it shouldn't be? A basic test doesn't show it happening:

TEST_CASE("Realm can be deleted after initialization function throws") {
    TestFile config;
    config.schema = Schema{
        {"object", {{"value", PropertyType::Int}}},
    };
    config.initialization_function = [](SharedRealm realm) {
        CHECK(realm);
        throw std::runtime_error("test error");
    };
    REQUIRE_THROWS_WITH(Realm::get_shared_realm(config), Catch::Matchers::Contains("test error"));
    REQUIRE_NOTHROW(Realm::delete_files(config.path));
}

tgoyne avatar Apr 13 '22 16:04 tgoyne

@nirinchev can you provide a sample of code that is leaking the realm? As per @tgoyne request, he does not believe we should implement a fix for that inside core. Also, I don't know if this could apply also to https://github.com/realm/realm-core/issues/5364, it seems a little different though.

nicola-cab avatar Apr 14 '22 10:04 nicola-cab

Okay, after some further digging, this looks like awkwardness of the API when used by garbage collected languages. The specific test that I am looking at that triggers the issue is:

    var config = Configuration([Person.schema], initialDataCallback: (realm) {
      realm.add(Person('p1'));
      throw Exception('very careless developer');
    });

    expect(() => getRealm(config), throws<RealmException>("User-provided callback failed"));
    Realm.deleteRealm(config.path);

The reason why this fails is because we're adding a Person to the Realm before throwing. This will create an instance of a Person which holds a native Object instance, which holds a std::shared_ptr to the Realm, thus keeping the Realm open until the Person is garbage collected. In normal circumstances, this is mitigated by the explicit close method exposed on the Realm class in dart, which allows developers to force close the native Realm, even if there are objects/collections keeping a reference to it. However, if an error occurs while opening it, we never get a reference to the Realm which would allow us to explicitly close it.

So I agree that this is probably not a bug as it is, but would like to hear if you have suggestions on how to handle this. One approach I can think of is to keep track of all objects/collections instantiated within the initialization callback and invalidate them at the end of the callback, destroying their native counterparts, but that's fairly involved and slightly risky to try and squeeze for the beta of the dart SDK.

nirinchev avatar Apr 16 '22 20:04 nirinchev

Why can't you just call close() in your wrapper for initialDataCallback? You have a reference to the Realm there.

tgoyne avatar Apr 18 '22 20:04 tgoyne

We can, it just doesn't seem semantically correct for the SDK to close a Realm instance it didn't create. We'll also need to do this in migration callbacks, which is fine as far as I can tell, but things get a little complicated when you factor in things like should_compact_on_launch. We don't have a reference to the realm there, yet the developer may do something silly like divide by zero or similar, which will throw and prevent the Realm from being opened. If an object/collection instance was created in a previous callback during the Realm open operation, the Realm would remain open with no way to close it.

nirinchev avatar Apr 18 '22 22:04 nirinchev

How is initialDataCallback getting called on a Realm instance that the SDK didn't create? We do have some places where we internally create Realm instances, but if any of those are making calls out to the SDK while creating them something has gone very wrong.

If there's a way for the SDK to have a reference to the Realm at the point where should_compact_on_launch is called then things are broken even in the non-error path as that means that we won't be able to compact.

tgoyne avatar Apr 18 '22 22:04 tgoyne

I think my general objection to this idea is that it's complicating an already complicated part of the code (are we really sure that every single point during Realm opening which throws an exception leaves the Realm in a state where calling close() works?) and doesn't seem guaranteed to work (at the minimum, the old_realm parameter to migration blocks also needs to be closed). More broadly, the SDK holding on references to the Realm passed to initialization callbacks past the end of the callback is definitely not something that any of this code was written to handle, and I'd be surprised if there aren't other problems which will result from doing that.

A perhaps more robust way to handle that would be to always create a new Realm instance for each of the initialization callbacks similar to what migrations do for the old Realm now, which is then explicitly closed regardless of error or success after the callback exits? This'd have a bit of overhead, but I think it'd eliminate a category of possible bugs, maybe make some of the migration code less weird (we currently do some awkward swapping around to make sure that the migration function sees the correct schema), and it's not really a hot path anyway.

tgoyne avatar Apr 18 '22 23:04 tgoyne

How is initialDataCallback getting called on a Realm instance that the SDK didn't create? We do have some places where we internally create Realm instances, but if any of those are making calls out to the SDK while creating them something has gone very wrong.

I guess I wasn't particularly precise with my wording here. What I meant was that the lifetime of the Realm instances we're getting in the initialization and migration callbacks is managed by Core/OS. The fact that the SDK is allowed to close seems more like an implementation detail to me than something guaranteed by the API design. Hypothetically, the Core team may decide they want to store something in the Realm file in case a migration fails. They'll wrap the migration callback in a try-catch, store the error, then rethrow it. This will fail if the SDK explicitly closes the file when an error occurs.

The new Realm instance approach you're suggesting makes sense to me, but sounds like a larger change than I anticipated when filing that ticket, so probably something for the Core team to discuss and prioritize internally.

nirinchev avatar Apr 18 '22 23:04 nirinchev

@tgoyne what you are proposing seems to me something bigger in terms of code, and for sure will complicate the code, although it will surely work for initialDataCallback. However, this callback seems the only place where we need to be careful, since we can throw an exception in many other places in Realm::get_shared_realm. It seems we will have to create a tmp realm and swap it with the current one just for solving the problem with the callback. However, we cannot guarantee that initialDataCallback will never throw, so either core or the sdk needs to ensure that the realm is closed (to me seems a core issue to be honest).

I'll discuss with the core team and see. @nirinchev Do we want to keep this issue open or close it in order to open a broader one? I think this: https://github.com/realm/realm-core/issues/5364 falls in the same category.

nicola-cab avatar Apr 19 '22 10:04 nicola-cab

I think https://github.com/realm/realm-core/issues/5364 is related, but can probably be handled separately. For that one, the thing we need to ensure is that if we create a file and opening it fails, that file should be deleted and for that we don't really need to swap out Realms or stuff like that.

nirinchev avatar Apr 19 '22 11:04 nirinchev

Well, both fixes need to be placed in the same code path, we need to catch the exception thrown, close realm/delete the file and re-throw. https://github.com/realm/realm-core/issues/5364 is essentially a super set of this one. I can extend my changes in order to delete the files, but essentially I will be fixing both cases.

nicola-cab avatar Apr 19 '22 13:04 nicola-cab

➤ Nicola Cabiddu commented:

Didn't we agree that this was simply impossible to achieve, considering the current design of our code?

If I recall correctly, closing Realm is basically impossible without releasing all the other references that are held.

Which may belong to different applications, thus force closing realm could result in spurious crashes in another apps which are using the same realm file.

Essentially, we can't close realm in respect of other processes that may hold a reference to it.

sync-by-unito[bot] avatar Sep 14 '22 15:09 sync-by-unito[bot]

We agreed that we can't delete the Realm, but we can definitely close it.

nirinchev avatar Sep 19 '22 09:09 nirinchev

If I recall correctly, the main problem in calling explicitly close() at ObjectStore level is that you cannot do that safely in regard to multiple processes that might be using the same realm file. I will have a look again.

nicola-cab avatar Sep 19 '22 10:09 nicola-cab

We can close the specific Realm instance which we were opening. The thing we can't guarantee is that that'll actually result in the file being closed.

tgoyne avatar Sep 19 '22 16:09 tgoyne

So this is hopefully going to become less of an issue for dart as we're introducing reference tracking for child objects (lists, results, objects) obtained from a Realm during a Core-originating callback (i.e. migrations, and initialization). While we're not going to explicitly close the Realm at the end of such callback, we'll make sure to release all objects owned by that Realm, so there should be no SDK-owned shared pointers at the end of the callback.

While I still think there might be some value in moving that logic into Core somehow, I believe you when you say it's a fairly complicated system, so perhaps not worth burdening it further.

nirinchev avatar Sep 19 '22 19:09 nirinchev

@nirinchev I am not sure if this is a T-bug (or something to keep in the core backlog), a Realm instance at object store level is just keeping track of who is using the file, and in case all the references are gone, we then release automatically the resources held. I think this is the current design and implementation we have.

nicola-cab avatar Sep 20 '22 08:09 nicola-cab

➤ Brian Munkholm commented:

Nikola is ok closing this.

sync-by-unito[bot] avatar Sep 28 '22 12:09 sync-by-unito[bot]