Less fragmentation: Break transaction history into smaller pieces

Open finnschiermer opened this issue 7 years ago • 22 comments

Since the release of 2.0, we've seen several complaints that the realm file has become much larger than before.

A possible explanation for this is that from 2.0 the transaction history was moved into the realm file, and that the transaction history is represented as blobs up to the max blob size of 16MB.

For smaller databases whose transactions make many small changes to many objects, this has led to larger blobs being written into the realm file, which widens the distribution of block sizes in the file. This affects the total file size through two mechanisms:

  • The wider distribution may cause more fragmentation, requiring a bigger file to hold the data.

  • The larger blobs must fit into a memory mapping section, which is at most 1/8 of the file size. This can also trigger larger files.
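A back-of-the-envelope reading of the second point, as a minimal sketch: if a blob must fit into one mapping section and a section is at most 1/8 of the file size, then a blob of size B implies a file of at least 8 × B. The function name and the assumption that this bound is the only constraint are illustrative and not taken from realm-core.

```python
MIB = 1024 * 1024


def min_file_size_for_blob(blob_bytes: int, sections_per_file: int = 8) -> int:
    """Smallest file size (bytes) whose largest mapping section can hold the blob.

    Assumption: one section is at most file_size / sections_per_file,
    using the 1/8 figure quoted in this issue.
    """
    return blob_bytes * sections_per_file


# Compare the current 16 MiB limit with the two proposed limits.
for blob_limit in (16 * MIB, 1 * MIB, 256 * 1024):
    floor = min_file_size_for_blob(blob_limit)
    print(f"{blob_limit // 1024} KiB blob limit -> file of at least {floor // MIB} MiB")
```

Under this simplified model, a single 16 MiB blob implies a file of at least 128 MiB, while 1 MiB and 256 KiB limits imply 8 MiB and 2 MiB respectively.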

We currently break up transaction history entries into multiple blobs when they reach 16 MB. We should consider changing this limit to at most 1 MB, or perhaps as little as 256 KB.
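The splitting described here can be sketched as follows. This is only a toy illustration of cutting one history entry into pieces no larger than a configurable limit; `split_into_blobs` and its parameters are hypothetical names, and realm-core's actual storage code is not shown in this issue.

```python
from typing import Iterator

MIB = 1024 * 1024


def split_into_blobs(entry: bytes, max_blob_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `entry`, each at most `max_blob_size` bytes."""
    if max_blob_size <= 0:
        raise ValueError("max_blob_size must be positive")
    for start in range(0, len(entry), max_blob_size):
        yield entry[start:start + max_blob_size]


# A 3 MiB history entry stays one blob under a 16 MiB limit,
# but becomes several smaller blobs under the proposed limits.
entry = b"x" * (3 * MIB)
for limit in (16 * MIB, 1 * MIB, 256 * 1024):
    blobs = list(split_into_blobs(entry, limit))
    print(f"limit {limit // 1024} KiB -> {len(blobs)} blob(s)")
```

A smaller limit yields more blobs of more uniform size, which is the property the proposal relies on to narrow the distribution of block sizes.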

finnschiermer avatar Dec 06 '16 10:12 finnschiermer

I think this has been discussed before, but while we don't perform any automatic compacting of the user data in a Realm file, would it be an effective compromise to periodically automatically compact the transaction log section?

This issue seems to be cropping up on a very regular basis. If we can find a way to mitigate it soon, that would be very nice. :)

TimOliver avatar Dec 19 '16 22:12 TimOliver

If I understand correctly, automatically compacting the transaction log isn't any easier than doing so for the data part of the file, since it's written in the same pattern, using the same allocator, or nearly so anyway.

In other words, you can't just trim off the end part of the log section because the log "section" could be scattered/fragmented throughout the physical file layout, not just at the end.
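To illustrate why trimming the end of the file is not enough, here is a toy model in which the file is a sequence of equally sized blocks labelled `data`, `log`, or `free`. The layout and the helper name are invented for illustration and do not reflect realm-core's real on-disk format.

```python
from typing import Sequence


def reclaimable_by_truncation(layout: Sequence[str]) -> int:
    """Count the trailing free blocks, the only ones a plain truncate can drop."""
    count = 0
    for block in reversed(layout):
        if block != "free":
            break
        count += 1
    return count


# Log blocks are interleaved with data blocks, so most free space is interior.
layout = ["data", "free", "log", "free", "data", "log", "free"]
total_free = layout.count("free")
trailing_free = reclaimable_by_truncation(layout)
print(f"{trailing_free} of {total_free} free blocks can be dropped by truncating")
```

In this layout only one of the three free blocks sits at the end; reclaiming the other two requires moving live blocks, which is what a full compaction does.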

Whichever allocation/expansion scheme we pick, there will always be some cases in which the file is larger than ideal, so I think we'll always need an option to unstick these types of situations, such as "compact on launch" tracked in realm/realm-cocoa#3289.

A quick check reveals that we don't have an equivalent issue in core, so I just created one here: #2369.

jpsim avatar Dec 19 '16 23:12 jpsim

What prevents you from calling compact when you launch? What additional support from Core do you need? Anyhow, compact on launch is not likely to help much when the root cause is overly large blocks in the transaction logs.

finnschiermer avatar Dec 20 '16 08:12 finnschiermer

> What prevents you from calling compact when you launch?

  1. A way to optionally acquire an interprocess lock on the file (may already exist, I don't know). We don't want to compact if another process is already accessing the Realm. But if there is one, we don't want to wait forever; we want to skip the compact and do it next time.
  2. A way to get the used vs free space ratio, so we only compact if it's "worth it".
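The two requirements above amount to a small decision rule, sketched here as a pure function. The function name, parameters, and both thresholds are hypothetical choices for illustration; they are not values or APIs from Realm.

```python
MIB = 1024 * 1024


def should_compact(
    total_bytes: int,
    used_bytes: int,
    in_use_elsewhere: bool,
    min_total_bytes: int = 50 * MIB,   # hypothetical: ignore small files
    max_used_fraction: float = 0.5,    # hypothetical: compact if under 50% used
) -> bool:
    """Decide whether a launch-time compaction is worth attempting."""
    if in_use_elsewhere:
        # Requirement 1: never wait on another process, just skip this launch.
        return False
    if total_bytes < min_total_bytes:
        # Too small for the saving to matter.
        return False
    # Requirement 2: only compact when enough of the file is free space.
    return used_bytes / total_bytes < max_used_fraction


print(should_compact(100 * MIB, 20 * MIB, in_use_elsewhere=False))
print(should_compact(100 * MIB, 20 * MIB, in_use_elsewhere=True))
print(should_compact(10 * MIB, 1 * MIB, in_use_elsewhere=False))
```

Keeping the rule free of side effects makes it easy to tune the thresholds per app; the lock check and the size figures would have to come from whatever the binding exposes.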

jpsim avatar Dec 20 '16 08:12 jpsim

We added the possibility to get the used and the free space some weeks ago, so 2) is already present. With regard to 1), compact() has had this check since the beginning and will not attempt to compact if another process or thread is already accessing the realm... Correction: 2) hasn't been released yet, as we've only been doing bugfix releases for a while. It'll be part of the next release.

finnschiermer avatar Dec 20 '16 08:12 finnschiermer

@jpsim @finnschiermer giving my 2 cents: we already use compact() (by modifying headers to make it available in Swift) in a production app. We call it at every launch. It's pretty fast and seems safe. When another process has a ref to the Realm (e.g. in our case on iOS, our today widget), compact() returns false and the compaction is not done.

kevincador avatar Dec 20 '16 09:12 kevincador

Ah, the ratio API was added in #2338. It looks great for our needs. I didn't see it because it was merged to develop but not master.

jpsim avatar Dec 20 '16 09:12 jpsim

> A possible explanation for this is that from 2.0 the transaction history was moved into the realm file, and that the transaction history is represented as blobs up to the max blob size of 16MB.

However, moving the transaction log into the Realm file means that if the Realm file is corrupted in any way, the transaction log can also become broken, and all future transactions will fail with the error "Bad transaction log".

Similar issues happening only since Realm Core 2.0+: https://github.com/realm/realm-java/issues/3702

Zhuinden avatar Apr 06 '17 10:04 Zhuinden

Really though, how can the transaction log be used to restore the Realm file if it is corrupted by an error in Realm file management?

Zhuinden avatar Apr 10 '17 10:04 Zhuinden

It can't. Why would you expect that it could? Perhaps I'm missing some context?

finnschiermer avatar Apr 10 '17 14:04 finnschiermer

I'm mostly just asking because AFAIK logs are typically used to roll back in case of error/corruption, but if the Realm file gets corrupted while being written in such a way that the transaction log is destroyed in the process, that seems rather difficult to do.

Then again, I could be completely unaware of what Realm actually uses the transaction history for.

It's just a bit scary that Realm-Core 2.0 added the transaction log to the Realm file, and a lot of "bad realm file header" and "bad transaction log" errors appear sporadically.

Zhuinden avatar Apr 10 '17 14:04 Zhuinden

Yeah. We don't use the logs to roll back. They are used to drive notifications and for the Realm Mobile Platform. If anything, moving them into the realm file should have lowered the risk of corruption. A lot of stuff happened at 2.0. Some of the bugs we've found in the way we manage memory mappings could trigger "bad transaction logs".

finnschiermer avatar Apr 10 '17 15:04 finnschiermer

Once #2719 is merged, we will have a visual on this problem with a graph showing the size of the Realm file and the amount of empty space inside it (example).

ironage avatar Jul 19 '17 20:07 ironage

Do you think this could cause https://forums.realm.io/t/limiting-ros-realm-file-size-growth/1135 ?

Zhuinden avatar Apr 03 '18 10:04 Zhuinden

No, it's not the prime suspect. It may make the situation worse for small files, though.

finnschiermer avatar Apr 03 '18 14:04 finnschiermer

Is this related to https://stackoverflow.com/questions/49856462/uncontrolled-realm-file-growth ?

Zhuinden avatar Apr 16 '18 12:04 Zhuinden

I think so. I posted the issue on Stack Overflow and as a realm.js issue. My finding is that the file grows unexpectedly when there are multiple updates or inserts per transaction. The more writes per transaction, the bigger the file.

amitv87 avatar Apr 20 '18 08:04 amitv87

Any updates on this? It's been 4.5 years and I'm running into this issue. I have 200,000 objects to insert into a Realm. That already takes a handful of seconds when done in a single write transaction, so I'd rather not break up the process into smaller transactions that will decrease performance even further.
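For reference, the alternative mentioned above (splitting one large insert into several smaller transactions) could look roughly like the sketch below. `write_batch` is a hypothetical stand-in for "open one write transaction and insert this batch", and the batch size is arbitrary; nothing in this thread establishes how much this helps file size or hurts speed.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def insert_in_batches(
    objects: Iterable[T],
    batch_size: int,
    write_batch: Callable[[List[T]], None],
) -> int:
    """Call `write_batch` once per batch and return the number of calls made."""
    transactions = 0
    for batch in batched(objects, batch_size):
        write_batch(batch)  # hypothetical: one write transaction per call
        transactions += 1
    return transactions


# Stand-in "database": a plain list that records what was written.
stored: List[int] = []
count = insert_in_batches(range(200_000), 10_000, stored.extend)
print(f"{count} transactions, {len(stored)} objects written")
```

With 200,000 objects and a batch size of 10,000 this makes 20 calls; the trade-off between per-transaction overhead and history blob size would need to be measured.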

I understand that I can manually compact the Realm. But unfortunately, that's only possible on first open, which is precisely the worst imaginable time to compact the file because "on first open" means "The UI would like some data to display RIGHT NOW."

A better stopgap API might be to offer compaction on closing the Realm. At least until the underlying file-size issue is addressed.

bdkjones avatar Sep 05 '22 01:09 bdkjones

@bdkjones Yes, work on online compaction is in progress, see https://github.com/realm/realm-core/pull/5755. This will reduce problems coming from fragmentation.

finnschiermer avatar Sep 05 '22 08:09 finnschiermer

@bdkjones even though the compaction API today is named compactOnLaunch() that isn't exactly true - you can call it anytime you want, not just when first launching the app. Were you aware of that?

ianpward avatar Sep 06 '22 20:09 ianpward

@ianpward I was not! But the Realm can't be open when I call it, correct? For example, I couldn't dispatch a background thread to go compact the Realm while it's open on the main thread to power the UI?

bdkjones avatar Sep 06 '22 20:09 bdkjones

That's correct - you need to close all realm references, similar to your app teardown procedure, and then call compact(). It's not ideal, which is why we are looking to improve it, but other apps do implement this successfully.

ianpward avatar Sep 06 '22 23:09 ianpward