realm-core
Less fragmentation: Break transaction history into smaller pieces
Since the release of 2.0, we've seen several complaints that the realm file becomes much larger than before.
A possible explanation for this is that from 2.0 the transaction history was moved into the realm file, and that the transaction history is represented as blobs up to the max blob size of 16MB.
For smaller databases whose transactions make many small changes to many objects, this has led to larger blobs being written into the realm file, and so to a wider distribution of block sizes in the file. This change affects the total size through two mechanisms:
- The wider distribution may cause more fragmentation, requiring a bigger file to hold the data.
- The larger blobs must fit into a memory mapping section, which is at most 1/8 of the file size. This can also trigger larger files.
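Taken at face value, the second point puts a floor on the file size: if a blob has to fit into a mapping section that is at most 1/8 of the file, the file must be at least 8 times the largest blob. The sketch below only works out that arithmetic. `min_file_size_for_blob` is a hypothetical helper, not a realm-core function, and the real allocator follows more rules than this.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;

// Hypothetical helper (not part of realm-core): the smallest file size for
// which a blob of `blob_bytes` fits into a section of at most 1/8 of the file.
inline std::uint64_t min_file_size_for_blob(std::uint64_t blob_bytes)
{
    // Guard against overflow in the multiplication below.
    assert(blob_bytes <= std::numeric_limits<std::uint64_t>::max() / 8);
    return blob_bytes * 8;
}
```

Under that reading, a 16MB blob implies a file of at least 128MB, a 1MB blob implies 8MB, and a 256KB blob implies 2MB.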
We currently break up transaction history entries into multiple blobs when they reach 16MB. We should consider changing this limit to at most 1MB, or perhaps as little as 256KB.
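As a rough illustration of what a lower limit means, here is a minimal sketch of splitting one serialized history entry into pieces no larger than a configurable size. `split_into_chunks` and the use of `std::string` as a byte buffer are assumptions made for this example; this is not how realm-core stores its history.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of realm-core): split one serialized history
// entry into pieces of at most `max_chunk` bytes each.
inline std::vector<std::string> split_into_chunks(const std::string& entry,
                                                  std::size_t max_chunk)
{
    assert(max_chunk > 0);
    std::vector<std::string> chunks;
    for (std::size_t pos = 0; pos < entry.size(); pos += max_chunk) {
        // The last piece may be shorter than max_chunk.
        std::size_t len = std::min(max_chunk, entry.size() - pos);
        chunks.push_back(entry.substr(pos, len));
    }
    return chunks;
}
```

With a 1MB limit, a 2.5MB entry becomes three blocks (1MB, 1MB, 0.5MB) instead of one 2.5MB block, so the allocator only has to find small, uniform blocks.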
I think this has been discussed before, but while we don't perform any automatic compacting of the user data in a Realm file, would it be an effective compromise to periodically automatically compact the transaction log section?
This issue seems to be cropping up on a very regular basis. If we can find a way to mitigate it soon, that would be very nice. :)
If I understand correctly, automatically compacting the transaction log isn't any easier than doing so for the data part of the file, since it's written in the same pattern, using the same allocator, or nearly so anyway.
In other words, you can't just trim off the end part of the log section because the log "section" could be scattered/fragmented throughout the physical file layout, not just at the end.
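A toy model makes the point concrete. Assume the file is a row of equally sized blocks, each either in use or free; truncating the file can only drop the free blocks at the very end. Both function names are hypothetical, and the real file layout is not a flat array of equal blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model: one element per equally sized block; true = in use, false = free.

// Number of free blocks anywhere in the file.
inline std::size_t total_free_blocks(const std::vector<bool>& in_use)
{
    return static_cast<std::size_t>(
        std::count(in_use.begin(), in_use.end(), false));
}

// Number of free blocks that truncating the end of the file could drop.
inline std::size_t trailing_free_blocks(const std::vector<bool>& in_use)
{
    std::size_t n = 0;
    // Walk backwards and stop at the first block that is still in use.
    for (auto it = in_use.rbegin(); it != in_use.rend() && !*it; ++it)
        ++n;
    return n;
}
```

For a layout such as used, free, free, used, free, used, free, four blocks are free but only the last one can be trimmed. The other three stay inside the file until the live blocks are moved, which is what a full compaction does.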
Whatever allocation/expansion scheme we pick, there will always be some cases in which the file is larger than ideal, so I think we'll always need a way to get out of these situations, such as "compact on launch", tracked in realm/realm-cocoa#3289.
A quick check reveals that we don't have an equivalent issue in core, so I just created one here: #2369.
What prevents you from calling compact when you launch? What additional support from Core do you need? Anyhow, compact on launch is not likely to help much when the root cause is overly large blocks in the transaction logs.
What prevents you from calling compact when you launch?
- A way to optionally acquire an interprocess lock on the file (may already exist, I don't know). We don't want to compact if another process is already accessing the Realm. But we don't want to wait forever if there is one; we want to skip the compact. We'll do it next time.
- A way to get the used vs free space ratio, so we only compact if it's "worth it".
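The "worth it" check in the second point could look roughly like the sketch below. `worth_compacting`, its parameters, and any thresholds passed to it are hypothetical and chosen only for illustration; none of this is a realm-core API. The non-blocking lock attempt from the first point is left out because it depends on how the file is locked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy helper (not a realm-core API): decide whether a
// compaction is likely to pay off, given the file size and the bytes in use.
inline bool worth_compacting(std::uint64_t total_bytes, std::uint64_t used_bytes,
                             std::uint64_t min_total_bytes, double max_used_ratio)
{
    assert(used_bytes <= total_bytes);
    // Small (or empty) files are not worth the cost of rewriting.
    if (total_bytes == 0 || total_bytes < min_total_bytes)
        return false;
    double used_ratio =
        static_cast<double>(used_bytes) / static_cast<double>(total_bytes);
    // Compact only when a large share of the file is free space.
    return used_ratio < max_used_ratio;
}
```

For example, with a 50MB minimum size and a 0.5 ratio, a 100MB file with 20MB in use would be compacted, while a 100MB file with 80MB in use, or any 10MB file, would be left alone.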
We added the possibility to get the used and the free space some weeks ago, so 2) is already present. With regard to 1), compact() has had this check since the beginning and will not attempt to compact if another process or thread is already accessing the realm... Correction: 2) hasn't been released yet, as we've only been doing bugfix releases for a while. It'll be part of the next release.
@jpsim @finnschiermer giving my 2 cents: we already use compact() (by modifying headers to make it available in Swift) in a production app. We call it at every launch. It's pretty fast and seems safe. When another process has a reference to the Realm (e.g. in our case on iOS, our Today widget), compact() returns false and the compaction is not done.
Ah, the ratio API was added in #2338. It looks great for our needs. I didn't see it because it was merged to develop but not master.
A possible explanation for this is that from 2.0 the transaction history was moved into the realm file, and that the transaction history is represented as blobs up to the max blob size of 16MB.
Moving the transaction log into the Realm file, however, means that in case of any Realm-file corruption, the transaction log can also become broken, and all future transactions will end in the error "Bad transaction log".
Similar issues happening only since Realm Core 2.0+: https://github.com/realm/realm-java/issues/3702
Really though, how can the transaction log be used to restore the Realm file if it is corrupted by an error in Realm file management?
It can't. Why would you expect that it could? Perhaps I'm missing some context?
I'm mostly just asking because AFAIK logs are typically used to rollback in case of error/corruption, but if the Realm file gets corrupted while writing it in such a way that the transaction log is destroyed in the process, that seems rather difficult to do.
Then again, I could be completely unaware of what Realm actually uses the transaction history for.
It's just a bit scary that Realm-Core 2.0 added the transaction log to the Realm file, and a lot of "bad realm file header" and "bad transaction log" errors appear sporadically.
Yeah. We don't use the logs to roll back. They are used to drive notifications and for the Realm Mobile Platform. If anything, moving them into the realm file should have lowered the risk of corruption. A lot of stuff happened at 2.0. Some of the bugs we've found in the way we manage memory mappings could trigger "bad transaction log" errors.
Once #2719 is merged, we will have a visual on this problem with a graph showing the size of the Realm file and the amount of empty space inside it (example).
Do you think this could cause https://forums.realm.io/t/limiting-ros-realm-file-size-growth/1135 ?
No, it's not the prime suspect. It may make the situation worse for small files, though.
Is this related to https://stackoverflow.com/questions/49856462/uncontrolled-realm-file-growth ?
I think so. I posted the issue on Stack Overflow and as a realm.js issue. My finding is that the file grows unexpectedly when there are multiple updates or inserts per transaction: the more writes per transaction, the bigger the file.
Any updates on this? It's been 4.5 years and I'm running into this issue. I have 200,000 objects to insert into a Realm. That already takes a handful of seconds when done in a single write transaction, so I'd rather not break up the process into smaller transactions that will decrease performance even further.
I understand that I can manually compact the Realm. But unfortunately, that's only possible on first open, which is precisely the worst imaginable time to compact the file because "on first open" means "The UI would like some data to display RIGHT NOW."
A better stopgap API might be to offer compaction on closing the Realm. At least until the underlying file-size issue is addressed.
@bdkjones Yes, work on online compaction is in progress, see https://github.com/realm/realm-core/pull/5755. This will reduce problems coming from fragmentation.
@bdkjones even though the compaction API today is named compactOnLaunch(), that isn't exactly true: you can call it anytime you want, not just when first launching the app. Were you aware of that?
@ianpward I was not! But the Realm can't be open when I call it, correct? For example, I couldn't dispatch a background thread to go compact the Realm while it's open on the main thread to power the UI?
That's correct: you need to close all realm references, similar to your app teardown procedure, and then call compact(). It's not ideal, which is why we are looking to improve it, but other apps do implement this successfully.