realm-object-server
Squash old changesets
Feature Request:
Goals
Provide a command to squash commits on the Realm Object Server.
Expected Results
Increased speed for new clients downloading a Realm for the first time, since they can receive objects in fewer changesets. As a client, I could potentially receive the latest version of an object right away, instead of starting from the first version and then fast-forwarding through every changeset. It would be the responsibility of the server admin to ensure that no clients get stuck on an older intermediate version that was lost after the squash operation.
@raudabaugh thanks for filing this! We have discussed this internally before and we definitely want to offer it. Right now it hasn't been a high priority, but I will note this as we plan future versions.
Are there any recommendations as to how to best avoid this "fetch entire history" behavior? This is a breaking condition for me because my application has constantly updating objects that are synced across devices. While only a handful are ever current in the database, they are constantly updating. Our system is meant to run continuously for days, so it accumulates a large history. When a new device joins the realm, it can take more than twenty minutes to sync all of the data even though the amount of "current data" is minuscule.
Currently, there is not a good solution other than to copy data into a new Realm, which would then only have the history of the copy operations. This is clearly not an acceptable solution and we recognize that. This past month we merged in a change to Realm Object Server where we will be storing the Realm states in addition to the histories (currently the server only handles the histories, with the client SDKs materializing the state from downloading the history).
There were several reasons we added managing the state in ROS, but one benefit will be that we can use this state to send it on initial load instead of the histories. This seems to be the main problem with long histories and would fix your issue. However, we also plan to go further and start compacting histories as well. Other changes have been made in the past month that enable us to quickly identify meaningless operations that can be "garbage collected." For example, old SET operations are not necessary to keep around.
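To make the "old SET operations" point concrete, here is a minimal sketch of collapsing superseded SET operations in a history. The `SetOp` type and the `collapseSets` function are hypothetical names used only for illustration; they are not the ROS changeset format or its actual compaction algorithm.

```typescript
// Hypothetical, simplified operation. The real ROS changeset format is not shown in this thread.
interface SetOp {
  objectId: string;
  field: string;
  value: unknown;
}

// Keep only the most recent SET for each (objectId, field) pair,
// preserving the relative order of the surviving operations.
function collapseSets(history: SetOp[]): SetOp[] {
  const lastIndex = new Map<string, number>();
  history.forEach((op, i) => lastIndex.set(JSON.stringify([op.objectId, op.field]), i));
  return history.filter((op, i) => lastIndex.get(JSON.stringify([op.objectId, op.field])) === i);
}

const history: SetOp[] = [
  { objectId: "sensor-1", field: "temp", value: 20 },
  { objectId: "sensor-1", field: "temp", value: 21 },
  { objectId: "sensor-2", field: "temp", value: 18 },
  { objectId: "sensor-1", field: "temp", value: 22 },
];

const compacted = collapseSets(history);
console.log(compacted.length); // 2
```

In this toy model, a new client would only need the two surviving operations rather than all four, which is the effect the garbage collection described above is aiming for.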
I don't have a specific time table for these, but they will become priority as we wrap up current pushes this month.
Thank you for the update. I'll be eagerly waiting for these features.
Hi @bigfish24 - any update given your last status where you expected this to become prioritized after that month? This is an alarming issue to consider when evaluating ROS adoption. Thanks.
It is being prioritized now and we expect initial efforts to land in September. We are pursuing both online log compaction, such as collapsing set operations, and the ability to download the Realm on first open instead of syncing the operations (which would be possible only through the asyncOpen API).
Hi @bigfish24! This feature would be very important for us. I just tried to asyncOpen a realm which has a long history of transactions, but the actual realm file is basically empty, and it still takes a lot of time and network traffic to sync.
- Can we still expect this to be released in September?
- Will this feature be available for the developer edition of Realm?
- Will there be a way to tell which method will need less data to download (first open or syncing the operations)?
Thank you!
@probi86
- Yes
- Yes
- First open is always going to be more performant, not just because the data itself can be sent as the download, but also because it saves the server from merging out-of-order changesets. For example, if you open the Realm offline, write to it, and then connect, the local writes need to be merged back into an eventually consistent order. Obviously, if you are targeting offline-first, this has to happen; if not, asyncOpen will generally be more performant.
Hi @bigfish24,
Is there an update to the status of this issue? I'm assuming that September is no longer the expected release timeframe.
I did see a 2.0.0rc1 release of realm-dotnet that referenced a 2.x release of realm-mobile-platform; however, none of the listed features seemed to imply the functionality described in this issue.
The .NET changelog only lists features added to the SDK itself, whereas this is a server-side feature that will be covered by the ROS changelog.
We just installed ROS 2.0, updated to RealmSwift 3.0.0 on the client, and it seems there's no improvement for this issue. Opening an almost empty realm, with a long history of insertions and deletions takes a lot of time and bandwidth. Is there a setting on ROS 2.0 to turn this on? Thanks!
There is no setting right now, but we can introduce a setting rather fast. The memory consumption of the server will go up temporarily, but the network activity will be low if the objects have been deleted.
Yes, adding that setting to the server would be great.
- Do you know when we can expect this to be added to ROS2?
- In what cases will the server compact the history? For every AsyncOpen call? Will there be client specific APIs to control this?
- What happens if the client is partially already up to date? Will the server be able to tell which method would result in lower network activity (sending the changes/diffs starting from where the client stands or sending the actual state of the database)?
Thanks!
- The PRs have been accepted. The changes will almost certainly be in the next release, which should come rather soon.
- The server will always compact the history at every download. It will take a piece of the history whose size is set by the new parameter. This piece is then compacted before it is sent to the clients. It is also gzip compressed, by the way.
- There is only one method right now. In the future, there could be an initial method that sends the state instead of the history. Even with that possible future change, the client will download the history after the initial state download.
Any updates on the timeline for releasing this feature? Thanks!
@probi86 It is released. It should work now. Please tell us if you still have issues with this.
Oh, great. Are the changes documented anywhere? What do we need to change in order to benefit from compaction? Is there a server setting or does the client need to do anything? Thanks!
It looks like it isn't documented.
You add a parameter called maxDownloadSize to your server script like this:
import { BasicServer } from 'realm-object-server'
import * as path from 'path'
const server = new BasicServer()
server.start({
    dataPath: path.join(__dirname, '../data'),
    logLevel: 'trace',
    // Size of the piece of history that is compacted for each download
    maxDownloadSize: 10000000,
})
.then(() => {
    console.log('Your server is started', server.address)
})
.catch(err => {
    console.error('There was an error starting your file')
})
When you start the server, you should see the value of maxDownloadSize in the log. Look for that. If it changes, you have done it correctly.
The larger the number the more updates are bundled together and compacted together.
Why larger? I thought it means "compact changesets if download size is more than maxDownloadSize". Could you describe in details how it works and what should we expect.
Why larger?
Generally, we can't discover how much space compaction saves without actually running the compaction algorithm, which is a CPU-heavy task. Therefore, maxDownloadSize is the size of the changes that will be considered as input to compaction. It represents a "worst case" download size, which occurs when the compaction algorithm can't eliminate anything. On the other hand, the larger it is, the more changes are considered together, so the total number of bytes downloaded tends to decrease as maxDownloadSize increases. The size of each individual download unit also increases, though, which raises the risk that clients on unstable connections never manage to receive a changeset from the server.
It is essentially a compromise between download size, CPU usage, and letting clients progress through the updates coming from the server. If you have a very fast and stable connection, a very large maxDownloadSize is fine.
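The trade-off can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Changeset` type and the `batchHistory`, `compactBatch`, and `totalDownload` functions are hypothetical, and compaction is reduced to "last write per key wins within a batch", which is much simpler than whatever ROS actually does.

```typescript
// Hypothetical changeset model: each changeset overwrites one key and has a byte size.
interface Changeset {
  key: string;   // which object/field is overwritten
  bytes: number; // serialized size of the changeset
}

// Split the history into download units whose uncompacted size stays within maxDownloadSize.
function batchHistory(history: Changeset[], maxDownloadSize: number): Changeset[][] {
  const batches: Changeset[][] = [];
  let current: Changeset[] = [];
  let size = 0;
  for (const cs of history) {
    if (current.length > 0 && size + cs.bytes > maxDownloadSize) {
      batches.push(current);
      current = [];
      size = 0;
    }
    current.push(cs);
    size += cs.bytes;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Compact one batch: only the last write per key survives within that batch.
function compactBatch(batch: Changeset[]): Changeset[] {
  const last = new Map<string, Changeset>();
  for (const cs of batch) last.set(cs.key, cs);
  return Array.from(last.values());
}

// Total bytes a new client downloads after every batch has been compacted.
function totalDownload(history: Changeset[], maxDownloadSize: number): number {
  return batchHistory(history, maxDownloadSize)
    .map(compactBatch)
    .reduce((sum, batch) => sum + batch.reduce((s, cs) => s + cs.bytes, 0), 0);
}

// 1000 updates of 100 bytes each, cycling over 8 objects.
const updates: Changeset[] = Array.from({ length: 1000 }, (_, i) => ({
  key: `obj-${i % 8}`,
  bytes: 100,
}));

const small = totalDownload(updates, 400);    // batches too small to contain repeated keys
const large = totalDownload(updates, 100000); // the whole history fits in one batch
console.log(small, large); // 100000 800
```

In this model, the small limit never sees two writes to the same key in one batch, so nothing is eliminated, while the large limit collapses the history to the 8 current values. The cost is that a single download unit can be as large as the limit in the worst case, which matches the compromise described above.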
A lot of this can be solved with advanced heuristics, but that's also a bit of a timesink, so we need to know more about the use cases to properly prioritize the work. :-)
For me, I am seeing no real difference when running with maxDownloadSize. We have a Realm containing 6+ months of data that currently takes over an hour to sync on the first install.
I have set maxDownloadSize to various sizes and measured the sync time. At most, I am seeing it reduce the sync time by 6 minutes.
How large have you set the value?
Sorry, I have another issue open for this (Issue 328). It might be best for me to continue commenting here, as this seems to be the main record.
When I set maxDownloadSize to 10MB, the sync took around 56 minutes. I also tried 2GB, which took longer to start and seemed to download quickly, but then spent the time (I assume) processing. End to end, this took 1 hour and 1 minute. I have tried various sizes between 10MB and 2GB, which add or remove a few minutes but bring no big benefit. What I did notice was that if I set the limit very high, such as 5GB, the server would run out of memory and grind to a halt. Am I right in thinking that ROS compacts in memory up to the max value and then sends the changesets down to the client to add to its local Realm? This is how I arrived at 2GB as my max value: I have 3GB of RAM on my AWS instance, so the server didn't use up all the system RAM when the sync started, as it did when I set it to 5GB.
The other thing I don't understand: because of my issues migrating my data (Issue 338), I have resorted to copying the data from my old Realm to my new Realm, which should mean the transaction log is minimal. The Realm file is 469MB once the sync is complete, yet the network traffic to the device during the sync reached 2GB.
Ultimately I am seeing very little real world benefit from this setting at the moment. If it helps I can provide connection details to my ROS since it is hosted on AWS.
@Jonsapps
Can we get the Realm? I would like to look at the content. It is possible that maxDownloadSize has nothing to do with your problems.
Any updates on this issue? When can we expect a new version of the Realm Object Server that includes documentation for the compaction? Or has ROS development stopped? I'm asking because it was said to be released in September, almost 6 months have passed since then, and ROS is still at 2.0. Thank you!
Hi @probi86, documentation for log compaction exists here, although it is unfortunately still a bit minimalistic. The default value for maxDownloadSize in current releases is 16 MiB.
ROS development is certainly not stopped — on the contrary. ;-)
Note that the slowdown experienced by @Jonsapps mentioned above was actually caused by an extremely verbose loglevel (all or trace).
Maybe I misunderstood how this feature was supposed to work, but what I'm seeing is that if I have 8 simple 11-field objects in my realm whose fields update frequently (every few seconds), my realm grows to over 300MB.
Also, when a new client connects and tries to sync the realm, my Azure VM crashes. The last thing I see is a memory allocation error, so presumably npm ends up eating the 3.5GB of RAM I have allocated?
If I only have 8 objects, how do I make the disk and RAM usage reflect the minuscule amount of data that their current states entail?