realm-js icon indicating copy to clipboard operation
realm-js copied to clipboard

SIGSEGV when attempting to resume a Puased syncSession

Open muddy-alex opened this issue 4 months ago • 19 comments

How frequently does the bug occur?

Always

Description

I'm trying to force Realm to perform a sync of all offline changes (it's an asymmetric table).

According to the documentation, we should be able to call realm.syncSession.pause() and realm.syncSession.resume() to perform this operation, however, when calling resume() a SIGSEGV error is thrown.

Is this the correct way to force Realm to perform a sync with AppSync?

Stacktrace & log output

No response

Can you reproduce the bug?

Always

Reproduction Steps

No response

Version

12.6.2

What services are you using?

Both Atlas Device Sync and Atlas App Services

Are you using encryption?

No

Platform OS and version(s)

OSX/Linux

Build environment

NodeJS

Cocoapods version

No response

muddy-alex avatar Mar 15 '24 17:03 muddy-alex

➤ PM Bot commented:

Jira ticket: RJS-2762

sync-by-unito[bot] avatar Mar 15 '24 17:03 sync-by-unito[bot]

@muddy-alex There is no API to force sync as the sync client works mostly autonomously. pause()/resume() can trigger the sync client to take an extra round trip.

If it is possible, please share the stack trace. resume() should not cause a SIGSEGV.

kneth avatar Mar 17 '24 06:03 kneth

No stack trace, unfortunately.

Pause/Resume seems to work before the initial sync has been completed. However, once it has (based on the realm.syncSession.addConnectionNotification event handler being called), calling pause/resume causes the process to abruptly exit.

This is all I get...

Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)

muddy-alex avatar Mar 18 '24 16:03 muddy-alex

Could be related to https://github.com/realm/realm-core/issues/7349? It is possible for you to downgrade? If it works with an earlier version (12.4.0 or older), it could be an indication for us.

kneth avatar Mar 20 '24 10:03 kneth

We were originally testing with 12.3.1 when we found the issue, but upgraded to 12.6.2 to check if it was resolved

muddy-alex avatar Mar 20 '24 13:03 muddy-alex

Let me get this straight - initial sync is happening, but you have created some data offline and don't want to wait for initial sync to complete before that data gets uploaded, which is why you're pausing and resuming the session?

As Kenneth said, even though this is likely not the intended flow, we shouldn't be crashing. But understanding the circumstances around the crash will allow us to repro it.

nirinchev avatar Mar 20 '24 20:03 nirinchev

We only have specific time windows on internet access, which is why we need some kind of 'force-sync' feature as we've found that the NodeJS Client doesn't seem to do any further synchronisation during the lifetime of the process (tested over 2 days).

We've found that the initial sync seems to be restricted to 10MB upload, taking the short time window of internet access into account, we would like this 'force-sync' feature to keep pushing up the 10MB chunks until all offline data has been uploaded.

muddy-alex avatar Mar 21 '24 12:03 muddy-alex

Thanks for that context - I'll chat with our sync team and see what they say.

nirinchev avatar Mar 21 '24 12:03 nirinchev

Hey, so following that conversation, here's a few more questions.

  1. What do you mean by "initial sync" here? Can you clarify what the flow looks like in your app - possibly with some example code snippets. It doesn't have to be the exact thing, but more for illustrative purposes - e.g.:
realm.subscriptions.update(...) // Subscribe for new data
realm.write(() => {
  for (var i = 0; i < 100; i++) {
    realm.create(...); // Create some offline data
  }
});
// User goes into internet accessible window
realm.syncSession.resume();
// wait for something?
realm.syncSession.pause();
  1. Where does the 10mb limit come from? There's nothing in the sync protocol that we're aware of that puts 10 mb limit on uploads and there's nothing we're aware of that would cause pausing/resuming session to force sync and have it behave differently compared to the session just being active.
  2. What is the connection state you're waiting for using addConnectionNotification. This API is intended to let you know that a connection is established, but provides no information whatsoever about the state of synchronization. How are you using it and what information are you inferring from the connection state?
  3. If you can consistently reproduce the crash, can you configure logging to log at trace level and get us the logs from a run that reproduces the crash?

nirinchev avatar Mar 21 '24 13:03 nirinchev

@muddy-alex There is no way currently to force-sync the local data: we first download any data from the server, and then we start uploading the local data. Pausing and resuming the session does not change this behavior in any way. We could add a config option to change that though. If you only have asymmetric tables in the schema, then there is nothing to download so upload would start immediately.

danieltabacaru avatar Mar 21 '24 23:03 danieltabacaru

@kneth I can replicate the SIGSEGV here...

import Realm from 'realm';

const ATLAS_APP_ID = '';
const ATLAS_API_KEY = '';

export const Point = {
  name: 'Point',
  embedded: true,
  properties: {
    type: { type: 'string', default: () => 'Point' },
    coordinates: { type: 'list', objectType: 'double' }
  },
};

class Log extends Realm.Object {
   static schema = {
    name: 'Log',
    asymmetric: true,
    primaryKey: '_id',
    properties: {
      _id: 'objectId',
      timestamp: 'date',
      organisation: 'objectId',
      key: 'string',
      value: 'mixed?[]',
      metadata: 'mixed?{}',
      location: 'Point',
      locationFix: 'int'
    }
  };
}


const app = new Realm.App({ id: ATLAS_APP_ID });

Realm.App.Sync.setLogLevel(app, "trace");

const user = app.currentUser
  ? app.currentUser
  : await app.logIn(Realm.Credentials.apiKey(ATLAS_API_KEY));

const realm = await Realm.open({
  schema: [Log, Point],
  sync: {
    cancelWaitsOnNonFatalError: false,
    existingRealmFileBehavior: {
      type: Realm.OpenRealmBehaviorType.OpenImmediately,
      timeOut: 5000,
      timeOutBehavior: Realm.OpenRealmTimeOutBehavior.OpenLocalRealm
    },
    newRealmFileBehavior: {
      type: Realm.OpenRealmBehaviorType.OpenImmediately,
      timeOut: 5000,
      timeOutBehavior: Realm.OpenRealmTimeOutBehavior.OpenLocalRealm
    },
    flexible: true,
    onError: (session, error) => {
      console.error('Sync Error!');
      console.error(error);
    },
    user
  }
});

realm.syncSession.addConnectionNotification(() => {
  console.log('Sync Connected!');
});

realm.syncSession.addProgressNotification(Realm.ProgressDirection.Download, Realm.ProgressMode.ForCurrentlyOutstandingWork, (transferred, transferable) => {
  console.log(`Sync Downloading: ${transferred}/${transferable}`);
});

realm.syncSession.addProgressNotification(Realm.ProgressDirection.Upload, Realm.ProgressMode.ForCurrentlyOutstandingWork, (transferred, transferable) => {
  console.log(`Sync Uploading: ${transferred}/${transferable}`);
});

setTimeout(async () => {
  console.log('Force Sync');
  await realm.syncSession.pause()
  await realm.syncSession.resume()
}, 15 * 1000);

muddy-alex avatar Mar 22 '24 11:03 muddy-alex

@danieltabacaru it was the documentation here that lead me to believe that this might be possible

muddy-alex avatar Mar 22 '24 11:03 muddy-alex

@nirinchev When I mention to 'initial sync', I am referring to when Realm uploads offline data for the first time the process starts. Based on my reproduction above, when you see

Sync Connected!
Sync Downloading: 0/0
Sync Uploading: 0/201
Sync Uploading: 201/201

We've found that when this initial sync runs, only a chunk (I can't remember why I assumed it was 10MB) is uploaded - rather than all of the offline data. This meant that we had to restart the process a few times before all data was pushed up to AppSync, hence the reason we are looking for a 'force-sync' solution.

muddy-alex avatar Mar 22 '24 11:03 muddy-alex

@muddy-alex I believe the documentation is correct. Forcing sync in that context is forcing connecting to the server, but the behavior is as I described it.

I wrote a similar test and I see no issues with it. I cannot see where class Point is declared in your example. The one you have there looks like a schema. Maybe @kneth can have a look.

danieltabacaru avatar Mar 22 '24 11:03 danieltabacaru

Is there any code following the resume? Maybe something is not cleaned-up properly. You can try to wait for upload completion.

danieltabacaru avatar Mar 22 '24 11:03 danieltabacaru

@danieltabacaru Nothing follows the resume(). As soon as its called we get the SIGSEGV

muddy-alex avatar Mar 22 '24 12:03 muddy-alex

@muddy-alex You are mixing class-based models (Log) and JS schemas (Point), and I wonder if it has a negative impact. I mean, our tests are either-or, so you might be in the land of undefined behavior.

kneth avatar Apr 03 '24 17:04 kneth

I couldn't find a documented way of using Embedded objects with the Class syntax. Updating the above example to just use JS schemas still results in the SIGSEGV.

muddy-alex avatar Apr 04 '24 13:04 muddy-alex

@muddy-alex Thank you to getting back. I'll bring up the issue at the next team meeting.

kneth avatar Apr 05 '24 12:04 kneth