at_client_sdk icon indicating copy to clipboard operation
at_client_sdk copied to clipboard

Notifications get hung up as secondary syncs

Open cconstab opened this issue 2 years ago • 12 comments

Describe the bug When sending frequent updates (say 2hertz) there is noticeable delay in notifications received in the client.

To Reproduce Steps to reproduce the behavior:

  1. First I send a lot of notifications to a secondary (I used the GPS Demo code)
  2. Then I connect a client and subscribe to nofications
  3. As the client is synching no notifications are being dealt with and so they get hung up until sync completes

Expected behavior Notifications should not be held up by the sync process

Screenshots

https://user-images.githubusercontent.com/6131216/191070158-9d74f280-b9a7-418d-b13d-f1876a17af92.mp4

Were you using an atApplication when the bug was found?

  • GPS Demo is what I used

Additional context Add any other context about the problem here.

cconstab avatar Sep 19 '22 15:09 cconstab

Much improved by removing the old hive boxes on the receiver and changing the onboarding Pref to

  //onboarding preference builder can be used to set onboardingService parameters
  AtOnboardingPreference atOnboardingConfig = AtOnboardingPreference()
    ..hiveStoragePath = '$homeDirectory/.$nameSpace/$receivingAtsign/$deviceName/storage'
    //..syncBatchSize = 1
    ..syncIntervalMins = 10
    ..syncRegex = "ai6bh"
    ..namespace = nameSpace
    ..downloadPath = '$homeDirectory/.$nameSpace/files'
    ..isLocalStoreRequired = true
    ..commitLogPath = '$homeDirectory/.$nameSpace/$receivingAtsign/$deviceName/storage/commitLog'
    ..fetchOfflineNotifications = false
    ..monitorHeartbeatInterval = Duration(seconds: 60)
    ..atKeysFilePath = atsignFile;
    ```

cconstab avatar Sep 19 '22 18:09 cconstab

After leaving over night the symptoms slowly come back, what I also observed is that the monitor can also die in this hang which leads to more delays.

WARNING|2022-09-20 13:55:38.408682|Monitor|Heartbeat response not received within expected duration. Heartbeat was sent at 1663682137536, it is now 1663682138408, last heartbeat response was received at 1663682114992. Will close connection, set status stopped, call retryCallback

INFO|2022-09-20 13:55:38.409651|Monitor|socket.listen onDone called. Will destroy socket, set status stopped, call retryCallback

INFO|2022-09-20 13:55:38.410344|Monitor|Another closeStopAndRetry operation is in progress

INFO|2022-09-20 13:55:38.468482|SyncService|Returning the serverCommitId 14291548**

cconstab avatar Sep 20 '22 13:09 cconstab

I failed to put a SyncProgressListener in place.. Not sure of the correct way to do it on the GPS code.. Let me know when you can and I will add and run a new test over night.. @VJag @kalluriramkumar

cconstab avatar Sep 20 '22 23:09 cconstab

Hi @cconstab. Can you please check "at_functional_test/test/atclient_sync_callback_test.dart" in SDK.

VJag avatar Sep 21 '22 02:09 VJag

Yup that's where I was looking earlier but I could not see understand how to get a progress Listener and then to use it..

Sorry for being a noob !

cconstab avatar Sep 21 '22 02:09 cconstab

I failed to put a SyncProgressListener in place.. Not sure of the correct way to do it on the GPS code.. Let me know when you can and I will add and run a new test over night.. @VJag @kalluriramkumar

Added sync progress listener and pushed the changes to a new branch add-sync-progress-listener.

Following are the changes added: https://github.com/atsign-foundation/mwc_demo/commit/2f1756835206cf28a9937463279ac974b98d2d64

sitaram-kalluri avatar Sep 21 '22 04:09 sitaram-kalluri

@VJag @kalluriramkumar.. This might be significant in that when starting up a flutter app the sync runs (in my case as you onboard the atSign) and during that time the whole app "locks" up as soon as the second message apears things move along..

flutter: INFO|2022-09-21 21:50:20.154337|AtLookup|auth success 
BECOMES "LOCKED"
flutter: INFO|2022-09-21 21:50:20.213338|SyncService|Returning the serverCommitId 16919324 

flutter: INFO|2022-09-21 21:50:27.390809|SyncService|Returning the serverCommitId 16919324 

flutter: INFO|2022-09-21 21:50:34.531585|SyncService|Inside syncComplete. syncRequest.requestSource : SyncRequestSource.app ; syncRequest.onDone : Closure: () => Null

flutter: INFO|2022-09-21 21:50:34.531585|SyncService|Sending result to onDone callback
 
BACK AND RESPONSIVE

This is repeatable, but only with atsigns that have had a lot of commitIds (16919324 in this case) or just have had a lot of notifications from what I am seeing. Not sure if it is one or the other causing the issue ?

Q. WHy do we see the same log line twice ??

cconstab avatar Sep 22 '22 04:09 cconstab

@VJag @kalluriramkumar.. This might be significant in that when starting up a flutter app the sync runs (in my case as you onboard the atSign) and during that time the whole app "locks" up as soon as the second message apears things move along..

flutter: INFO|2022-09-21 21:50:20.154337|AtLookup|auth success 
BECOMES "LOCKED"
flutter: INFO|2022-09-21 21:50:20.213338|SyncService|Returning the serverCommitId 16919324 

flutter: INFO|2022-09-21 21:50:27.390809|SyncService|Returning the serverCommitId 16919324 

flutter: INFO|2022-09-21 21:50:34.531585|SyncService|Inside syncComplete. syncRequest.requestSource : SyncRequestSource.app ; syncRequest.onDone : Closure: () => Null

flutter: INFO|2022-09-21 21:50:34.531585|SyncService|Sending result to onDone callback
 
BACK AND RESPONSIVE

This is repeatable, but only with atsigns that have had a lot of commitIds (16919324 in this case) or just have had a lot of notifications from what I am seeing. Not sure if it is one or the other causing the issue ?

Q. WHy do we see the same log line twice ??

@cconstab : Found the potential cause for sync to trigger. Fixed the issue and changes are pushed to the following branch : ignore_latestNotificationKey_from_sync

Can you please point to the above at_client SDK and test?

Testing Notes: Tested on @87bluebaton and the when receiving the notifications, the sync is not triggered. Received 208 notifications and server commit ID is at 6.

@87bluebaton@stats:3
data: [{"id":"3","name":"lastCommitID","value":"6"}]
@87bluebaton@stats:11
data: [{"id":"11","name":"NotificationCount","value":"{\"total\":208,\"type\":{\"sent\":0,\"received\":208},\"status\":{\"delivered\":0,\"failed\":0,\"queued\":0},\"operations\":{\"update\":208,\"delete\":0},\"messageType\":{\"key\":2,\"text\":206},\"createdOn\":1663849517286}"}]

sitaram-kalluri avatar Sep 22 '22 14:09 sitaram-kalluri

For sure thanks

cconstab avatar Sep 22 '22 14:09 cconstab

I think that's solves one thing but does not solve the lock up when you hit a high number of commits..

cconstab avatar Sep 22 '22 14:09 cconstab

This fix prevents unnecessary sync's hence there is no commit log and the related logic to compare commit id's. However, murali is working on implementing commit log compaction. The ticket for it is https://github.com/atsign-foundation/at_client_sdk/issues/553.

VJag avatar Sep 22 '22 14:09 VJag

@VJag @kalluriramkumar.. This might be significant in that when starting up a flutter app the sync runs (in my case as you onboard the atSign) and during that time the whole app "locks" up as soon as the second message apears things move along..

flutter: INFO|2022-09-21 21:50:20.154337|AtLookup|auth success 
BECOMES "LOCKED"
flutter: INFO|2022-09-21 21:50:20.213338|SyncService|Returning the serverCommitId 16919324 

flutter: INFO|2022-09-21 21:50:27.390809|SyncService|Returning the serverCommitId 16919324 

flutter: INFO|2022-09-21 21:50:34.531585|SyncService|Inside syncComplete. syncRequest.requestSource : SyncRequestSource.app ; syncRequest.onDone : Closure: () => Null

flutter: INFO|2022-09-21 21:50:34.531585|SyncService|Sending result to onDone callback
 
BACK AND RESPONSIVE

This is repeatable, but only with atsigns that have had a lot of commitIds (16919324 in this case) or just have had a lot of notifications from what I am seeing. Not sure if it is one or the other causing the issue ? Q. WHy do we see the same log line twice ??

@cconstab : Found the potential cause for sync to trigger. Fixed the issue and changes are pushed to the following branch : ignore_latestNotificationKey_from_sync

Can you please point to the above at_client SDK and test?

Testing Notes: Tested on @87bluebaton and the when receiving the notifications, the sync is not triggered. Received 208 notifications and server commit ID is at 6.

@87bluebaton@stats:3
data: [{"id":"3","name":"lastCommitID","value":"6"}]
@87bluebaton@stats:11
data: [{"id":"11","name":"NotificationCount","value":"{\"total\":208,\"type\":{\"sent\":0,\"received\":208},\"status\":{\"delivered\":0,\"failed\":0,\"queued\":0},\"operations\":{\"update\":208,\"delete\":0},\"messageType\":{\"key\":2,\"text\":206},\"createdOn\":1663849517286}"}]

Spent 5 SP to set the at_gps demo in local, find the root cause of the issue and provide a fix.

sitaram-kalluri avatar Oct 03 '22 09:10 sitaram-kalluri

The following https://github.com/atsign-foundation/at_client_sdk/issues/750 issue is the potential cause for the notifications getting hung when the sync runs and PR: https://github.com/atsign-foundation/at_client_sdk/pull/760 has the fix

sitaram-kalluri avatar Oct 21 '22 05:10 sitaram-kalluri

This is working as it should now thanks to @sitaram-kalluri

cconstab avatar Feb 14 '23 20:02 cconstab