at_client_sdk icon indicating copy to clipboard operation
at_client_sdk copied to clipboard

Connection to root fails and the exception is not caught

Open cconstab opened this issue 2 years ago • 13 comments

Describe the bug If you have a long running program the connection to root may well fail (it may also fail to secondaries)

To Reproduce Steps to reproduce the behavior:

  1. Set up a long running program using the SDK (I used the MWC demo code)
  2. Then run it for a long time (oiver night) or just pull the network cable :-)
  3. And then see the call to to toot fail and crash your program

Expected behavior The app could well use a try block etc but it would seem to me that this an SDK responsibility and should trap and retry on the behalf of the app ?

Screenshots

INFO|2022-03-03 08:20:02.924288|iotListen|Heart Rate: 70.0 

INFO|2022-03-03 08:20:02.924288|iotListen|calling atClient.put for HeartRate #1

SEVERE|2022-03-03 08:20:08.579370|AtLookup|Exception in sending to server, Exception: No response after 3000 millis from remote secondary 

SEVERE|2022-03-03 08:20:08.579370|AtLookup|Error in remote verb execution Exception: No response after 3000 millis from remote secondary    

AT0023: Timeout waiting for response
INFO|2022-03-03 08:20:12.936009|iotListen|ERROR:key: mwc_hr sharedWith: @colin status: NotificationStatusEnum.undelivered 70.0 

INFO|2022-03-03 08:20:12.936009|iotListen|key: mwc_hr sharedWith: @colin status: NotificationStatusEnum.undelivered

SEVERE|2022-03-03 08:20:13.260738|AtLookup|Exception in sending to server, Exception: No response after 3000 millis from remote secondary 

SEVERE|2022-03-03 08:20:13.260738|AtLookup|Error in remote verb execution Exception: No response after 3000 millis from remote secondary    

AT0023: Timeout waiting for response
#0      RemoteSecondary.executeVerb (package:at_client/src/client/remote_secondary.dart:39:7)
<asynchronous suspension>
#1      AtClientImpl.notifyStatus (package:at_client/src/client/at_client_impl.dart:366:24)
<asynchronous suspension>
#2      NotificationServiceImpl._getFinalNotificationStatus.<anonymous closure> (package:at_client/src/service/notification_service_impl.dart:222:32)
<asynchronous suspension>
#3      NotificationServiceImpl._getFinalNotificationStatus (package:at_client/src/service/notification_service_impl.dart:221:7)
<asynchronous suspension>
#4      NotificationServiceImpl.notify (package:at_client/src/service/notification_service_impl.dart:190:9)
<asynchronous suspension>
#5      shareO2Sat (package:iot_sender/iot_mqtt_listener.dart:202:47)
<asynchronous suspension>
#6      iotListen.<anonymous closure> (package:iot_sender/iot_mqtt_listener.dart:129:9)
<asynchronous suspension>

SEVERE|2022-03-03 08:20:23.918245|AtLookup|AtLookup.findSecondary connection to root.atsign.org exception: SocketException: The semaphore timeout period has expired.
 (OS Error: The semaphore timeout period has expired.
, errno = 121), address = root.atsign.org, port = 61533

Unhandled exception:
AT0023: @colin does not exist
#0      AtClientValidation.isAtSignExists (package:at_client/src/util/at_client_validation.dart:53:7)
<asynchronous suspension>
PS C:\Users\colin\GitHub\@company\mwc_demo\dart\iot_sender> 

Smartphone (please complete the following information): -MWC Demo

Additional context Add any other context about the problem here.

cconstab avatar Mar 03 '22 16:03 cconstab

@gkc perhaps while we open up this code we think about the caching of root information for a set period to save all the noise hitting the root server ?

cconstab avatar Mar 03 '22 16:03 cconstab

@gkc and I were seeing this at MWC22. We put a shell script around the Dart sender app to restart it when it crashed due to connectivity issues (which were pretty frequent with the mobile connection).

cpswan avatar Mar 03 '22 17:03 cpswan

Colin - yes caching of the secondary info from root lookup is something we should do. Cache invalidation trigger could be if the client fails to (re)connect to the secondary even if internet connection seems to be OK.

gkc avatar Mar 03 '22 23:03 gkc

@gkc @VJag we have findSecondary method in at_lookup library. there are multiple calls to this method from at_client. Should the logic to cache the secondary url result be implemented in at_lookup or the caller (at_client) ?

murali-shris avatar Mar 29 '22 10:03 murali-shris

Good question.

findSecondary's responsibility is simply to look up the secondary's address from the root server, so I'd say we should leave that responsibility as is, and have the caching implemented in the caller. (Note that the caching is required not just in the client but on the server as well, both outbound and inbound where there's an auth handshake)

We should also file a ticket for the backlog to refactor so that some more of the common low-level code - e.g. client socket creation and creation of outbound connections of various authentication flavours - moves into at_lookup, and therefore the secondaryUrl caching logic can be in fewer places.

gkc avatar Mar 30 '22 08:03 gkc

at_lookup v3.0.17 published which contains caching logic. Have to implement changes in at_client and server to use the cache

murali-shris avatar Apr 16 '22 16:04 murali-shris

Changes in at_client and server to initialise cache pending

murali-shris avatar Apr 19 '22 08:04 murali-shris

Tried few changes in cache layer and caller. Had a review with Jagan and Gary. https://github.com/atsign-foundation/at_libraries/commits/secondary_url_caching/at_lookup/lib/src Outcome of the discussion

  • Have an interface called SecondaryAddressFinder..CacheableSecondaryAddressFinder will be the concrete implementation
  • Inject SecondaryAddressFinder into AtClientManager
  • On exception handling below scenario needs to be factored in. Cache expires and lookup on root server fails(e.g no network). Have a lastRefreshAt in CacheEntry

murali-shris avatar Apr 21 '22 10:04 murali-shris

published at_lookup 3.0.21 initially which had bug. Fixed and published 3.0.22. end2end tests are failing https://github.com/atsign-foundation/at_client_sdk/actions/runs/2228011709 Yet to figure out the root cause

murali-shris avatar Apr 26 '22 17:04 murali-shris

fixed end2end tests and made code changes in at_client to use secondary address cache. PR is ready for review https://github.com/atsign-foundation/at_client_sdk/pull/496

murali-shris avatar Apr 28 '22 07:04 murali-shris

published at_client v3.0.21 containing caching changes

murali-shris avatar May 02 '22 07:05 murali-shris

@cconstab please verify whether the caching solution solves the problem in MWC demo code

murali-shris avatar May 02 '22 15:05 murali-shris

I will test

cconstab avatar May 02 '22 15:05 cconstab

Closing as fixed.

gkc avatar Feb 20 '23 12:02 gkc