at_client_sdk
at_client_sdk copied to clipboard
Connection to root fails and the exception is not caught
Describe the bug If you have a long running program the connection to root may well fail (it may also fail to secondaries)
To Reproduce Steps to reproduce the behavior:
- Set up a long running program using the SDK (I used the MWC demo code)
- Then run it for a long time (oiver night) or just pull the network cable :-)
- And then see the call to to toot fail and crash your program
Expected behavior The app could well use a try block etc but it would seem to me that this an SDK responsibility and should trap and retry on the behalf of the app ?
Screenshots
INFO|2022-03-03 08:20:02.924288|iotListen|Heart Rate: 70.0
INFO|2022-03-03 08:20:02.924288|iotListen|calling atClient.put for HeartRate #1
SEVERE|2022-03-03 08:20:08.579370|AtLookup|Exception in sending to server, Exception: No response after 3000 millis from remote secondary
SEVERE|2022-03-03 08:20:08.579370|AtLookup|Error in remote verb execution Exception: No response after 3000 millis from remote secondary
AT0023: Timeout waiting for response
INFO|2022-03-03 08:20:12.936009|iotListen|ERROR:key: mwc_hr sharedWith: @colin status: NotificationStatusEnum.undelivered 70.0
INFO|2022-03-03 08:20:12.936009|iotListen|key: mwc_hr sharedWith: @colin status: NotificationStatusEnum.undelivered
SEVERE|2022-03-03 08:20:13.260738|AtLookup|Exception in sending to server, Exception: No response after 3000 millis from remote secondary
SEVERE|2022-03-03 08:20:13.260738|AtLookup|Error in remote verb execution Exception: No response after 3000 millis from remote secondary
AT0023: Timeout waiting for response
#0 RemoteSecondary.executeVerb (package:at_client/src/client/remote_secondary.dart:39:7)
<asynchronous suspension>
#1 AtClientImpl.notifyStatus (package:at_client/src/client/at_client_impl.dart:366:24)
<asynchronous suspension>
#2 NotificationServiceImpl._getFinalNotificationStatus.<anonymous closure> (package:at_client/src/service/notification_service_impl.dart:222:32)
<asynchronous suspension>
#3 NotificationServiceImpl._getFinalNotificationStatus (package:at_client/src/service/notification_service_impl.dart:221:7)
<asynchronous suspension>
#4 NotificationServiceImpl.notify (package:at_client/src/service/notification_service_impl.dart:190:9)
<asynchronous suspension>
#5 shareO2Sat (package:iot_sender/iot_mqtt_listener.dart:202:47)
<asynchronous suspension>
#6 iotListen.<anonymous closure> (package:iot_sender/iot_mqtt_listener.dart:129:9)
<asynchronous suspension>
SEVERE|2022-03-03 08:20:23.918245|AtLookup|AtLookup.findSecondary connection to root.atsign.org exception: SocketException: The semaphore timeout period has expired.
(OS Error: The semaphore timeout period has expired.
, errno = 121), address = root.atsign.org, port = 61533
Unhandled exception:
AT0023: @colin does not exist
#0 AtClientValidation.isAtSignExists (package:at_client/src/util/at_client_validation.dart:53:7)
<asynchronous suspension>
PS C:\Users\colin\GitHub\@company\mwc_demo\dart\iot_sender>
Smartphone (please complete the following information): -MWC Demo
Additional context Add any other context about the problem here.
@gkc perhaps while we open up this code we think about the caching of root information for a set period to save all the noise hitting the root server ?
@gkc and I were seeing this at MWC22. We put a shell script around the Dart sender app to restart it when it crashed due to connectivity issues (which were pretty frequent with the mobile connection).
Colin - yes caching of the secondary info from root lookup is something we should do. Cache invalidation trigger could be if the client fails to (re)connect to the secondary even if internet connection seems to be OK.
@gkc @VJag we have findSecondary method in at_lookup library. there are multiple calls to this method from at_client. Should the logic to cache the secondary url result be implemented in at_lookup or the caller (at_client) ?
Good question.
findSecondary's responsibility is simply to look up the secondary's address from the root server, so I'd say we should leave that responsibility as is, and have the caching implemented in the caller. (Note that the caching is required not just in the client but on the server as well, both outbound and inbound where there's an auth handshake)
We should also file a ticket for the backlog to refactor so that some more of the common low-level code - e.g. client socket creation and creation of outbound connections of various authentication flavours - moves into at_lookup, and therefore the secondaryUrl caching logic can be in fewer places.
at_lookup v3.0.17 published which contains caching logic. Have to implement changes in at_client and server to use the cache
Changes in at_client and server to initialise cache pending
Tried few changes in cache layer and caller. Had a review with Jagan and Gary. https://github.com/atsign-foundation/at_libraries/commits/secondary_url_caching/at_lookup/lib/src Outcome of the discussion
- Have an interface called SecondaryAddressFinder..CacheableSecondaryAddressFinder will be the concrete implementation
- Inject SecondaryAddressFinder into AtClientManager
- On exception handling below scenario needs to be factored in. Cache expires and lookup on root server fails(e.g no network). Have a lastRefreshAt in CacheEntry
published at_lookup 3.0.21 initially which had bug. Fixed and published 3.0.22. end2end tests are failing https://github.com/atsign-foundation/at_client_sdk/actions/runs/2228011709 Yet to figure out the root cause
fixed end2end tests and made code changes in at_client to use secondary address cache. PR is ready for review https://github.com/atsign-foundation/at_client_sdk/pull/496
published at_client v3.0.21 containing caching changes
@cconstab please verify whether the caching solution solves the problem in MWC demo code
I will test
Closing as fixed.