Subscription fails to start on initial connection to Cognito
Describe the bug I have an AWSAppSyncClient subscription to GraphQL via subscribe(subscription:queue:statusChangeHandler:resultHandler). Sometimes on initial login, subscription is getting stuck on connection phase, and doesn't start correctly (about 50% of the times) When going to background, I stop the subscription via AWSAppSyncSubscriptionWatcher.cancel, and returning to foreground will try to create the subscription from scratch, which does work for failed subscription.
To Reproduce
-
Start App and initialise a subscription to GraphQL via subscribe(subscription:queue:statusChangeHandler:resultHandler) after initial sign in to Cognito.
-
Subscription tries to connect and stops after the following message:
Websocket write - {"id":"***","payload":{"data":"{\"query\":\"subscription OnServerEvent($house_id: String!) {\\n OnServerEvent(house_id: $house_id) {\\n __typename\\n house_id\\n name\\n payload\\n }\\n}\",\"variables\":{\"house_id\":\"***\"}}","extensions":{"authorization":{"Authorization":"***","host":"***.amazonaws.com"}}},"type":"start"}
Expected behavior Subscription should connect without any issue. For example, a correct flow that occurs after returning from background is:
Websocket write - {"id":"***","payload":{"data":"{\"query\":\"subscription OnServerEvent($house_id: String!) {\\n OnServerEvent(house_id: $house_id) {\\n __typename\\n house_id\\n name\\n payload\\n }\\n}\",\"variables\":{\"house_id\":\"***\"}}","extensions":{"authorization":{"Authorization":"***","host":"***.amazonaws.com"}}},"type":"start"}
WebsocketDidReceiveMessage - {"id":"***","type":"start_ack"}
Resetting stale connection timer
WebsocketDidReceiveMessage - {"id":"***","type":"data","payload":{"data":{"OnServerEvent":{"__typename":"ServerEvent","house_id":"***","name":"HOUSE_HEARTBEAT","payload":"{\"isAlive\":true,\"timestamp\":\"1602083675505\"}"}}}}
Resetting stale connection timer
WebsocketDidReceiveMessage - {"type":"ka"}
Resetting stale connection timer
AppSyncRealTimeClient.RealtimeConnectionProvider received keepAlive
Environment(please complete the following information):
- AppSync SDK Version: 3.1.5, 3.1.6
- Dependency Manager: Cocoapods
- Swift Version : 5.0
Device Information (please complete the following information):
- Device: iPhone6+, iPhone 111 Pro Max (problem only occurs on physical devices, not simulator)
- iOS Version: iOS 13.7, iOS 14.0.1
Additional context I'm also passing a SubscriptionStatusChangeHandler that fails to get the subscription change when it fails to start, works ok when subscription does connect.
@chajuss
Sometimes on initial login, subscription is getting stuck on connection phase,
Can you describe your lifecycle in a bit more detail? I'm not sure what "on initial login" means in context of AppSync. Do you mean "the first time my app tries to subscribe after startup", or "the first time my app tries to subscribe after my user logs into my Cognito User Pool", or "...logs into my social provider"?
I'm also passing a SubscriptionStatusChangeHandler that fails to get the subscription change when it fails to start, works ok when subscription does connect.
Do you mean that in the failure case, you receive a connecting event on your SubscriptionStatusChangeHandler, but no connected or disconnected, or that you receive no activity at all?
Some other questions:
- Are you making the same call on the failed initial call as you do in the successful resume foreground? That is, literally the same API call, with the same parameters?
- Your log example in the failure case shows the web socket
writeand nothing else, while your success example shows the expected acknowledgement & keepalives. Can you confirm that the failure case shows no additional log message after the failed case? - I notice your subscription is delivering a
HOUSE_HEARTBEATServerEvent; I assume this is something specific to your app logic, not something you're using to manage the connection? - What auth mode(s) does your client use? "Cognito" is slightly imprecise, since you could be using a JWT tokens via Cognito User Pool, AWS credentials via IAM roles, or even an API key, and still login via Cognito.
We've not seen this behavior in our sample apps, so anything we can do to narrow down the problem will be helpful.
@palpatim our app starts with a user sign in to Cognito user pool, using signIn(username: String, password: String, validationData: [String: String]? = nil, clientMetaData: [String: String] = [:], completionHandler: @escaping ((SignInResult?, Error?) -> Void))
On successful connection, we start the subscription using subscribe<Subscription: GraphQLSubscription>(subscription: Subscription, queue: DispatchQueue = DispatchQueue.main, statusChangeHandler: SubscriptionStatusChangeHandler? = nil, resultHandler: @escaping SubscriptionResultHandler<Subscription>) throws -> AWSAppSyncSubscriptionWatcher<Subscription>? to which we also provide a SubscriptionStatusChangeHandler.
On hang connection, we do get the connecting event, but nothing afterwards.
- Are you making the same call on the failed initial call as you do in the successful resume foreground? That is, literally the same API call, with the same parameters?
When returning from background, we use the exact same API, which connects successfully.
- Your log example in the failure case shows the web socket write and nothing else, while your success example shows the expected acknowledgement & keepalives. Can you confirm that the failure case shows no additional log message after the failed case?
Failed connection show no additional log messages after the initial Websocket write - {"id":"***","payload":{"data":"{\"query\":\"subscription OnServerEvent($house_id: String!) {\\n OnServerEvent(house_id: $house_id) {\\n __typename\\n house_id\\n name\\n payload\\n }\\n}\",\"variables\":{\"house_id\":\"***\"}}","extensions":{"authorization":{"Authorization":"***","host":"***.amazonaws.com"}}},"type":"start"} message.
Waiting for a long period of time doesn't have any effect and doesn't prompt new messages from the websocket.
- I notice your subscription is delivering a HOUSE_HEARTBEAT ServerEvent; I assume this is something specific to your app logic, not something you're using to manage the connection?
Correct, this is an internal implemented subscription message that we're using, but it's not related to managing the subscription.
- What auth mode(s) does your client use? "Cognito" is slightly imprecise, since you could be using a JWT tokens via Cognito User Pool, AWS credentials via IAM roles, or even an API key, and still login via Cognito.
We are using an appClientID and AppClientSecret to sign in to Cognito with a user pool.
I'd like to provide additional info, unfortunately I'm unable as I'm getting no additional info from the SDK at this time. This issue occurs on certain occasions only, without any exact reproducible scenario. Only thing I can say is that our backend is up and running in all cases without any issue.
Thanks for the additional detail. A clarification: when I ask "What auth mode(s) does your client use?" I mean, what auth modes is your GraphQL endpoint configured to use, as described here? I.e., what is the AuthMode setting in your awsconfiguration.json file?
The reason I'm harping on this is that subscriptions are usually tied to a specific owner, and that owner is derived from the currently logged in user. I'm wondering if you're being affected by a race condition we're working to fix on the AWSMobileClient SDK, where the signIn completion callback can sometimes return successfully before the client has retrieved AWS credentials. If you're using AWS_IAM auth mode (even though your users sign in using User Pools), you might be caught by this race condition. If you're using another auth mode such as AMAZON_COGNITO_USER_POOLS, or API_KEY, that probably doesn't pertain.
We are using AMAZON_COGNITO_USER_POOLS, sorry I wasn't clear before :)
Well, that's good because it means you're not being caught by the race condition, but it's unfortunate because I don't have a clear starting point for investigating, but we'll take a look. :)
Do you have any relevant code snippets that exhibit this behavior? I'm especially interested in the interaction between your sign in code, whatever mechanism you're using to detect successful sign in, and your subscription startup code. E.g., you might be invoking subscribe() from your signIn completion handler, or you might be invoking subscribe() in response to a state change that you got from AWSMobileClient.addUserStateListener(), or maybe some other way of recognizing that the user has logged in.
Sorry for late reply.
First of all, we are using a singleton wrapper around AWSMobileClient, which is used throughout our entire app.
Here is our code flow that interacts with sign in and subscription:
In our sign in View Controller, we do the following call with bio-authorization
func bioAuthorise(completion: @escaping (model) -> Void) {
bioAuthManager.requestOnceAuthWith(
localizedReason: reason,
useEnterPasswordInsteadCancel: useEnterPasswordInsteadCancel,
hideEnterPasswordForTouchId: true) { [weak self] success, error in
/* Handle cases and init user parameter */
MobileClient.shared.signIn(user: user) { result in
case .success: self.fetchAppData(user: user, completion: completion)
}
}
}
Next, fetchAppData(user: user, completion: completion):
func fetchAppData(user: User, completion: @escaping (AppModel) -> Void) {
dataProvider.fetchAppData(user: user, house: nil, needSignOutOnFailure: true) { [weak self] result in
switch result {
case .success(let appData):
completion(appData)
}
}
}
Which calls in a dataProvider class:
public func fetchAppData(user: User,
completion: @escaping CompletionResult<Model>) {
let completion: CompletionResult<House> = { result in
switch result {
case .success(let house):
MobileClient.shared.getAccountData(houseId: house.id) { result in
switch result {
case .success(let accountData):
self.realmServiceProvider.write(user: user, houseId: house.id, accountData: accountData)
completion(.success(value: Model(user: user, house: house)))
case .failure(let error): completion(.failure(error: error))
}
}
case .failure(let error): completion(.failure(error: error))
}
}
guard let house = house ?? realmServiceProvider.fetchHouse(by: user.currentHouseId) else {
return self.fetchFirstHouseWithInfo(completion: completion)
}
**self.fetchInitialInfo(for: house, completion: completion)**
}
dataProvider class:
func fetchInitialInfo(for house: House,
needSignOutOnFailure: Bool = true,
completion: @escaping CompletionResult<House>) {
MobileClient.shared.fetchHouses { result in
switch result {
case .failure(let error):
completion(.failure(error: error))
case .success(let houses):
realmServiceProvider.fetchHouseStructure(for: house.id) { result in
guard let result = result, !result.isEmpty else {
self.fetchServerInfo(for: house, completion: completion)
return
}
**self.fetchRecentlyUsed(for: house, completion: completion)**
}
}
}
}
dataProvider class:
func fetchRecentlyUsed(for realmHouse: House, completion: @escaping CompletionResult<House>) {
self.mobileClient.fetchRecentlyUsed(for: realmHouse, maxResults: 3) { result in
switch result {
case .success(let value):
realmHouse.configureRecentlyUsed(from: value)
let recentlyUsedDevices = realmHouse.recentlyUsed.compactMap({ $0 as? Accessory })
MobileClient.shared..getDevicesStates(accessories: recentlyUsedDevices) { result in
if case Result.failure(let error) = result {
completion(.failure(error: error))
} else {
**MobileClient.shared..subscribeForServerEvents(house: realmDetachedHouse)**
completion(.success(value: realmDetachedHouse))
}
}
case .failure(let error):
completion(.failure(error: error))
}
}
}
And marked the call back to starting the subscription in MobileClient.
When returning from background, we do the call to MobileClient.shared..subscribeForServerEvents(house: realmDetachedHouse) from applicationWillEnterForeground directly.
If you need any additional info, please let me know.
Is there any update?
2021-04-29 01:31:13.158786-0700 [8923:215598] WebsocketDidReceiveMessage - {"type":"ka"} 2021-04-29 01:31:13.205265-0700 [8923:215598] Resetting stale connection timer 2021-04-29 01:31:13.208665-0700 [8923:215598] AppSyncRealTimeClient.RealtimeConnectionProvider received keepAlive
I am getting the issue. How can I solve this?
@lawmicha has recently put in a PR to fix a race condition that might impact this behavior. We'll update this issue once that's been released.
PR is merged and released