grpc-swift
Requests failing after going to background
What are you trying to achieve?
This may be a very specific issue, but some users of our iOS app complain that after keeping the app in the background for a longer time, requests start to fail when they return to the app. I tried to replicate this issue, and the only way I managed to was to launch the app and then immediately put it into the background before the initial data-fetching requests complete. The result I get for these calls is just:
```
failure(unavailable (14))
```
Furthermore, while the app is in the background, we make some requests to synchronize with the API, and they seem to be OK and succeed.
I would appreciate any information on whether someone else has also experienced this kind of behavior, why it may be happening, whether it is a bug, and whether it can be avoided. The app is running the latest release, 1.0.0-alpha.14.
Can you provide a little more context @IgorRosocha; are you using Network.framework here? And do you have more info about the errors other than just the unavailable status, the status message and logs, for example?
> Furthermore, while app is in background, we are making some requests to synchronize with API, and they seem to be OK and succeed.
Are these gRPC requests?
> Can you provide a little more context @IgorRosocha; are you using Network.framework here? And do you have more info about the errors other than just the unavailable status, the status message and logs, for example?
I am using MultiThreadedEventLoopGroup(numberOfThreads: 5). For some reason, when using PlatformSupport.makeEventLoopGroup(loopCount: 1), I always get transientFailure, and therefore all requests end in failure.
The status of the calls is just success(unavailable (14)). The console output in this case looks like this:
```
2020-06-23 13:13:22.454296+0200 App[2432:561843] [] nw_socket_handle_socket_event [C1.1:3] Socket SO_ERROR [9: Bad file descriptor]
2020-06-23 13:13:22.454320+0200 App[2432:561874] dnssd_clientstub write_all(64) DEFUNCT
2020-06-23 13:13:22.454333+0200 App[2432:561874] dnssd_clientstub deliver_request ERROR: write_all(64, 65 bytes) failed
2020-06-23 13:13:22.454349+0200 App[2432:561874] dnssd_clientstub write_all(64) DEFUNCT
2020-06-23 13:13:22.454413+0200 App[2432:561843] [] nw_read_request_report [C1] Receive failed with error "Bad file descriptor"
2020-06-23 13:13:22.454458+0200 App[2432:561843] [] nw_socket_get_input_frames [C1.1:3] recvmsg(fd 11, 1024 bytes) [57: Socket is not connected]
2020-06-23T13:13:22+0200 error: error=read(descriptor:pointer:size:): Socket is not connected (errno: 57) connection_id=0D42F5D1-3251-4674-829F-86B00BE92A22/0 grpc client error
2020-06-23T13:13:22+0200 error: connection_id=A6E90A86-3358-4508-A6EF-B58E8E61A432/0 error=read(descriptor:pointer:size:): Socket is not connected (errno: 57) grpc client error
2020-06-23T13:13:22+0200 info: old_state=ready connection_id=A6E90A86-3358-4508-A6EF-B58E8E61A432/0 new_state=transientFailure connectivity state change
2020-06-23T13:13:22+0200 info: old_state=ready new_state=transientFailure connection_id=0D42F5D1-3251-4674-829F-86B00BE92A22/0 connectivity state change
2020-06-23T13:13:22+0200 info: new_state=connecting connection_id=0D42F5D1-3251-4674-829F-86B00BE92A22/0 old_state=transientFailure connectivity state change
2020-06-23T13:13:22+0200 info: connection_id=A6E90A86-3358-4508-A6EF-B58E8E61A432/0 old_state=transientFailure new_state=connecting connectivity state change
2020-06-23T13:13:22+0200 error: error=read(descriptor:pointer:size:): Socket is not connected (errno: 57) connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 grpc client error
2020-06-23T13:13:22+0200 info: old_state=ready new_state=transientFailure connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 connectivity state change
2020-06-23 13:13:22.461023+0200 App[2432:561843] [] nw_read_request_report [C1] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.461107+0200 App[2432:561843] [] nw_socket_handle_socket_event [C2.1:3] Socket SO_ERROR [9: Bad file descriptor]
2020-06-23 13:13:22.461189+0200 App[2432:561843] [] nw_read_request_report [C2] Receive failed with error "Bad file descriptor"
2020-06-23 13:13:22.461220+0200 App[2432:561843] [] nw_socket_get_input_frames [C2.1:3] recvmsg(fd 16, 1024 bytes) [57: Socket is not connected]
2020-06-23T13:13:22+0200 info: old_state=transientFailure new_state=connecting connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 connectivity state change
2020-06-23 13:13:22.462329+0200 App[2432:561843] [] nw_read_request_report [C2] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.462377+0200 App[2432:561843] [] nw_socket_handle_socket_event [C3.1:3] Socket SO_ERROR [9: Bad file descriptor]
2020-06-23 13:13:22.464872+0200 App[2432:561832] 6.26.0 - [Firebase/Firestore][I-FST000001] WatchStream (280edc818) Stream error: 'Unavailable: Socket is not connected'
2020-06-23 13:13:22.468249+0200 App[2432:561843] [] nw_read_request_report [C3] Receive failed with error "Bad file descriptor"
2020-06-23 13:13:22.468337+0200 App[2432:561843] [] nw_socket_get_input_frames [C3.1:3] recvmsg(fd 35, 1024 bytes) [57: Socket is not connected]
2020-06-23 13:13:22.468407+0200 App[2432:561843] [] nw_read_request_report [C3] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.468445+0200 App[2432:561843] [] nw_socket_handle_socket_event [C7.1:3] Socket SO_ERROR [9: Bad file descriptor]
2020-06-23 13:13:22.470545+0200 App[2432:561843] [] nw_read_request_report [C7] Receive failed with error "Bad file descriptor"
2020-06-23 13:13:22.470588+0200 App[2432:561843] [] nw_socket_get_input_frames [C7.1:3] recvmsg(fd 70, 1024 bytes) [57: Socket is not connected]
2020-06-23 13:13:22.470643+0200 App[2432:561843] [] nw_read_request_report [C7] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.470676+0200 App[2432:561843] [] nw_socket_handle_socket_event [C6.1:3] Socket SO_ERROR [9: Bad file descriptor]
2020-06-23 13:13:22.471045+0200 App[2432:561843] [] nw_read_request_report [C6] Receive failed with error "Bad file descriptor"
2020-06-23 13:13:22.471424+0200 App[2432:561843] [] nw_socket_get_input_frames [C6.1:3] recvmsg(fd 71, 1024 bytes) [57: Socket is not connected]
2020-06-23 13:13:22.471927+0200 App[2432:561843] [] nw_read_request_report [C6] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.472232+0200 App[2432:561843] [] nw_write_request_report [C2] Send failed with error "Broken pipe"
2020-06-23 13:13:22.472311+0200 App[2432:561843] Connection 2: write error 1:32
2020-06-23 13:13:22.472354+0200 App[2432:561843] [] nw_read_request_report [C2] Receive failed with error "Broken pipe"
2020-06-23 13:13:22.472724+0200 App[2432:561843] [] nw_read_request_report [C2] Receive failed with error "Broken pipe"
2020-06-23 13:13:22.475809+0200 App[2432:561843] [] nw_read_request_report [C1] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.475850+0200 App[2432:561843] [] nw_read_request_report [C1] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.480921+0200 App[2432:561843] [] nw_read_request_report [C3] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.481035+0200 App[2432:561843] [] nw_read_request_report [C3] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.484630+0200 App[2432:561843] [] nw_read_request_report [C7] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.484669+0200 App[2432:561843] [] nw_read_request_report [C7] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.487140+0200 App[2432:561843] Task <4E944036-943F-4BCB-BA19-951E98B28263>.<5> HTTP load failed, 42/0 bytes (error code: 9 [1:9])
2020-06-23 13:13:22.487433+0200 App[2432:562505] Task <4E944036-943F-4BCB-BA19-951E98B28263>.<5> finished with error [9] Error Domain=NSPOSIXErrorDomain Code=9 "Bad file descriptor" UserInfo={NSErrorPeerAddressKey=<CFData 0x28115a4e0 [0x1ec4f5de0]>{length = 16, capacity = 16, bytes = 0x100201bb1faa67070000000000000000}, _kCFStreamErrorCodeKey=9, _kCFStreamErrorDomainKey=1}
2020-06-23 13:13:22.488066+0200 App[2432:561843] [] nw_read_request_report [C6] Receive failed with error "Socket is not connected"
2020-06-23 13:13:22.488149+0200 App[2432:561843] [] nw_read_request_report [C6] Receive failed with error "Socket is not connected"
```
> Are these gRPC requests?
Yes, these are all UnaryCalls.
Please let me know if I can provide more detail/information.
We're getting ENOTCONN here, which suggests that backgrounding the application is killing the request. I'd expect the retry logic to get us out of this state.
@Lukasa is there a reason for this happening, and a way to avoid or handle it? Maybe detect this specific error and apply retry logic?
It's my belief that this should already be handled by the retry logic. I'm curious if @glbrntt agrees.
> It's my belief that this should already be handled by the retry logic. I'm curious if @glbrntt agrees.
If you mean the retry logic using Connection Backoff, then unfortunately it doesn't work in this particular case, and requests are not retried after failing.
To clarify: gRPC will never retry RPCs on behalf of the user. It will, however, attempt to re-establish the connection which RPCs use if it goes away.
With that in mind:
- if the RPC was started when the connection was ready (corresponding to the 'ready' connectivity state) and the connection drops before it completes, then the RPC will fail: this is expected and gRPC will not retry the RPC.
- if the connection was dropped before the RPC was started, then the RPC will wait for a connection to be established before sending any data.
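Since gRPC will not retry RPCs on behalf of the user, the application has to do it. A minimal, library-free sketch of that idea (the RPCStatus type and its code 14 for UNAVAILABLE are illustrative stand-ins, not grpc-swift API):

```swift
import Foundation

// Hypothetical stand-in for a gRPC status; code 14 is UNAVAILABLE,
// which is considered transient and therefore worth retrying.
struct RPCStatus: Error {
    let code: Int
    var isTransient: Bool { code == 14 }
}

// Application-level retry: run the call, and retry it only when the
// failure is transient, up to a fixed number of attempts.
func withRetries<T>(maxAttempts: Int,
                    _ call: () -> Result<T, RPCStatus>) -> Result<T, RPCStatus> {
    precondition(maxAttempts > 0)
    var lastFailure = RPCStatus(code: 2) // UNKNOWN, placeholder
    for _ in 0..<maxAttempts {
        switch call() {
        case .success(let value):
            return .success(value)
        case .failure(let status):
            guard status.isTransient else { return .failure(status) }
            lastFailure = status
        }
    }
    return .failure(lastFailure)
}
```

In a real app the closure would wrap a unary call and the retry loop would typically add a delay between attempts rather than retrying immediately.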
We can see the connections (there appear to be three) failing and attempting to re-establish from the logs. First we see an error on the connection:
```
2020-06-23T13:13:22+0200 error: error=read(descriptor:pointer:size:): Socket is not connected (errno: 57) connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 grpc client error
```
Then the connectivity state for that connection changes from ready to transient failure – which is the "waiting to try to reconnect" state:
```
2020-06-23T13:13:22+0200 info: old_state=ready new_state=transientFailure connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 connectivity state change
```
The first reconnection attempt happens almost immediately; as it starts, we move from transient failure to connecting:
```
2020-06-23T13:13:22+0200 info: old_state=transientFailure new_state=connecting connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 connectivity state change
```
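If that attempt also fails, the delay before subsequent attempts grows according to the connection backoff schedule. A deterministic sketch of that schedule (the defaults shown — initial 1s, multiplier 1.6, cap 120s — are gRPC's documented defaults; jitter is omitted here for determinism):

```swift
import Foundation

// Compute the delays (in seconds) before each reconnection attempt:
// the delay grows geometrically from `initial` by `multiplier`, and
// is capped at `maximum`.
func backoffSchedule(attempts: Int,
                     initial: Double = 1.0,
                     multiplier: Double = 1.6,
                     maximum: Double = 120.0) -> [Double] {
    var delay = initial
    var schedule: [Double] = []
    for _ in 0..<attempts {
        schedule.append(min(delay, maximum))
        delay *= multiplier
    }
    return schedule
}
```

In grpc-swift this behaviour is configurable via the connection backoff settings on the client configuration.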
@glbrntt Thank you for your answer, I believe I understand how the connection re-establishment works. However, I am unable to wrap my head around why these UnaryCalls are being killed when the user has the app in the background.
More specifically, I have noticed that these requests seem to fail after the user has locked their device. Nevertheless, more and more users complain that the content of our app is not displayed after the app has been in the background for some time. I have been trying to investigate why this is happening and how to prevent it, but currently I have no clue. I am unable to reliably replicate this bug on our testing devices, and when I look at our remote logs, all I can see are request failures while the app is in the background or when the user returns to it from the background. The result message is always just failure(unavailable (14)).
Do you have any idea why this might be happening? Have other users perhaps also experienced this? I would love to provide more detail, but this is currently all the information I could get.
So .unavailable (14) is a GRPCError code. This is considered transient and should resolve if you retry.
To clarify, are you retrying these failed RPCs? We absolutely expect that RPCs may fail when the app is in the background: iOS is free to refuse network access to those applications, and RPCs that were in flight at the time the app was suspended are highly likely to fail. Retrying these failed RPCs should resolve your issue.
@Lukasa could you link any reference to a RPC retry implementation with grpc-swift? Is this performed through the retry policies or does it have to be done manually?
grpc-swift does not automatically retry RPCs that may have reached the network because it doesn't know if they're safe to retry. That requires application-specific logic.
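One common shape for that application-specific logic is to gate retries on whether the call is known to be idempotent. A small illustrative sketch (the types and names here are hypothetical, not grpc-swift API):

```swift
import Foundation

// Hypothetical per-call metadata: the application, not the library,
// knows whether a given RPC is safe to send twice.
struct AppCallOptions {
    let isIdempotent: Bool
}

// A small subset of status codes for illustration.
enum AppCallError: Error {
    case unavailable    // status 14: transient
    case internalError  // status 13: not safe to blindly retry
}

// Retry only idempotent calls, and only on transient statuses.
func shouldRetry(_ error: AppCallError, options: AppCallOptions) -> Bool {
    guard options.isIdempotent else { return false }
    switch error {
    case .unavailable: return true
    case .internalError: return false
    }
}
```

A read-only fetch would be marked idempotent and retried on UNAVAILABLE; a payment-style mutation would not be, unless the server deduplicates requests.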
@Lukasa I have implemented a retry policy that retries the call 3 times when receiving the status .unavailable (14). I tested this by switching my VPN on and off. Once the call had failed 3 times, I re-established my VPN connection and manually retried the call (by clicking a send button), but I get the same result: all requests fail with the same status code. Side note: I'm using a single ClientConnection for each repository. Does the ClientConnection just stop working after a status 14? Do I need to create a new ClientConnection on every call instead?
@jcgonlop is this related to backgrounding the app? If not, can you file a new issue and provide the relevant details?