
Requests failing after going to background

Open IgorRosocha opened this issue 5 years ago • 13 comments

What are you trying to achieve?

This may be a very specific issue, but some users of our iOS app complain that after the app has been in the background for a long time, requests start to fail when they return to it. I tried to replicate this issue, and the only way I could was to launch the app and immediately put it into the background before the initial data-fetching requests complete. The result I get for these calls is just:

failure(unavailable (14))

Furthermore, while the app is in the background, we make some requests to synchronize with the API, and they seem to succeed.

I would appreciate any information on whether someone else has experienced this kind of behavior, why it may be happening, whether it is a bug or not, and whether it can be avoided. The app is running the latest release, 1.0.0-alpha.14.

IgorRosocha avatar Jun 23 '20 14:06 IgorRosocha

Can you provide a little more context @IgorRosocha; are you using Network.framework here? And do you have more info about the errors other than just the unavailable status, the status message and logs, for example?

Furthermore, while the app is in the background, we make some requests to synchronize with the API, and they seem to succeed.

Are these gRPC requests?

glbrntt avatar Jun 24 '20 08:06 glbrntt

Can you provide a little more context @IgorRosocha; are you using Network.framework here? And do you have more info about the errors other than just the unavailable status, the status message and logs, for example?

I am using MultiThreadedEventLoopGroup(numberOfThreads: 5). For some reason, when using PlatformSupport.makeEventLoopGroup(loopCount: 1), I always get transientFailure, and therefore all requests end in failure.

The status message of the calls is just success(unavailable (14)). The console output in this case looks like this:

2020-06-23 13:13:22.454296+0200 App[2432:561843] [] nw_socket_handle_socket_event [C1.1:3] Socket SO_ERROR [9: Bad file descriptor]

2020-06-23 13:13:22.454320+0200 App[2432:561874] dnssd_clientstub write_all(64) DEFUNCT

2020-06-23 13:13:22.454333+0200 App[2432:561874] dnssd_clientstub deliver_request ERROR: write_all(64, 65 bytes) failed

2020-06-23 13:13:22.454349+0200 App[2432:561874] dnssd_clientstub write_all(64) DEFUNCT

2020-06-23 13:13:22.454413+0200 App[2432:561843] [] nw_read_request_report [C1] Receive failed with error "Bad file descriptor"

2020-06-23 13:13:22.454458+0200 App[2432:561843] [] nw_socket_get_input_frames [C1.1:3] recvmsg(fd 11, 1024 bytes) [57: Socket is not connected]

2020-06-23T13:13:22+0200 error: error=read(descriptor:pointer:size:): Socket is not connected (errno: 57) connection_id=0D42F5D1-3251-4674-829F-86B00BE92A22/0 grpc client error

2020-06-23T13:13:22+0200 error: connection_id=A6E90A86-3358-4508-A6EF-B58E8E61A432/0 error=read(descriptor:pointer:size:): Socket is not connected (errno: 57) grpc client error

2020-06-23T13:13:22+0200 info: old_state=ready connection_id=A6E90A86-3358-4508-A6EF-B58E8E61A432/0 new_state=transientFailure connectivity state change

2020-06-23T13:13:22+0200 info: old_state=ready new_state=transientFailure connection_id=0D42F5D1-3251-4674-829F-86B00BE92A22/0 connectivity state change

2020-06-23T13:13:22+0200 info: new_state=connecting connection_id=0D42F5D1-3251-4674-829F-86B00BE92A22/0 old_state=transientFailure connectivity state change

2020-06-23T13:13:22+0200 info: connection_id=A6E90A86-3358-4508-A6EF-B58E8E61A432/0 old_state=transientFailure new_state=connecting connectivity state change

2020-06-23T13:13:22+0200 error: error=read(descriptor:pointer:size:): Socket is not connected (errno: 57) connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 grpc client error

2020-06-23T13:13:22+0200 info: old_state=ready new_state=transientFailure connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 connectivity state change

2020-06-23 13:13:22.461023+0200 App[2432:561843] [] nw_read_request_report [C1] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.461107+0200 App[2432:561843] [] nw_socket_handle_socket_event [C2.1:3] Socket SO_ERROR [9: Bad file descriptor]

2020-06-23 13:13:22.461189+0200 App[2432:561843] [] nw_read_request_report [C2] Receive failed with error "Bad file descriptor"

2020-06-23 13:13:22.461220+0200 App[2432:561843] [] nw_socket_get_input_frames [C2.1:3] recvmsg(fd 16, 1024 bytes) [57: Socket is not connected]

2020-06-23T13:13:22+0200 info: old_state=transientFailure new_state=connecting connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 connectivity state change

2020-06-23 13:13:22.462329+0200 App[2432:561843] [] nw_read_request_report [C2] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.462377+0200 App[2432:561843] [] nw_socket_handle_socket_event [C3.1:3] Socket SO_ERROR [9: Bad file descriptor]

2020-06-23 13:13:22.464872+0200 App[2432:561832] 6.26.0 - [Firebase/Firestore][I-FST000001] WatchStream (280edc818) Stream error: 'Unavailable: Socket is not connected'

2020-06-23 13:13:22.468249+0200 App[2432:561843] [] nw_read_request_report [C3] Receive failed with error "Bad file descriptor"

2020-06-23 13:13:22.468337+0200 App[2432:561843] [] nw_socket_get_input_frames [C3.1:3] recvmsg(fd 35, 1024 bytes) [57: Socket is not connected]

2020-06-23 13:13:22.468407+0200 App[2432:561843] [] nw_read_request_report [C3] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.468445+0200 App[2432:561843] [] nw_socket_handle_socket_event [C7.1:3] Socket SO_ERROR [9: Bad file descriptor]

2020-06-23 13:13:22.470545+0200 App[2432:561843] [] nw_read_request_report [C7] Receive failed with error "Bad file descriptor"

2020-06-23 13:13:22.470588+0200 App[2432:561843] [] nw_socket_get_input_frames [C7.1:3] recvmsg(fd 70, 1024 bytes) [57: Socket is not connected]

2020-06-23 13:13:22.470643+0200 App[2432:561843] [] nw_read_request_report [C7] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.470676+0200 App[2432:561843] [] nw_socket_handle_socket_event [C6.1:3] Socket SO_ERROR [9: Bad file descriptor]

2020-06-23 13:13:22.471045+0200 App[2432:561843] [] nw_read_request_report [C6] Receive failed with error "Bad file descriptor"

2020-06-23 13:13:22.471424+0200 App[2432:561843] [] nw_socket_get_input_frames [C6.1:3] recvmsg(fd 71, 1024 bytes) [57: Socket is not connected]

2020-06-23 13:13:22.471927+0200 App[2432:561843] [] nw_read_request_report [C6] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.472232+0200 App[2432:561843] [] nw_write_request_report [C2] Send failed with error "Broken pipe"

2020-06-23 13:13:22.472311+0200 App[2432:561843] Connection 2: write error 1:32

2020-06-23 13:13:22.472354+0200 App[2432:561843] [] nw_read_request_report [C2] Receive failed with error "Broken pipe"

2020-06-23 13:13:22.472724+0200 App[2432:561843] [] nw_read_request_report [C2] Receive failed with error "Broken pipe"

2020-06-23 13:13:22.475809+0200 App[2432:561843] [] nw_read_request_report [C1] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.475850+0200 App[2432:561843] [] nw_read_request_report [C1] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.480921+0200 App[2432:561843] [] nw_read_request_report [C3] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.481035+0200 App[2432:561843] [] nw_read_request_report [C3] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.484630+0200 App[2432:561843] [] nw_read_request_report [C7] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.484669+0200 App[2432:561843] [] nw_read_request_report [C7] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.487140+0200 App[2432:561843] Task <4E944036-943F-4BCB-BA19-951E98B28263>.<5> HTTP load failed, 42/0 bytes (error code: 9 [1:9])

2020-06-23 13:13:22.487433+0200 App[2432:562505] Task <4E944036-943F-4BCB-BA19-951E98B28263>.<5> finished with error [9] Error Domain=NSPOSIXErrorDomain Code=9 "Bad file descriptor" UserInfo={NSErrorPeerAddressKey=<CFData 0x28115a4e0 [0x1ec4f5de0]>{length = 16, capacity = 16, bytes = 0x100201bb1faa67070000000000000000}, _kCFStreamErrorCodeKey=9, _kCFStreamErrorDomainKey=1}

2020-06-23 13:13:22.488066+0200 App[2432:561843] [] nw_read_request_report [C6] Receive failed with error "Socket is not connected"

2020-06-23 13:13:22.488149+0200 App[2432:561843] [] nw_read_request_report [C6] Receive failed with error "Socket is not connected"

Are these gRPC requests?

Yes, these are all UnaryCalls.

Please let me know if I can provide more detail/information.

IgorRosocha avatar Jun 25 '20 08:06 IgorRosocha

We're getting ENOTCONN here, which suggests that the backgrounding of the application is killing the request. I'd expect the retry logic to get us out of this state.

Lukasa avatar Jun 25 '20 10:06 Lukasa

@Lukasa is there a reason for this happening, and a way to avoid or handle it? Maybe detect this specific error and apply retry logic?

IgorRosocha avatar Jun 29 '20 07:06 IgorRosocha

It's my belief that this should already be handled by the retry logic. I'm curious if @glbrntt agrees.

Lukasa avatar Jun 29 '20 07:06 Lukasa

It's my belief that this should already be handled by the retry logic. I'm curious if @glbrntt agrees.

If you mean the retry logic using Connection Backoff, then unfortunately it doesn't work in this particular case, and requests are not retried after failing.

IgorRosocha avatar Jun 29 '20 07:06 IgorRosocha

To clarify: gRPC will never retry RPCs on behalf of the user. It will, however, attempt to re-establish the connection which RPCs use if it goes away.

With that in mind:

  • if the RPC was started when the connection was ready (corresponding to the 'ready' connectivity state) and the connection drops before it completes then the RPC will fail: this is expected and gRPC will not retry the RPC.
  • if the connection was dropped before the RPC was called, then the RPC will wait for a connection to be established before sending any data.

From the logs we can see the connections (there appear to be three) failing and attempting to re-establish. First we see an error on a connection:

2020-06-23T13:13:22+0200 error: error=read(descriptor:pointer:size:): Socket is not connected (errno: 57) connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 grpc client error

Then the connectivity state for that connection changes from ready to transient failure – which is the "waiting to try to reconnect" state:

2020-06-23T13:13:22+0200 info: old_state=ready new_state=transientFailure connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 connectivity state change

The first reconnection attempt happens almost immediately; as it starts, we move from transient failure to connecting:

2020-06-23T13:13:22+0200 info: old_state=transientFailure new_state=connecting connection_id=D3F766FA-5EA4-4AAB-8AEA-0DE844B1C173/0 connectivity state change

glbrntt avatar Jun 29 '20 08:06 glbrntt
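(For readers following along: the ready → transientFailure → connecting sequence described above can be observed in code. The sketch below mirrors the shape of grpc-swift's connectivity-state delegate, but the enum and protocol are defined locally so the snippet compiles without the GRPC package; treat the names as illustrative, not as the library's exact API.)

```swift
import Foundation

// Simplified mirror of gRPC's connectivity states.
enum ConnectivityState: String {
    case idle, connecting, ready, transientFailure, shutdown
}

// Shaped like grpc-swift's ConnectivityStateDelegate; hypothetical here.
protocol ConnectivityObserver: AnyObject {
    func connectivityStateDidChange(from oldState: ConnectivityState,
                                    to newState: ConnectivityState)
}

// Records each transition. A real app might instead hold off starting
// UI-driven RPCs while the connection is in .transientFailure or .connecting.
final class TransitionLogger: ConnectivityObserver {
    private(set) var log: [String] = []

    func connectivityStateDidChange(from oldState: ConnectivityState,
                                    to newState: ConnectivityState) {
        log.append("\(oldState.rawValue) -> \(newState.rawValue)")
    }
}
```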

@glbrntt Thank you for your answer; I believe I understand how the connection re-establishment works. However, I am unable to wrap my head around why these UnaryCalls are being killed while the user has the app in the background.

More specifically, I have noticed that these requests seem to fail when the user has locked their device. Nevertheless, more and more users complain that the content of our app is not displayed after the app has been in the background for some time. I have tried to investigate why this is happening and how to prevent it, but currently I have no clue. I am unable to reliably replicate this bug on our testing devices, and when I look into our remote logs, all I can see are request failures while the app is in the background or when the user returns to it from the background. The result message is always just failure(unavailable (14)).

Do you have any idea why this might be happening? Have other users perhaps experienced this as well? I would love to provide more detail, but this is currently all the information I could get.

IgorRosocha avatar Jul 19 '20 17:07 IgorRosocha

So .unavailable (14) is a GRPCError code. This is considered transient and should resolve if you retry.

To clarify, are you retrying these failed RPCs? We absolutely expect that RPCs may fail when the app is in the background: iOS is free to deny network access to those applications, and RPCs that were in flight at the time the app was suspended are highly likely to fail. Retrying these failed RPCs should resolve your issue.

Lukasa avatar Jul 20 '20 06:07 Lukasa

@Lukasa could you link any reference to an RPC retry implementation with grpc-swift? Is this performed through the retry policies, or does it have to be done manually?

jcgonlop avatar Jul 27 '20 16:07 jcgonlop

grpc-swift does not automatically retry RPCs that may have reached the network because it doesn't know if they're safe to retry. That requires application-specific logic.

Lukasa avatar Jul 28 '20 07:07 Lukasa
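(One way to express that application-specific logic is a small retry wrapper around a unary call. This is a minimal, self-contained sketch using only the standard library; the function and error names are hypothetical, and the `isRetryable` closure stands in for checking a transient status such as .unavailable (14).)

```swift
import Foundation

// Thrown when every attempt failed with a retryable error.
enum RPCRetryError: Error {
    case attemptsExhausted(lastError: Error)
}

// Run `operation` up to `maxAttempts` times, retrying only when
// `isRetryable` says the error is transient. Non-retryable errors
// are rethrown immediately.
func withRetries<T>(
    maxAttempts: Int = 3,
    isRetryable: (Error) -> Bool,
    operation: () throws -> T
) throws -> T {
    var lastError: Error?
    for _ in 0..<maxAttempts {
        do {
            return try operation()
        } catch {
            guard isRetryable(error) else { throw error }
            lastError = error
        }
    }
    throw RPCRetryError.attemptsExhausted(lastError: lastError!)
}
```

In a real app the `operation` closure would issue the unary call and the `isRetryable` check would inspect the returned GRPC status code, with a backoff delay between attempts.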

@Lukasa I have implemented a retry policy that retries the call 3 times when it receives status .unavailable (14). I tested this by switching my VPN on and off. Once the call had failed 3 times, I re-established my VPN connection and manually retried the call (by tapping a send button), but I get the same result: all requests fail with the same status code. Side note: I'm using a single ClientConnection for each repository. Does the ClientConnection just stop working after a status 14? Do I need to create a new ClientConnection on every call instead?

jcgonlop avatar Jul 29 '20 16:07 jcgonlop

@jcgonlop is this related to backgrounding the app? If not, can you file a new issue and provide the relevant details?

glbrntt avatar Jul 29 '20 16:07 glbrntt