
How to retry an RPC after coming back from locking the iPad

Open CarlosNano opened this issue 1 year ago • 4 comments

What are you trying to achieve?

We have an app that contains a gRPC server in C++ and a Swift client.

The Swift client is using grpc-swift 1.23.0 via SPM

I'm trying to get an RPC working after locking the iPad and sending the app back to the foreground.

After doing so, I'm getting an unavailable (14) error and these logs:

Connectivity changed state from old state: idle to new state: connecting
Connectivity changed state from old state: connecting to new state: ready
Connectivity changed state from old state: idle to new state: connecting
Connectivity changed state from old state: idle to new state: connecting
Connectivity changed state from old state: connecting to new state: ready
Connectivity changed state from old state: connecting to new state: ready
NotificationCenter notification name = UIApplicationDidEnterBackgroundNotification,
Connectivity changed state from old state: ready to new state: transientFailure
NotificationCenter notification name = UIApplicationWillEnterForegroundNotification,
didCatchError, error: read(descriptor:pointer:size:): Socket is not connected (errno: 57), logger: Logger(handler: Logging.StreamLogHandler(stream: Logging.StdioOutputStream(file: 0x00000001ec042e58, flushMode: Logging.StdioOutputStream.FlushMode.always), label: "Channel Logger", logLevel: Logging.Logger.Level.info, metadataProvider: nil, prettyMetadata: Optional("grpc.conn.addr_local=127.0.0.1 grpc.conn.addr_remote=127.0.0.1 grpc_connection_id=E1DB886A-C5C4-4681-9508-774489DA43B8/0"), metadata: ["grpc_connection_id": E1DB886A-C5C4-4681-9508-774489DA43B8/0, "grpc.conn.addr_local": 127.0.0.1, "grpc.conn.addr_remote": 127.0.0.1]), label: "Channel Logger")
didCatchError, error: read(descriptor:pointer:size:): Socket is not connected (errno: 57), logger: Logger(handler: Logging.StreamLogHandler(stream: Logging.StdioOutputStream(file: 0x00000001ec042e58, flushMode: Logging.StdioOutputStream.FlushMode.always), label: "Channel Logger", logLevel: Logging.Logger.Level.info, metadataProvider: nil, prettyMetadata: Optional("grpc.conn.addr_local=127.0.0.1 grpc.conn.addr_remote=127.0.0.1 grpc_connection_id=A9B396B1-DE5A-4E41-A9D7-785ED05B21C0/0"), metadata: ["grpc_connection_id": A9B396B1-DE5A-4E41-A9D7-785ED05B21C0/0, "grpc.conn.addr_local": 127.0.0.1, "grpc.conn.addr_remote": 127.0.0.1]), label: "Channel Logger")
didCatchError, error: unavailable (14): Transport became inactive, logger: Logger(handler: Logging.StreamLogHandler(stream: Logging.StdioOutputStream(file: 0x00000001ec042e58, flushMode: Logging.StdioOutputStream.FlushMode.always), label: "Client Logger", logLevel: Logging.Logger.Level.info, metadataProvider: nil, prettyMetadata: Optional("grpc.conn.addr_local=127.0.0.1 grpc.conn.addr_remote=127.0.0.1 grpc_connection_id=5773E2C0-B40F-4572-8004-0CB1E5CAF244/0 grpc_request_id=B62FCBA0-1F07-4254-BBB6-4A63875CB96A"), metadata: ["grpc_connection_id": 5773E2C0-B40F-4572-8004-0CB1E5CAF244/0, "grpc.conn.addr_local": 127.0.0.1, "grpc.conn.addr_remote": 127.0.0.1, "grpc_request_id": B62FCBA0-1F07-4254-BBB6-4A63875CB96A]), label: "Client Logger")
Connectivity changed state from old state: ready to new state: idle
Connectivity changed state from old state: ready to new state: idle
didCatchError, error: unavailable (14): Transport became inactive, logger: Logger(handler: Logging.StreamLogHandler(stream: Logging.StdioOutputStream(file: 0x00000001ec042e58, flushMode: Logging.StdioOutputStream.FlushMode.always), label: "Client Logger", logLevel: Logging.Logger.Level.info, metadataProvider: nil, prettyMetadata: Optional("grpc.conn.addr_local=127.0.0.1 grpc.conn.addr_remote=127.0.0.1 grpc_connection_id=5773E2C0-B40F-4572-8004-0CB1E5CAF244/0 grpc_request_id=1D876631-3CD9-4586-B343-699C166F19BB"), metadata: ["grpc_connection_id": 5773E2C0-B40F-4572-8004-0CB1E5CAF244/0, "grpc_request_id": 1D876631-3CD9-4586-B343-699C166F19BB, "grpc.conn.addr_remote": 127.0.0.1, "grpc.conn.addr_local": 127.0.0.1]), label: "Client Logger")
Connectivity changed state from old state: transientFailure to new state: connecting
Connectivity changed state from old state: connecting to new state: transientFailure
Connectivity changed state from old state: transientFailure to new state: connecting
Connectivity changed state from old state: connecting to new state: transientFailure
Connectivity changed state from old state: transientFailure to new state: connecting
Connectivity changed state from old state: connecting to new state: transientFailure
Connectivity changed state from old state: transientFailure to new state: connecting
Connectivity changed state from old state: connecting to new state: transientFailure

When I start my app I set up the Channel:

var clientConnectionBuilder = MFetcher.buildClientConnection(requiresSecureCalls: self.requiresSecureCalls)

if self.requiresLogging {
    clientConnectionBuilder = clientConnectionBuilder
        .withBackgroundActivityLogger(self.channelLogger)
        .withConnectivityStateDelegate(self)
}

let keepAlive = ClientConnectionKeepalive(interval: .milliseconds(2000), timeout: .milliseconds(1999), permitWithoutCalls: false)

clientConnectionBuilder.withKeepalive(keepAlive)

return clientConnectionBuilder
    .withErrorDelegate(self)
    .connect(host: self.api.ipAddress, port: Int(self.api.port))

Then I set up the client with:

func callOptions(timeOut: Bool, includeTokenIfPossible: Bool) -> CallOptions {
    var callOptions = CallOptions()

    if timeOut {
        callOptions.timeLimit = TimeLimit.timeout(TimeAmount.seconds(10))
    }

    if let token = self.api.token, includeTokenIfPossible {
        callOptions.customMetadata.add(name: "local-auth", value: token)
    }

    if self.requiresLogging {
        callOptions.logger = self.clientLogger
    }

    return callOptions
}

Lastly, I request a particular RPC:

func streamFCI() -> AnyPublisher<FCI, Error> {
    return streamFCIQueue.sync { [weak self] in
        guard let self = self else {
            return Empty<FCI, Error>(completeImmediately: true).eraseToAnyPublisher()
        }

        if case .streaming(_, let anyPublisher, _) = streamFCICallState {
            return anyPublisher
        }

        let request = Api_StreamFCInfoRequest()
        let streamData = self.streamSubject(rpc: self.deviceServiceClient.stream_fci)(request)

        let publisher = streamData.subject
            .compactMap { $0 }
            .tryMap { response in
                let decoder = JSONDecoder()
                let fCIResponseData = try response.jsonUTF8Data()
                let fCIResponse = try decoder.decode(FCI.self, from: fCIResponseData)
                return fCIResponse
            }
            .share()
            .eraseToAnyPublisher()

        self.streamFCICallState = .streaming(streamData.call, publisher, streamData.subject)

        return publisher
    }
}

streamSubject contains the callbacks from the RPC:

call = rpc(request, nil) { response in
    subject.send(response)
}

call.status.whenSuccess { status in
    switch status.code {
    case .ok:
        subject.send(completion: .finished)
    default:
        guard status.code != GRPCStatus.Code.cancelled else { return }
        subject.send(completion: .failure(status))
    }
}

return (call, subject)

What have you tried so far?

When I get the unavailable (14) error, I've tried to cancel the call with:

call.cancel(promise: nil)

and then call streamFCI again, which creates a new call using the same client and channel. But nothing arrives in the closure:

call = rpc(request, nil) { response in
    subject.send(response)
}

Do I have to reestablish the channel too?

From what I read in other tickets (such as https://github.com/grpc/grpc-swift/issues/854), it's enough to retry the RPC, but I'm not sure whether that means calling the RPC using the same channel and the same client as before locking the iPad.

I also tried creating a new client and a new channel after getting the error, but the outcome is the same.

CarlosNano avatar May 10 '24 18:05 CarlosNano

Hi, a quick update.

I've changed the logs to trace level and noticed that once I get back into the app (same procedure: send it to the background and lock the iPad), I get this log in a loop:

2024-05-14T10:09:51+0100 debug Channel Logger : grpc_connection_id=DFCB3968-E994-41E7-88EC-EA172895FC8E/6 [GRPC] creating a ClientBootstrap
2024-05-14T10:09:51+0100 debug Channel Logger : delay=9.77868260472508 grpc_connection_id=DFCB3968-E994-41E7-88EC-EA172895FC8E/6 [GRPC] scheduling connection attempt
2024-05-14T10:09:51+0100 debug Channel Logger : grpc_connection_id=DFCB3968-E994-41E7-88EC-EA172895FC8E/6 new_state=transientFailure old_state=connecting [GRPC] connectivity state change

Maybe the problem is that the client keeps trying but the server is down? If that's the case, what should I do on the server to allow incoming retries and new connections?

I also get this odd log message once:

2024-05-14 10:09:59.816507+0100 MKPad[3157:2127785] dnssd_clientstub write_all(36) DEFUNCT
2024-05-14 10:09:59.816522+0100 MKPad[3157:2127785] dnssd_clientstub deliver_request ERROR: write_all(36, 90 bytes) failed
2024-05-14 10:09:59.816537+0100 MKPad[3157:2127785] dnssd_clientstub write_all(36) DEFUNCT

CarlosNano avatar May 14 '24 09:05 CarlosNano

Do I have to reestablish the channel too?

You shouldn't have to. From your logs you can see the channel attempting to re-establish the connection:

Connectivity changed state from old state: transientFailure to new state: connecting
Connectivity changed state from old state: connecting to new state: transientFailure
Connectivity changed state from old state: transientFailure to new state: connecting
Connectivity changed state from old state: connecting to new state: transientFailure
Connectivity changed state from old state: transientFailure to new state: connecting
Connectivity changed state from old state: connecting to new state: transientFailure
Connectivity changed state from old state: transientFailure to new state: connecting
Connectivity changed state from old state: connecting to new state: transientFailure

From what I read in other tickets (such as #854), it's enough to retry the RPC, but I'm not sure whether that means calling the RPC using the same channel and the same client as before locking the iPad.

Retrying the RPC should be sufficient as the channel should reconnect automatically.
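To illustrate what "retrying the RPC" can look like on the client, here is a minimal, generic sketch. The helper name `withRetries`, the attempt count, and the `isRetryable` predicate are all assumptions for illustration; with grpc-swift 1.x you would typically treat `GRPCStatus.Code.unavailable` as the retryable case:

```swift
import Foundation

// Sketch: retry a throwing operation when the error is considered transient.
// `withRetries` and `isRetryable` are hypothetical names, not grpc-swift API.
func withRetries<T>(
    attempts: Int,
    isRetryable: (Error) -> Bool,
    _ body: () throws -> T
) throws -> T {
    precondition(attempts > 0, "at least one attempt is required")
    var lastError: Error?
    for _ in 0..<attempts {
        do {
            return try body()
        } catch {
            // Non-transient errors propagate immediately; transient ones retry.
            guard isRetryable(error) else { throw error }
            lastError = error
        }
    }
    throw lastError!
}
```

In practice you would also want a delay between attempts (the channel itself already applies backoff when reconnecting), and for a server-streaming call the `body` closure would create a fresh `ServerStreamingCall` rather than reusing the failed one.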

I also tried creating a new client and a new channel after getting the error, but the outcome is the same.

I believe what's happening in your case is that the server is no longer listening so the client cannot open a connection to it when the app returns to the foreground, and so keeps trying in a loop with backoff between attempts.

The most robust way to handle this is to stop both the server and the client when the app enters the background and restart them when it returns to the foreground. There's a technical note (TN2277) which provides a bit more detail.
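As a rough sketch of that lifecycle handling, the type below is hypothetical (nothing here is grpc-swift API): the `start` closure would restart the in-process server and re-issue the streaming RPCs, and the `stop` closure would cancel open calls and shut the server down. On iOS you would invoke these from observers of `UIApplication.didEnterBackgroundNotification` and `UIApplication.willEnterForegroundNotification`:

```swift
import Foundation

// Sketch: restart the server and client around background/foreground
// transitions. `ConnectionLifecycleManager` is an illustrative name.
final class ConnectionLifecycleManager {
    private let start: () -> Void
    private let stop: () -> Void
    private(set) var isRunning = false

    init(start: @escaping () -> Void, stop: @escaping () -> Void) {
        self.start = start
        self.stop = stop
    }

    // Call from the didEnterBackground observer: cancel open
    // server-streaming calls and stop the in-process server.
    func appDidEnterBackground() {
        guard isRunning else { return }
        stop()
        isRunning = false
    }

    // Call from the willEnterForeground observer: restart the server,
    // then create fresh calls on the existing channel.
    func appWillEnterForeground() {
        guard !isRunning else { return }
        start()
        isRunning = true
    }
}
```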

glbrntt avatar May 14 '24 14:05 glbrntt

Thanks for the answer, I really appreciate it.

When you say:

The most robust way to handle this is to stop both the server and the client when the app enters the background and restart them when it returns to the foreground.

On the client, should I just cancel the calls for any RPC stream currently open, and then create a new ServerStreamingCall when the app is back in the foreground? As you previously commented, I assume I don't have to reconnect the channel nor create a new ServiceClient.

On the server, I guess it would be a matter of just stopping it and starting it again?

CarlosNano avatar May 15 '24 10:05 CarlosNano

Thanks for the answer, I really appreciate it.

You're welcome!

When you say:

The most robust way to handle this is to stop both the server and the client when the app enters the background and restart them when it returns to the foreground.

On the client, should I just cancel the calls for any RPC stream currently open, and then create a new ServerStreamingCall when the app is back in the foreground? As you previously commented, I assume I don't have to reconnect the channel nor create a new ServiceClient.

Yes that's right.

On the server, I guess it would be a matter of just stopping it and starting it again?

Yup, also right.

glbrntt avatar May 15 '24 14:05 glbrntt