
Keepalive server side not working

Open unludo opened this issue 4 years ago • 11 comments

Problem description

The gRPC server does not take keepalive options into account, so it is not possible to detect a client disconnection during server streaming.

Reproduction steps

Create a server with keepalive options (several combinations tested)

        const grpcOptions = {
            'grpc.keepalive_time_ms': 5000,
            'grpc.keepalive_timeout_ms': 5000,
            'grpc.max_connection_idle_ms': 5000,
            'grpc.keepalive_permit_without_calls': 1,
            'grpc.http2.max_pings_without_data': 2000000,
            'grpc.http2.max_ping_strikes': 1,
            // 'grpc.http2.min_sent_ping_interval_without_data_ms': 5000,
            // 'grpc.http2.min_time_between_pings_ms': 10000,
            // 'grpc.http2.min_ping_interval_without_data_ms': 5000
        };

Have a client with no keepalive options call a bidirectional streaming method.

After a while, no information circulates on the gRPC connection, but the stream remains open (the client sends data at least once a day). Then physically disconnect the client from the server.

The server should detect the disconnection thanks to keepalive messages. But it seems that, despite the keepalive options, the keepalive pings are not sent (as seen in the logs), so nothing is detected.
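For reference, a minimal sketch of how such options would be wired into a grpc-node server, together with the per-call `cancelled` event, which is the usual way to observe a client disconnect during a server stream. The service definition `proto.Echo` and the method name `subscribe` are hypothetical (note the final comment in this thread: `@grpc/grpc-js` only honors server keepalive options from 1.8.0 onward):

```javascript
// Sketch only: assumes a loaded service definition `proto.Echo` with a
// server-streaming method `subscribe`; names are illustrative.
const grpc = require('@grpc/grpc-js');

const server = new grpc.Server({
    'grpc.keepalive_time_ms': 5000,     // ping an idle connection every 5s
    'grpc.keepalive_timeout_ms': 5000,  // consider it dead if no ack within 5s
});

server.addService(proto.Echo.service, {
    subscribe(call) {
        // Fires when the client cancels the call or the connection is
        // detected as dead, e.g. via a failed keepalive ping.
        call.on('cancelled', () => console.log('client went away'));
    },
});
```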

Environment

  • OS name, version and architecture: Linux Ubuntu 16.04 amd64
  • Node version 14.8.0
  • Node installation method curl -sL https://deb.nodesource.com/setup_14.x | sudo -E bash - && sudo apt-get install -y nodejs
  • If applicable, compiler version NA
  • Package name and version [email protected]

Additional context

I0818 10:22:59.721528027 18301 parsing.cc:430] HTTP:3:HDR:CLI: x-envoy-peer-metadata-id: 73 69 64 65 63 61 72 7e 31 30 2e 36 30 2e 31 2e 31 32 7e 70 72 6f 64 75 63 74 2d 31 2d 31 2d 30 2d 62 2d 32 2d 73 6e 61 70 73 68 6f 74 2d 36 63 63 39 66 64 36 35 39 2d 7a 76 39 71 36 2e 64 65 66 61 75 6c 74 7e 64 65 66 61 75 6c 74 2e 73 76 63 2e 63 6c 75 73 74 65 72 2e 6c 6f 63 61 6c 'sidecar~10.60.1.12~product-1-1-0-b-2-snapshot-6cc9fd659-zv9q6.default~default.svc.cluster.local'
I0818 10:22:59.721534878 18301 parsing.cc:430] HTTP:3:HDR:CLI: date: 54 75 65 2c 20 31 38 20 41 75 67 20 32 30 32 30 20 30 38 3a 32 32 3a 35 39 20 47 4d 54 'Tue, 18 Aug 2020 08:22:59 GMT'
I0818 10:22:59.721539936 18301 parsing.cc:430] HTTP:3:HDR:CLI: server: 69 73 74 69 6f 2d 65 6e 76 6f 79 'istio-envoy'
I0818 10:22:59.721546550 18301 parsing.cc:430] HTTP:3:HDR:CLI: x-envoy-decorator-operation: 70 72 6f 64 75 63 74 2d 31 2d 31 2d 30 2d 62 2d 32 2d 73 6e 61 70 73 68 6f 74 2e 64 65 66 61 75 6c 74 2e 73 76 63 2e 63 6c 75 73 74 65 72 2e 6c 6f 63 61 6c 3a 35 30 30 35 35 2f 2a 'product-1-1-0-b-2-snapshot.default.svc.cluster.local:50055/*'
I0818 10:22:59.721555060 18301 parsing.cc:686] parsing trailing_metadata
I0818 10:22:59.721559189 18301 parsing.cc:541] HTTP:3:TRL:CLI: grpc-status: 30 '0'
I0818 10:22:59.721563049 18301 parsing.cc:541] HTTP:3:TRL:CLI: grpc-message: 4f 4b 'OK'
I0818 10:23:04.603784803 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state IDLE -> WRITING [KEEPALIVE_PING]
I0818 10:23:04.603850403 18301 writing.cc:89] SERVER: Ping delayed [0x3e12430]: not enough time elapsed since last ping. Last ping 32146.000000: Next ping 332146.000000: Now 37166.000000
I0818 10:23:04.603869242 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state WRITING -> IDLE [begin writing nothing]
I0818 10:27:59.583702051 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state IDLE -> WRITING [RETRY_SEND_PING]
I0818 10:27:59.583750142 18301 writing.cc:116] SERVER: Ping sent [ipv4:127.0.0.1:35288]: 1999999/2000000
I0818 10:27:59.583765218 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state WRITING -> WRITING [begin write in current thread]
I0818 10:27:59.583841187 18301 chttp2_transport.cc:2660] ipv4:127.0.0.1:35288: Start BDP ping err="No Error"
I0818 10:27:59.583861770 18301 chttp2_transport.cc:2808] ipv4:127.0.0.1:35288: Start keepalive ping
I0818 10:27:59.583881177 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state WRITING -> IDLE [finish writing]
I0818 10:27:59.602496376 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state IDLE -> WRITING [PING_RESPONSE]
I0818 10:27:59.602544504 18301 chttp2_transport.cc:2676] ipv4:127.0.0.1:35288: Complete BDP ping err="No Error"
I0818 10:27:59.602569380 18301 chttp2_transport.cc:2821] ipv4:127.0.0.1:35288: Finish keepalive ping
I0818 10:27:59.602589152 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state WRITING -> WRITING [begin write in current thread]
I0818 10:27:59.602662284 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state WRITING -> IDLE [finish writing]
I0818 10:28:04.602829015 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state IDLE -> WRITING [KEEPALIVE_PING]
I0818 10:28:04.602896452 18301 writing.cc:89] SERVER: Ping delayed [0x3e12430]: not enough time elapsed since last ping. Last ping 332146.000000: Next ping 632146.000000: Now 337165.000000
I0818 10:28:04.602917480 18301 chttp2_transport.cc:839] W:0x3cce920 SERVER [ipv4:127.0.0.1:35288] state WRITING -> IDLE [begin writing nothing]
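The numbers in the "Ping delayed" lines explain the behavior: the next allowed ping is pinned 300000 ms (5 minutes) after the previous one, which matches the default grpc.http2.min_sent_ping_interval_without_data_ms and overrides the configured 5 s keepalive time. A quick check of the figures from the first "Ping delayed" entry above:

```javascript
// Numbers taken verbatim from the log above (ms since an internal epoch).
const lastPing = 32146;
const nextPing = 332146;
const now = 37166;

// The gap between the last and next allowed ping is the effective minimum
// ping interval: 300000 ms = 5 minutes, the default
// grpc.http2.min_sent_ping_interval_without_data_ms, which overrides the
// configured grpc.keepalive_time_ms of 5000 ms.
const minInterval = nextPing - lastPing;

// The keepalive timer fired well before the next allowed ping, so the
// transport logs "Ping delayed" instead of sending anything.
const delayed = now < nextPing;

console.log(minInterval, delayed); // 300000 true
```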

unludo avatar Aug 18 '20 08:08 unludo

It is due to a proxy: there is an ESP in the middle.

Is there a way to tell the server to ignore the proxy?

unludo avatar Aug 18 '20 09:08 unludo

The server doesn't know about the proxy. It's just working with whatever connection is established. I don't think the proxy should keep the stream open after its client disconnects.

murgatroid99 avatar Aug 18 '20 16:08 murgatroid99

I can confirm that I also experience this: client-side readable streams stop receiving any messages after a few days, sometimes weeks. I enabled debug output via GRPC_TRACE=connectivity_state,health_check_client,executor,http,http_keepalive,http2_stream_state,call_error,op_failure,timer,subchannel,server_channel,client_channel_call,handshaker and GRPC_VERBOSITY=DEBUG, but no channel close or connection failures/stream state changes are logged. tcpdump also logs 0 packets on the interface. Via the node debugger I checked that the ClientReadableStream has a valid state and is waiting for data.

I did not set any grpc options, but will try the keepalive settings.

Patrick-Remy avatar Sep 03 '20 13:09 Patrick-Remy

@Patrick-Remy have you had any luck with the keepalive?

Could you share what you did? I think I'm having a similar issue.

JCMais avatar Mar 03 '21 21:03 JCMais

I am not sure if that helped, or whether my bug was related to grpc or to the gRPC server-side application.

For some months after that, the stream worked pretty well. But then we encountered it again a few weeks ago, and this time we got UNAVAILABLE: keepalive watchdog timeout exceptions, the server was not reachable, and the application threw exceptions. I am not sure whether this was related to the streamSettings I had set. Unfortunately we have no influence over the server app, and client and server keepalive settings need to match. The default server keepalive settings are pretty high (2 hours): [keepalive defaults](https://github.com/grpc/grpc/blob/master/doc/keepalive.md#defaults-values). And if you set lower values in your client, you get a GOAWAY from the server.
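That constraint can be restated as a simple rule (the helper below is just arithmetic with names of our choosing, not a grpc API): a client that pings more often than the server permits accumulates ping strikes and eventually receives a GOAWAY with "too_many_pings".

```javascript
// Hedged sketch: checks a client keepalive interval against the server-side
// minimum receive interval described in the gRPC keepalive docs.
function keepaliveCompatible(clientKeepaliveTimeMs, serverPermitKeepAliveTimeMs) {
    // The client is safe only if it pings no more often than the server allows.
    return clientKeepaliveTimeMs >= serverPermitKeepAliveTimeMs;
}

// With the documented server default of 5 minutes, a 15 s client interval
// risks a GOAWAY; against a server that permits 15 s pings, it is fine.
console.log(keepaliveCompatible(15000, 5 * 60 * 1000)); // false
console.log(keepaliveCompatible(15000, 15000));         // true
```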

But enabling keepalive in the client is never a bad idea:

const streamOptions = {
    // Send pings every X seconds if there is no activity. The effective
    // interval is limited by min_sent_ping_interval_without_data_ms /
    // min_time_between_pings_ms, but it prints out some logs if
    // `DEBUG=1 GRPC_VERBOSITY=DEBUG GRPC_TRACE=connectivity_state,http_keepalive`
    // is enabled :-)
    'grpc.keepalive_time_ms': 15000

    /**
     * The following values cannot be set, as they cause a GOAWAY from the server :-(
     */
    // wait timeout for a ping ack before considering the connection dead
    // 'grpc.keepalive_timeout_ms': 15000,
    // send pings even without active streams
    // 'grpc.keepalive_permit_without_calls': 1,
    // always send pings
    // 'grpc.http2.max_pings_without_data': 0,
    // same ping interval without data as with data
    // 'grpc.http2.min_sent_ping_interval_without_data_ms': 5000,
    // same as above, kept for compatibility reasons, see https://grpc.github.io/grpc/core/group__grpc__arg__keys.html#ga69583c8efdbdcb7cdf9055ee80a07014
    // 'grpc.http2.min_time_between_pings_ms': 30000,
    // recreate the connection after 2 minutes? consider if useful for debugging
    // 'grpc.max_connection_age_ms': 2 * 60 * 1000
}
const client = new grpc.Client(address, credentials, streamOptions)

As stated in my comments, grpc.keepalive_time_ms enables the keepalive, but the interval is effectively clamped to min_sent_ping_interval_without_data_ms/min_time_between_pings_ms (5 min by default). Still, it leads to debug log messages like not sending keepalive, as min_sent_ping_interval_without_data_ms not reached, which was helpful to see that our client application is still doing something.

In our case we receive data every few seconds on the stream, so the keepalive ping, which is only sent when there has been no recent activity, should fire quickly and at regular intervals, letting us discover connection problems very fast.

@murgatroid99 Do you know the reason why the server default values are so high? Or am I misunderstanding these values? I recently read about an idleTimeout to auto-close the stream. Could that help us, and how can it be set via JS?

Patrick-Remy avatar Mar 05 '21 11:03 Patrick-Remy

For a few days now, the error has been occurring every day after some hours. I have debug logs re-enabled:

DEBUG=1 GRPC_VERBOSITY=DEBUG GRPC_TRACE=connectivity_state,health_check_client,executor,http,http_keepalive,http2_stream_state,call_error,op_failure,timer,subchannel,server_channel,client_channel_call,handshaker

I0901 21:30:08.961942040      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state WRITING+MORE -> WRITING [continue writing]
I0901 21:30:08.962000379      17 stream_lists.cc:71]         0x55c09ea441e0[13][cli]: pop from writable
I0901 21:30:08.962028271      17 writing.cc:430]             W:0x55c09ea441e0 CLIENT[13] im-(sent,send)=(1,0) announce=5
I0901 21:30:08.962060882      17 chttp2_transport.cc:1284]   complete_closure_step: t=0x55c09ea441e0 0x55c09e8d6598 refs=0 flags=0x0001 desc=send_trailing_metadata_finished err="No Error" write_state=WRITING
I0901 21:30:08.962093804      17 chttp2_transport.cc:1284]   complete_closure_step: t=0x55c09ea441e0 0x55c09e8d6558 refs=0 flags=0x0001 desc=on_write_finished_cb err="No Error" write_state=WRITING
I0901 21:30:08.962117739      17 stream_lists.cc:125]        0x55c09ea441e0[13][cli]: add to writing
I0901 21:30:08.962143436      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state WRITING -> WRITING [begin write in background]
I0901 21:30:08.962170326      17 executor.cc:290]            EXECUTOR (default-executor) schedule 0x55c09ea442d0 inline
I0901 21:30:08.962670191      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state WRITING -> IDLE [finish writing]
I0901 21:30:08.962717620      17 stream_lists.cc:71]         0x55c09ea441e0[13][cli]: pop from writing
I0901 21:30:08.965464425      17 chttp2_transport.cc:2676]   ipv4:217.xxx.xxx.xxx:443: Complete BDP ping err="No Error"
I0901 21:30:08.966951846      17 parsing.cc:681]             parsing initial_metadata
I0901 21:30:08.967006347      17 parsing.cc:430]             HTTP:13:HDR:CLI: :status: 32 30 30 '200'
I0901 21:30:08.967036824      17 parsing.cc:430]             HTTP:13:HDR:CLI: server: 6e 67 69 6e 78 2f 31 2e 31 39 2e 34 'nginx/1.19.4'
I0901 21:30:08.967064086      17 parsing.cc:430]             HTTP:13:HDR:CLI: date: 57 65 64 2c 20 30 31 20 53 65 70 20 32 30 32 31 20 31 39 3a 33 30 3a 30 38 20 47 4d 54 'Wed, 01 Sep 2021 19:30:08 GMT'
I0901 21:30:08.967099902      17 parsing.cc:430]             HTTP:13:HDR:CLI: content-type: 61 70 70 6c 69 63 61 74 69 6f 6e 2f 67 72 70 63 'application/grpc'
I0901 21:30:08.967122816      17 parsing.cc:430]             HTTP:13:HDR:CLI: grpc-encoding: 69 64 65 6e 74 69 74 79 'identity'
I0901 21:30:08.967146960      17 parsing.cc:430]             HTTP:13:HDR:CLI: grpc-accept-encoding: 67 7a 69 70 'gzip'
I0901 21:30:08.967179370      17 parsing.cc:686]             parsing trailing_metadata
I0901 21:30:08.967200200      17 parsing.cc:541]             HTTP:13:TRL:CLI: grpc-status: 30 '0'
D0901 21:30:08.967248871      17 call.cc:733]                set_final_status CLI
D0901 21:30:08.967272556      17 call.cc:734]                "No Error"

[...]

I0901 21:30:38.969339709      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state IDLE -> WRITING [KEEPALIVE_PING]
I0901 21:30:38.969419709      17 writing.cc:116]             CLIENT: Ping sent [ipv4:217.xxx.xxx.xxx:443]: 2/2
I0901 21:30:38.969442792      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state WRITING -> WRITING [begin write in current thread]
I0901 21:30:38.969609774      17 chttp2_transport.cc:2660]   ipv4:217.xxx.xxx.xxx:443: Start BDP ping err="No Error"
I0901 21:30:38.969654267      17 chttp2_transport.cc:2808]   ipv4:217.xxx.xxx.xxx:443: Start keepalive ping
I0901 21:30:38.969682520      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state WRITING -> IDLE [finish writing]
I0901 21:30:38.974030027      17 chttp2_transport.cc:2676]   ipv4:217.xxx.xxx.xxx:443: Complete BDP ping err="No Error"
I0901 21:30:38.974080853      17 chttp2_transport.cc:2821]   ipv4:217.xxx.xxx.xxx:443: Finish keepalive ping
I0901 21:31:08.975564409      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state IDLE -> WRITING [KEEPALIVE_PING]
I0901 21:31:08.975652725      17 writing.cc:89]              CLIENT: Ping delayed [0x55c09e3eb940]: not enough time elapsed since last ping.  Last ping 29677488.000000: Next ping 29977488.000000: Now 29707495.000000
I0901 21:31:08.975676960      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state WRITING -> IDLE [begin writing nothing]
I0901 21:33:28.638315000      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state IDLE -> WRITING [RETRY_SEND_PING]
I0901 21:33:28.638390120      17 writing.cc:89]              CLIENT: Ping delayed [0x55c09e3eb940]: not enough time elapsed since last ping.  Last ping 29677488.000000: Next ping 29977488.000000: Now 29847157.000000
I0901 21:33:28.638420257      17 chttp2_transport.cc:837]    W:0x55c09ea441e0 CLIENT [ipv4:217.xxx.xxx.xxx:443] state WRITING -> IDLE [begin writing nothing]
I0901 21:33:38.962724962      17 chttp2_transport.cc:1168]   transport 0x55c09ea441e0 got goaway with last stream id 13
I0901 21:33:38.962792107      17 chttp2_transport.cc:2866]   transport 0x55c09ea441e0 set connectivity_state=3
I0901 21:33:38.962810391      17 connectivity_state.cc:147]  SET: 0x55c09ea44488 client_transport: READY --> TRANSIENT_FAILURE [got_goaway]
I0901 21:33:38.962826772      17 connectivity_state.cc:160]  NOTIFY: 0x55c09ea44488 client_transport: 0x55c09e75d8a8
I0901 21:33:38.962846419      17 subchannel.cc:343]          Connected subchannel 0x55c09e8af200 of subchannel 0x55c09e3f6100 has gone into TRANSIENT_FAILURE. Attempting to reconnect.
D0901 21:33:38.962884801      17 dns_resolver.cc:242]        Start resolving.
I0901 21:33:38.962931658      17 connectivity_state.cc:147]  SET: 0x55c09e555880 client_channel: READY --> IDLE [helper]
I0901 21:33:38.963036725      17 chttp2_transport.cc:2866]   transport 0x55c09ea441e0 set connectivity_state=4
I0901 21:33:38.963059087      17 connectivity_state.cc:147]  SET: 0x55c09ea44488 client_transport: TRANSIENT_FAILURE --> SHUTDOWN [close_transport]
I0901 21:33:38.963094092      17 chttp2_transport.cc:2660]   ipv4:217.xxx.xxx.xxx:443: Start BDP ping err={"created":"@1630524818.963029091","description":"Transport destroyed","file":"../deps/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":567,"occurred_during_write":0}
I0901 21:33:38.963113248      17 chttp2_transport.cc:2676]   ipv4:217.xxx.xxx.xxx:443: Complete BDP ping err={"created":"@1630524818.963029091","description":"Transport destroyed","file":"../deps/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc","file_line":567,"occurred_during_write":0}

Keepalive settings on both server and client are:

// Server (java)
nettyServerBuilder.keepAliveTime(30, TimeUnit.SECONDS);
nettyServerBuilder.keepAliveTimeout(30 * 2, TimeUnit.SECONDS);
nettyServerBuilder.permitKeepAliveWithoutCalls(true);
nettyServerBuilder.permitKeepAliveTime(30 / 2, TimeUnit.SECONDS);

// Client (grpc-node)
'grpc.keepalive_time_ms': 30 * 1000,
'grpc.keepalive_timeout_ms': 30 * 2 * 1000,
'grpc.keepalive_permit_without_calls': 1
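As a sanity check on these two configs, restated as plain arithmetic (variable names are ours, not a grpc or Netty API): the client pings every 30 s while the server permits pings as often as every 15 s, so the keepalive intervals alone should not provoke a "too_many_pings" GOAWAY.

```javascript
// Values copied from the server (Java) and client (grpc-node) configs above.
const serverPermitKeepAliveTimeMs = (30 / 2) * 1000; // permitKeepAliveTime: 15s
const clientKeepaliveTimeMs = 30 * 1000;             // grpc.keepalive_time_ms: 30s

// Compatible when the client pings no more often than the server permits.
const compatible = clientKeepaliveTimeMs >= serverPermitKeepAliveTimeMs;
console.log(compatible); // true
```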

Do I interpret it correctly that there was a connection error (or a GOAWAY, per got goaway with last stream id 13?) during the ping and grpc closed the stream? But in my node app I did not receive any event, and I listen for and log all error, pause, close, and end events (and yes, this does work for most other connectivity errors, and also for GOAWAY: if I reduce the ping interval on the client side so that it does not match the server, the error callback gets called, as I see in the logs).

Is this a bug in grpc-node, or is there any way I can react to this error and rethrow it to force an app restart/reconnect @murgatroid99?

Patrick-Remy avatar Sep 01 '21 20:09 Patrick-Remy

A GOAWAY is not a connection error. It is literally just the server telling the client to go away. It could be communicating an error, or it could just mean that the server is shutting down and won't accept any new requests. There are no user-visible stream events for this because this is not a stream-level event, it is a connection-level event. If a server is shutting down gracefully, it can stop clients from creating new streams, but existing streams can continue communicating, so in that case you wouldn't see any stream errors at all.

I don't see any indication in that log that the GOAWAY was related to the ping, or that it was an error. In fact, it doesn't even look like there was a pending ping at all at the time that the client received the GOAWAY: there's a "Start keepalive ping" and then a few lines later there's a "Finish keepalive ping", and then there's no new "Start keepalive ping" before the GOAWAY.

As for reconnection, that happens automatically. If the call ends with an error because the connection has been dropped and you make a new request, it will use the new connection.

murgatroid99 avatar Sep 01 '21 21:09 murgatroid99

First of all, thanks for your fast and detailed response. That has cleared up my understanding a bit.

As for reconnection, that happens automatically. If the call ends with an error because the connection has been dropped and you make a new request, it will use the new connection.

Okay, but in my scenario there are no direct self-initiated calls from the client side; instead there is only a long-living stream (or multiple of them) with quittance calls as responses (so indirect calls only). Therefore, since my provided logs, the stream has not received any further events (I just restarted one of two redundant clients to be sure, and the restarted one receives many events again).

So how can I track this (possibly graceful) connection drop / GOAWAY? Rarely I encounter UNAVAILABLE: GOAWAY in response to quittance calls, and it is delivered as an error in the callback:

Error: 14 UNAVAILABLE: GOAWAY received
      at Object.exports.createStatusError (/app/node_modules/grpc/src/common.js:91:15)
      at Object.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:1209:28)
      at InterceptingListener._callNext (/app/node_modules/grpc/src/client_interceptors.js:568:42)
      at InterceptingListener.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:618:8)
      at callback (/app/node_modules/grpc/src/client_interceptors.js:847:24) {
    code: 14,
    metadata: Metadata { _internal_repr: {}, flags: 0 },
    details: 'GOAWAY received'
  }

Isn't there something similar? I saw in the code that a status event is sometimes emitted; I'll try to listen for it next.

Patrick-Remy avatar Sep 02 '21 06:09 Patrick-Remy

in my scenario there aren't direct self-initiated calls from the client-side, instead there is only a long-living stream

A long-lived stream is a call from the client side, it's just longer than others.

with quittance calls as response (so indirect calls only)

I don't know what this means.

So how can I track this (possibly graceful) connection drop / GOAWAY?

There are two possibilities. The first is that the server keeps the stream alive and keeps responding. In that case, you don't need to do anything. The second possibility is that the server ends the stream, and the client needs to create a new one. The status event will tell you when the server ends the call, so you should listen for that. You should also have an error event listener, because if you don't it will throw an error. Those two events fire at mostly the same time, except that error will only be emitted if the call ends with an error, and status will be emitted for success and for errors.

Fundamentally, the GOAWAY is irrelevant. What matters is what is happening with each individual stream. A stream can end for any number of reasons, and your client should be prepared to handle that.

murgatroid99 avatar Sep 02 '21 16:09 murgatroid99

A long-lived stream is a call from the client side, it's just longer than others. I don't know what this means.

To simplify what I wanted to say: the client simply starts a stream and internally processes the incoming events; no further calls (except the stream start) are made. You can forget what I said about quittance. So if a connection gets dropped, the client currently doesn't make any request that would initiate a reconnect, in response to:

and you make a new request, it will use the new connection.

The first is that the server keeps the stream alive and keeps responding. In that case, you don't need to do anything.

Unfortunately I do not have control over the servers implementation.

The status event will tell you when the server ends the call, so you should listen for that. You should also have an error event listener, because if you don't it will throw an error. Those two events fire at mostly the same time, except that error will only be emitted if the call ends with an error, and status will be emitted for success and for errors.

I have now implemented a listener for status; in first tests it gets called when I cancel the connection myself. I will report whether it also gets called on those weird connection 'freezes'. The error listener was implemented from the beginning; I searched older logs and it has already been called in other circumstances, like 14 UNAVAILABLE: keepalive watchdog timeout or 14 UNAVAILABLE: TCP Read failed, but not after the GOAWAY.

I am not very familiar with the implementation details, but looking at the source of _emitStatusIfDone, I wonder why it is relevant that both this.read_status AND this.received_status are set. Could it be that when _receiveStatus() gets called, no event is emitted, because this.read_status can be null?

Patrick-Remy avatar Sep 02 '21 17:09 Patrick-Remy

if a connection gets dropped, the client currently doesn't make any request that would initiate a reconnect

If the server ends your stream, you should start a new one if you want to continue to have an active stream.

Unfortunately I do not have control over the servers implementation.

I'm not saying that you should do anything on the server. I'm explaining what the client behavior should be, which depends on what the server does. If the server continues sending messages, the client doesn't need to do anything special, it will just keep getting messages. Otherwise, if the server ends the stream with a status (which you will see in the status event), the client will need to create a new stream to continue getting messages.

If the server both stops sending messages and does not end the stream with a status, that is misbehavior on the part of the server that the client cannot handle.

if I look to the source of _emitStatusIfDone, as a question why it is relevant that this.read_status AND this.received_status is set

The point of that is to ensure that all incoming messages have been processed before emitting the status. Lower layers use the status as a signal that all incoming messages have been received, so there should not be any code path that receives a status but never sets read_status.
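The gating described above can be sketched like this (our own simplified model, not the actual grpc source): the status is surfaced only once the message stream has been drained AND the status has arrived from the transport, in either order.

```javascript
// Simplified model of the read_status/received_status gate: `emit` is
// called exactly once, only after both conditions hold.
function makeStatusGate(emit) {
    let readDone = false;
    let received = null;
    return {
        // All incoming messages have been read by the application.
        markReadDone() { readDone = true; if (received !== null) emit(received); },
        // The status arrived from the transport.
        receiveStatus(status) { received = status; if (readDone) emit(status); },
    };
}

const emitted = [];
const gate = makeStatusGate((s) => emitted.push(s));
gate.receiveStatus('OK'); // nothing emitted yet: messages not drained
gate.markReadDone();      // now 'OK' is emitted
console.log(emitted); // [ 'OK' ]
```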

murgatroid99 avatar Sep 02 '21 19:09 murgatroid99

Since @grpc/grpc-js 1.8.0, servers have support for keepalives, which can be configured with the options grpc.keepalive_time_ms and grpc.keepalive_timeout_ms.
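For example (a configuration fragment; the values here are illustrative, not recommendations):

```javascript
// With @grpc/grpc-js >= 1.8.0 these options are honored by the server.
const server = new grpc.Server({
    'grpc.keepalive_time_ms': 20000,    // ping idle clients every 20s
    'grpc.keepalive_timeout_ms': 10000, // close if no ping ack within 10s
});
```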

murgatroid99 avatar Dec 14 '23 22:12 murgatroid99