grpc-node icon indicating copy to clipboard operation
grpc-node copied to clipboard

resolving_call doesn't end, grpc-js client freezes up the whole process

Open ravicious opened this issue 2 years ago • 3 comments

Problem description

I have an Electron app which creates a grpc-js client in a renderer process. The client communicates with a grpc-js server spun up in a separate, non-renderer process.

I have a very simple unary RPC which doesn't do anything complex on the server side: it creates a new EventEmitter and adds it to a Map that's accessible by the gRPC service.

Sometimes, but not always, when I have two calls to that RPC that are close in time to each other, the second one will never finish and hang up the whole renderer process, i.e. make it impossible to interact with the browser window of the app. The server process doesn't seem to hang, I verified that by adding a setInterval with console.log to the process which runs the server.

I suspect that it might be grpc-js's fault mostly because when the issue happens it's always the gRPC call that is last logged. I also found a couple of other issues mentioning grpc-js hanging up the process.

Reproduction steps

I'm sorry but it's hard to specify what exactly the reproduction steps are. The app makes two concurrent calls to the same RPC and appears to hang. I think the logs might tell a better story of what's happening.

I don't use TLS, the client communicates with the server over a socket.

Edit: I might have some reproduction steps tomorrow, as I've just realized how to reliably reproduce it in our open source app.

Environment

  • OS name, version and architecture: macOS 13.3.1 arm64
  • Node version: 18.14.0
  • Node installation method: shipped with Electron 24.3.0
  • Package name and version: [email protected]

Additional context

I have client and server logs from the moment that the first call to the RPC happens.

The only difference that I managed to spot between the two is that when the client hangs, grpc-message from the first call is empty:

2023-05-18T16:48:41.821Z | subchannel_call | [35] Received server trailers:
		grpc-status: 0
		grpc-message: 

The logs from the hanging client also include this:

2023-05-18T16:48:41.876Z | subchannel_call | [39] received status details string "OK" from server
2023-05-18T16:48:41.876Z | subchannel_call | [39] close http2 stream with code 0

but are missing this part which is present when everything goes okay:

2023-05-18T16:53:01.587Z | resolving_call | [37] Received status
2023-05-18T16:53:01.587Z | resolving_call | [37] ended with status: code=0 details="OK"
Logs when the client hangs

Server logs

2023-05-18T16:48:41.820Z | server | (1) Received call to method /PtyHost/CreatePtyProcess at address null
2023-05-18T16:48:41.821Z | server_call | Request to /PtyHost/CreatePtyProcess received headers {"grpc-accept-encoding":["identity,deflate,gzip"],"accept-encoding":["identity"],"user-agent":["grpc-node-js/1.8.14"],"content-type":["application/grpc"],"te":["trailers"]}
2023-05-18T16:48:41.825Z | server_call | Request to method /PtyHost/CreatePtyProcess ended with status code: OK details: OK
2023-05-18T16:48:41.826Z | server_call | Request to method /PtyHost/CreatePtyProcess stream closed with rstCode 0
2023-05-18T16:48:41.876Z | server | (1) Received call to method /PtyHost/CreatePtyProcess at address null
2023-05-18T16:48:41.876Z | server_call | Request to /PtyHost/CreatePtyProcess received headers {"grpc-accept-encoding":["identity,deflate,gzip"],"accept-encoding":["identity"],"user-agent":["grpc-node-js/1.8.14"],"content-type":["application/grpc"],"te":["trailers"]}

Client logs

2023-05-18T16:48:40.007Z | index | Loading @grpc/grpc-js version 1.8.14
…
2023-05-18T16:48:41.814Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createResolvingCall [36] method="/PtyHost/CreatePtyProcess", deadline=Infinity
2023-05-18T16:48:41.814Z | resolving_call | [36] Created
2023-05-18T16:48:41.814Z | resolving_call | [36] Deadline: Infinity
2023-05-18T16:48:41.814Z | resolving_call | [36] start called
2023-05-18T16:48:41.814Z | resolving_load_balancer | unix:///Users/rav/Library/Application Support/Electron/shared.socket IDLE -> CONNECTING
2023-05-18T16:48:41.814Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
2023-05-18T16:48:41.814Z | connectivity_state | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket IDLE -> CONNECTING
2023-05-18T16:48:41.815Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
2023-05-18T16:48:41.815Z | resolving_call | [36] startRead called
2023-05-18T16:48:41.816Z | resolving_call | [36] write() called with message of length 10585
2023-05-18T16:48:41.816Z | resolving_call | [36] halfClose called
2023-05-18T16:48:41.816Z | pick_first | Connect to address list /Users/rav/Library/Application Support/Electron/shared.socket
2023-05-18T16:48:41.816Z | subchannel | (8) /Users/rav/Library/Application Support/Electron/shared.socket Subchannel constructed with options {}
2023-05-18T16:48:41.816Z | subchannel_refcount | (8) /Users/rav/Library/Application Support/Electron/shared.socket refcount 0 -> 1
2023-05-18T16:48:41.816Z | subchannel_refcount | (8) /Users/rav/Library/Application Support/Electron/shared.socket refcount 1 -> 2
2023-05-18T16:48:41.816Z | pick_first | Start connecting to subchannel with address /Users/rav/Library/Application Support/Electron/shared.socket
2023-05-18T16:48:41.816Z | pick_first | IDLE -> CONNECTING
2023-05-18T16:48:41.816Z | resolving_load_balancer | unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> CONNECTING
2023-05-18T16:48:41.816Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
2023-05-18T16:48:41.816Z | connectivity_state | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> CONNECTING
2023-05-18T16:48:41.817Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
2023-05-18T16:48:41.817Z | subchannel | (8) /Users/rav/Library/Application Support/Electron/shared.socket IDLE -> CONNECTING
2023-05-18T16:48:41.817Z | pick_first | CONNECTING -> CONNECTING
2023-05-18T16:48:41.817Z | resolving_load_balancer | unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> CONNECTING
2023-05-18T16:48:41.817Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
2023-05-18T16:48:41.817Z | connectivity_state | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> CONNECTING
2023-05-18T16:48:41.817Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createRetryingCall [37] method="/PtyHost/CreatePtyProcess"
2023-05-18T16:48:41.817Z | resolving_call | [36] Created child [37]
2023-05-18T16:48:41.817Z | retrying_call | [37] start called
2023-05-18T16:48:41.817Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createLoadBalancingCall [38] method="/PtyHost/CreatePtyProcess"
2023-05-18T16:48:41.817Z | retrying_call | [37] Created child call [38] for attempt 1
2023-05-18T16:48:41.817Z | load_balancing_call | [38] start called
2023-05-18T16:48:41.817Z | load_balancing_call | [38] Pick called
2023-05-18T16:48:41.817Z | load_balancing_call | [38] Pick result: QUEUE subchannel: null status: undefined undefined
2023-05-18T16:48:41.818Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
2023-05-18T16:48:41.818Z | retrying_call | [37] startRead called
2023-05-18T16:48:41.818Z | load_balancing_call | [38] startRead called
2023-05-18T16:48:41.818Z | retrying_call | [37] write() called with message of length 10590
2023-05-18T16:48:41.818Z | load_balancing_call | [38] write() called with message of length 10590
2023-05-18T16:48:41.818Z | retrying_call | [37] halfClose called
2023-05-18T16:48:41.819Z | subchannel | (8) /Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> READY
2023-05-18T16:48:41.819Z | pick_first | Pick subchannel with address /Users/rav/Library/Application Support/Electron/shared.socket
2023-05-18T16:48:41.819Z | subchannel_refcount | (8) /Users/rav/Library/Application Support/Electron/shared.socket refcount 2 -> 3
2023-05-18T16:48:41.819Z | subchannel_refcount | (8) /Users/rav/Library/Application Support/Electron/shared.socket refcount 3 -> 2
2023-05-18T16:48:41.819Z | pick_first | CONNECTING -> READY
2023-05-18T16:48:41.819Z | resolving_load_balancer | unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> READY
2023-05-18T16:48:41.819Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
2023-05-18T16:48:41.819Z | load_balancing_call | [38] Pick called
2023-05-18T16:48:41.819Z | load_balancing_call | [38] Pick result: COMPLETE subchannel: (8) /Users/rav/Library/Application Support/Electron/shared.socket status: undefined undefined
2023-05-18T16:48:41.819Z | connectivity_state | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> READY
2023-05-18T16:48:41.819Z | transport_flowctrl | (9) /Users/rav/Library/Application Support/Electron/shared.socket local window size: 65535 remote window size: 65535
2023-05-18T16:48:41.819Z | transport_internals | (9) /Users/rav/Library/Application Support/Electron/shared.socket session.closed=false session.destroyed=false session.socket.destroyed=false
2023-05-18T16:48:41.819Z | load_balancing_call | [38] Created child call [39]
2023-05-18T16:48:41.819Z | subchannel_call | [39] write() called with message of length 10590
2023-05-18T16:48:41.819Z | subchannel_call | [39] sending data chunk of length 10590
2023-05-18T16:48:41.820Z | load_balancing_call | [38] halfClose called
2023-05-18T16:48:41.820Z | subchannel_call | [39] end() called
2023-05-18T16:48:41.820Z | subchannel_call | [39] calling end() on HTTP/2 stream
2023-05-18T16:48:41.820Z | subchannel_call | [35] Received server headers:
		:status: 200
		content-type: application/grpc
2023-05-18T16:48:41.820Z | load_balancing_call | [33] Received metadata
2023-05-18T16:48:41.820Z | retrying_call | [32] Received metadata from child [33]
2023-05-18T16:48:41.820Z | retrying_call | [32] Committing call [33] at index 0
2023-05-18T16:48:41.820Z | resolving_call | [29] Received metadata
2023-05-18T16:48:41.820Z | subchannel_call | [35] receive HTTP/2 data frame of length 334
2023-05-18T16:48:41.820Z | subchannel_call | [35] parsed message of length 334
2023-05-18T16:48:41.820Z | subchannel_call | [35] pushing to reader message of length 334
2023-05-18T16:48:41.820Z | load_balancing_call | [33] Received message
2023-05-18T16:48:41.820Z | retrying_call | [32] Received message from child [33]
2023-05-18T16:48:41.820Z | resolving_call | [29] Received message
2023-05-18T16:48:41.820Z | resolving_call | [29] Finished filtering received message
2023-05-18T16:48:41.821Z | subchannel_call | [35] Received server trailers:
		grpc-status: 0
		grpc-message: 
2023-05-18T16:48:41.821Z | subchannel_call | [35] received status code 0 from server
2023-05-18T16:48:41.821Z | subchannel_call | [35] received status details string "" from server
2023-05-18T16:48:41.821Z | subchannel_call | [35] close http2 stream with code 0
2023-05-18T16:48:41.821Z | subchannel_call | [35] ended with status: code=0 details=""
2023-05-18T16:48:41.821Z | load_balancing_call | [33] Received status
2023-05-18T16:48:41.821Z | load_balancing_call | [33] ended with status: code=0 details=""
2023-05-18T16:48:41.821Z | retrying_call | [32] Received status from child [33]
2023-05-18T16:48:41.821Z | retrying_call | [32] state=COMMITTED handling status with progress PROCESSED from child [33] in state ACTIVE
2023-05-18T16:48:41.821Z | retrying_call | [32] ended with status: code=0 details=""
2023-05-18T16:48:41.821Z | resolving_call | [29] Received status
2023-05-18T16:48:41.822Z | resolving_call | [29] ended with status: code=0 details=""
2023-05-18T16:48:41.822Z | subchannel_call | [35] HTTP/2 stream closed with code 0
2023-05-18T16:48:41.867Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createResolvingCall [40] method="/PtyHost/CreatePtyProcess", deadline=Infinity
2023-05-18T16:48:41.867Z | resolving_call | [40] Created
2023-05-18T16:48:41.867Z | resolving_call | [40] Deadline: Infinity
2023-05-18T16:48:41.867Z | resolving_call | [40] start called
2023-05-18T16:48:41.868Z | resolving_call | [40] startRead called
2023-05-18T16:48:41.868Z | resolving_call | [40] write() called with message of length 10662
2023-05-18T16:48:41.868Z | resolving_call | [40] halfClose called
2023-05-18T16:48:41.869Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createRetryingCall [41] method="/PtyHost/CreatePtyProcess"
2023-05-18T16:48:41.869Z | resolving_call | [40] Created child [41]
2023-05-18T16:48:41.869Z | retrying_call | [41] start called
2023-05-18T16:48:41.869Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createLoadBalancingCall [42] method="/PtyHost/CreatePtyProcess"
2023-05-18T16:48:41.869Z | retrying_call | [41] Created child call [42] for attempt 1
2023-05-18T16:48:41.869Z | load_balancing_call | [42] start called
2023-05-18T16:48:41.869Z | load_balancing_call | [42] Pick called
2023-05-18T16:48:41.870Z | load_balancing_call | [42] Pick result: COMPLETE subchannel: (8) /Users/rav/Library/Application Support/Electron/shared.socket status: undefined undefined
2023-05-18T16:48:41.870Z | retrying_call | [41] startRead called
2023-05-18T16:48:41.870Z | load_balancing_call | [42] startRead called
2023-05-18T16:48:41.870Z | transport_flowctrl | (9) /Users/rav/Library/Application Support/Electron/shared.socket local window size: 65535 remote window size: 54945
2023-05-18T16:48:41.870Z | transport_internals | (9) /Users/rav/Library/Application Support/Electron/shared.socket session.closed=false session.destroyed=false session.socket.destroyed=false
2023-05-18T16:48:41.870Z | load_balancing_call | [42] Created child call [43]
2023-05-18T16:48:41.870Z | retrying_call | [41] write() called with message of length 10667
2023-05-18T16:48:41.870Z | load_balancing_call | [42] write() called with message of length 10667
2023-05-18T16:48:41.871Z | subchannel_call | [43] write() called with message of length 10667
2023-05-18T16:48:41.871Z | subchannel_call | [43] sending data chunk of length 10667
2023-05-18T16:48:41.871Z | retrying_call | [41] halfClose called
2023-05-18T16:48:41.875Z | transport | (9) /Users/rav/Library/Application Support/Electron/shared.socket local settings acknowledged by remote: {"headerTableSize":4096,"enablePush":true,"initialWindowSize":65535,"maxFrameSize":16384,"maxConcurrentStreams":4294967295,"maxHeaderListSize":4294967295,"maxHeaderSize":4294967295,"enableConnectProtocol":false}
2023-05-18T16:48:41.875Z | subchannel_call | [39] Received server headers:
		:status: 200
		content-type: application/grpc+proto
		grpc-accept-encoding: identity,deflate,gzip
		grpc-encoding: identity
		date: Thu, 18 May 2023 16:48:41 GMT
2023-05-18T16:48:41.875Z | load_balancing_call | [38] Received metadata
2023-05-18T16:48:41.875Z | retrying_call | [37] Received metadata from child [38]
2023-05-18T16:48:41.875Z | retrying_call | [37] Committing call [38] at index 0
2023-05-18T16:48:41.875Z | resolving_call | [36] Received metadata
2023-05-18T16:48:41.875Z | subchannel_call | [39] receive HTTP/2 data frame of length 28
2023-05-18T16:48:41.875Z | subchannel_call | [39] parsed message of length 28
2023-05-18T16:48:41.875Z | subchannel_call | [39] pushing to reader message of length 28
2023-05-18T16:48:41.875Z | load_balancing_call | [38] Received message
2023-05-18T16:48:41.875Z | retrying_call | [37] Received message from child [38]
2023-05-18T16:48:41.875Z | resolving_call | [36] Received message
2023-05-18T16:48:41.876Z | resolving_call | [36] Finished filtering received message
2023-05-18T16:48:41.876Z | subchannel_call | [39] Received server trailers:
		grpc-status: 0
		grpc-message: OK
2023-05-18T16:48:41.876Z | subchannel_call | [39] received status code 0 from server
2023-05-18T16:48:41.876Z | subchannel_call | [39] received status details string "OK" from server
2023-05-18T16:48:41.876Z | subchannel_call | [39] close http2 stream with code 0
Logs when the client doesn't hang

Server logs

2023-05-18T16:53:01.575Z | server | (1) Received call to method /PtyHost/CreatePtyProcess at address null
2023-05-18T16:53:01.576Z | server_call | Request to /PtyHost/CreatePtyProcess received headers {"grpc-accept-encoding":["identity,deflate,gzip"],"accept-encoding":["identity"],"user-agent":["grpc-node-js/1.8.14"],"content-type":["application/grpc"],"te":["trailers"]}
2023-05-18T16:53:01.576Z | server | (1) Received call to method /PtyHost/CreatePtyProcess at address null
2023-05-18T16:53:01.576Z | server_call | Request to /PtyHost/CreatePtyProcess received headers {"grpc-accept-encoding":["identity,deflate,gzip"],"accept-encoding":["identity"],"user-agent":["grpc-node-js/1.8.14"],"content-type":["application/grpc"],"te":["trailers"]}
2023-05-18T16:53:01.581Z | server_call | Request to method /PtyHost/CreatePtyProcess ended with status code: OK details: OK
2023-05-18T16:53:01.584Z | server_call | Request to method /PtyHost/CreatePtyProcess ended with status code: OK details: OK
2023-05-18T16:53:01.585Z | server_call | Request to method /PtyHost/CreatePtyProcess stream closed with rstCode 0
2023-05-18T16:53:01.585Z | server_call | Request to method /PtyHost/CreatePtyProcess stream closed with rstCode 0

Client logs

2023-05-18T16:53:01.565Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createResolvingCall [36] method="/PtyHost/CreatePtyProcess", deadline=Infinity
2023-05-18T16:53:01.566Z | resolving_call | [36] Created
2023-05-18T16:53:01.566Z | resolving_call | [36] Deadline: Infinity
2023-05-18T16:53:01.566Z | resolving_call | [36] start called
2023-05-18T16:53:01.566Z | resolving_load_balancer | unix:///Users/rav/Library/Application Support/Electron/shared.socket IDLE -> CONNECTING
2023-05-18T16:53:01.566Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
2023-05-18T16:53:01.566Z | connectivity_state | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket IDLE -> CONNECTING
2023-05-18T16:53:01.566Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
2023-05-18T16:53:01.566Z | resolving_call | [36] startRead called
2023-05-18T16:53:01.568Z | resolving_call | [36] write() called with message of length 10585
2023-05-18T16:53:01.568Z | resolving_call | [36] halfClose called
2023-05-18T16:53:01.568Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createResolvingCall [37] method="/PtyHost/CreatePtyProcess", deadline=Infinity
2023-05-18T16:53:01.568Z | resolving_call | [37] Created
2023-05-18T16:53:01.568Z | resolving_call | [37] Deadline: Infinity
2023-05-18T16:53:01.568Z | resolving_call | [37] start called
2023-05-18T16:53:01.569Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.ref | configSelectionQueue.length=2 pickQueue.length=0
2023-05-18T16:53:01.569Z | resolving_call | [37] startRead called
2023-05-18T16:53:01.569Z | resolving_call | [37] write() called with message of length 10662
2023-05-18T16:53:01.569Z | resolving_call | [37] halfClose called
2023-05-18T16:53:01.570Z | pick_first | Connect to address list /Users/rav/Library/Application Support/Electron/shared.socket
2023-05-18T16:53:01.570Z | subchannel | (8) /Users/rav/Library/Application Support/Electron/shared.socket Subchannel constructed with options {}
2023-05-18T16:53:01.570Z | subchannel_refcount | (8) /Users/rav/Library/Application Support/Electron/shared.socket refcount 0 -> 1
2023-05-18T16:53:01.570Z | subchannel_refcount | (8) /Users/rav/Library/Application Support/Electron/shared.socket refcount 1 -> 2
2023-05-18T16:53:01.570Z | pick_first | Start connecting to subchannel with address /Users/rav/Library/Application Support/Electron/shared.socket
2023-05-18T16:53:01.570Z | pick_first | IDLE -> CONNECTING
2023-05-18T16:53:01.570Z | resolving_load_balancer | unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> CONNECTING
2023-05-18T16:53:01.570Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=2 pickQueue.length=0
2023-05-18T16:53:01.570Z | connectivity_state | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> CONNECTING
2023-05-18T16:53:01.570Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
2023-05-18T16:53:01.570Z | subchannel | (8) /Users/rav/Library/Application Support/Electron/shared.socket IDLE -> CONNECTING
2023-05-18T16:53:01.570Z | pick_first | CONNECTING -> CONNECTING
2023-05-18T16:53:01.570Z | resolving_load_balancer | unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> CONNECTING
2023-05-18T16:53:01.570Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
2023-05-18T16:53:01.570Z | connectivity_state | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> CONNECTING
2023-05-18T16:53:01.571Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createRetryingCall [38] method="/PtyHost/CreatePtyProcess"
2023-05-18T16:53:01.571Z | resolving_call | [36] Created child [38]
2023-05-18T16:53:01.571Z | retrying_call | [38] start called
2023-05-18T16:53:01.571Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createLoadBalancingCall [39] method="/PtyHost/CreatePtyProcess"
2023-05-18T16:53:01.571Z | retrying_call | [38] Created child call [39] for attempt 1
2023-05-18T16:53:01.571Z | load_balancing_call | [39] start called
2023-05-18T16:53:01.571Z | load_balancing_call | [39] Pick called
2023-05-18T16:53:01.571Z | load_balancing_call | [39] Pick result: QUEUE subchannel: null status: undefined undefined
2023-05-18T16:53:01.571Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
2023-05-18T16:53:01.571Z | retrying_call | [38] startRead called
2023-05-18T16:53:01.571Z | load_balancing_call | [39] startRead called
2023-05-18T16:53:01.571Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createRetryingCall [40] method="/PtyHost/CreatePtyProcess"
2023-05-18T16:53:01.571Z | resolving_call | [37] Created child [40]
2023-05-18T16:53:01.571Z | retrying_call | [40] start called
2023-05-18T16:53:01.571Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket createLoadBalancingCall [41] method="/PtyHost/CreatePtyProcess"
2023-05-18T16:53:01.571Z | retrying_call | [40] Created child call [41] for attempt 1
2023-05-18T16:53:01.571Z | load_balancing_call | [41] start called
2023-05-18T16:53:01.571Z | load_balancing_call | [41] Pick called
2023-05-18T16:53:01.572Z | load_balancing_call | [41] Pick result: QUEUE subchannel: null status: undefined undefined
2023-05-18T16:53:01.572Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=2
2023-05-18T16:53:01.572Z | retrying_call | [40] startRead called
2023-05-18T16:53:01.572Z | load_balancing_call | [41] startRead called
2023-05-18T16:53:01.572Z | retrying_call | [38] write() called with message of length 10590
2023-05-18T16:53:01.572Z | load_balancing_call | [39] write() called with message of length 10590
2023-05-18T16:53:01.572Z | retrying_call | [38] halfClose called
2023-05-18T16:53:01.572Z | retrying_call | [40] write() called with message of length 10667
2023-05-18T16:53:01.572Z | load_balancing_call | [41] write() called with message of length 10667
2023-05-18T16:53:01.572Z | retrying_call | [40] halfClose called
2023-05-18T16:53:01.573Z | subchannel | (8) /Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> READY
2023-05-18T16:53:01.573Z | pick_first | Pick subchannel with address /Users/rav/Library/Application Support/Electron/shared.socket
2023-05-18T16:53:01.573Z | subchannel_refcount | (8) /Users/rav/Library/Application Support/Electron/shared.socket refcount 2 -> 3
2023-05-18T16:53:01.573Z | subchannel_refcount | (8) /Users/rav/Library/Application Support/Electron/shared.socket refcount 3 -> 2
2023-05-18T16:53:01.574Z | pick_first | CONNECTING -> READY
2023-05-18T16:53:01.574Z | resolving_load_balancer | unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> READY
2023-05-18T16:53:01.574Z | channel | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
2023-05-18T16:53:01.574Z | load_balancing_call | [39] Pick called
2023-05-18T16:53:01.574Z | load_balancing_call | [39] Pick result: COMPLETE subchannel: (8) /Users/rav/Library/Application Support/Electron/shared.socket status: undefined undefined
2023-05-18T16:53:01.574Z | load_balancing_call | [41] Pick called
2023-05-18T16:53:01.574Z | load_balancing_call | [41] Pick result: COMPLETE subchannel: (8) /Users/rav/Library/Application Support/Electron/shared.socket status: undefined undefined
2023-05-18T16:53:01.574Z | connectivity_state | (2) unix:///Users/rav/Library/Application Support/Electron/shared.socket CONNECTING -> READY
2023-05-18T16:53:01.574Z | transport_flowctrl | (9) /Users/rav/Library/Application Support/Electron/shared.socket local window size: 65535 remote window size: 65535
2023-05-18T16:53:01.574Z | transport_internals | (9) /Users/rav/Library/Application Support/Electron/shared.socket session.closed=false session.destroyed=false session.socket.destroyed=false
2023-05-18T16:53:01.574Z | load_balancing_call | [39] Created child call [42]
2023-05-18T16:53:01.574Z | subchannel_call | [42] write() called with message of length 10590
2023-05-18T16:53:01.574Z | subchannel_call | [42] sending data chunk of length 10590
2023-05-18T16:53:01.575Z | transport_flowctrl | (9) /Users/rav/Library/Application Support/Electron/shared.socket local window size: 65535 remote window size: 65535
2023-05-18T16:53:01.575Z | transport_internals | (9) /Users/rav/Library/Application Support/Electron/shared.socket session.closed=false session.destroyed=false session.socket.destroyed=false
2023-05-18T16:53:01.575Z | load_balancing_call | [41] Created child call [43]
2023-05-18T16:53:01.575Z | subchannel_call | [43] write() called with message of length 10667
2023-05-18T16:53:01.575Z | subchannel_call | [43] sending data chunk of length 10667
2023-05-18T16:53:01.576Z | load_balancing_call | [39] halfClose called
2023-05-18T16:53:01.576Z | subchannel_call | [42] end() called
2023-05-18T16:53:01.576Z | subchannel_call | [42] calling end() on HTTP/2 stream
2023-05-18T16:53:01.576Z | load_balancing_call | [41] halfClose called
2023-05-18T16:53:01.577Z | subchannel_call | [43] end() called
2023-05-18T16:53:01.577Z | subchannel_call | [43] calling end() on HTTP/2 stream
2023-05-18T16:53:01.577Z | transport | (9) /Users/rav/Library/Application Support/Electron/shared.socket local settings acknowledged by remote: {"headerTableSize":4096,"enablePush":true,"initialWindowSize":65535,"maxFrameSize":16384,"maxConcurrentStreams":4294967295,"maxHeaderListSize":4294967295,"maxHeaderSize":4294967295,"enableConnectProtocol":false}
2023-05-18T16:53:01.584Z | subchannel_call | [42] Received server headers:
		:status: 200
		content-type: application/grpc+proto
		grpc-accept-encoding: identity,deflate,gzip
		grpc-encoding: identity
		date: Thu, 18 May 2023 16:53:01 GMT
2023-05-18T16:53:01.584Z | load_balancing_call | [39] Received metadata
2023-05-18T16:53:01.584Z | retrying_call | [38] Received metadata from child [39]
2023-05-18T16:53:01.584Z | retrying_call | [38] Committing call [39] at index 0
2023-05-18T16:53:01.584Z | resolving_call | [36] Received metadata
2023-05-18T16:53:01.585Z | subchannel_call | [43] Received server headers:
		:status: 200
		content-type: application/grpc+proto
		grpc-accept-encoding: identity,deflate,gzip
		grpc-encoding: identity
		date: Thu, 18 May 2023 16:53:01 GMT
2023-05-18T16:53:01.585Z | load_balancing_call | [41] Received metadata
2023-05-18T16:53:01.585Z | retrying_call | [40] Received metadata from child [41]
2023-05-18T16:53:01.585Z | retrying_call | [40] Committing call [41] at index 0
2023-05-18T16:53:01.585Z | resolving_call | [37] Received metadata
2023-05-18T16:53:01.585Z | subchannel_call | [42] receive HTTP/2 data frame of length 28
2023-05-18T16:53:01.585Z | subchannel_call | [42] parsed message of length 28
2023-05-18T16:53:01.585Z | subchannel_call | [42] pushing to reader message of length 28
2023-05-18T16:53:01.585Z | load_balancing_call | [39] Received message
2023-05-18T16:53:01.585Z | retrying_call | [38] Received message from child [39]
2023-05-18T16:53:01.585Z | resolving_call | [36] Received message
2023-05-18T16:53:01.585Z | resolving_call | [36] Finished filtering received message
2023-05-18T16:53:01.585Z | subchannel_call | [43] receive HTTP/2 data frame of length 28
2023-05-18T16:53:01.585Z | subchannel_call | [43] parsed message of length 28
2023-05-18T16:53:01.585Z | subchannel_call | [43] pushing to reader message of length 28
2023-05-18T16:53:01.586Z | load_balancing_call | [41] Received message
2023-05-18T16:53:01.586Z | retrying_call | [40] Received message from child [41]
2023-05-18T16:53:01.586Z | resolving_call | [37] Received message
2023-05-18T16:53:01.586Z | resolving_call | [37] Finished filtering received message
2023-05-18T16:53:01.586Z | subchannel_call | [42] Received server trailers:
		grpc-status: 0
		grpc-message: OK
2023-05-18T16:53:01.586Z | subchannel_call | [42] received status code 0 from server
2023-05-18T16:53:01.586Z | subchannel_call | [42] received status details string "OK" from server
2023-05-18T16:53:01.586Z | subchannel_call | [42] close http2 stream with code 0
2023-05-18T16:53:01.586Z | subchannel_call | [43] Received server trailers:
		grpc-status: 0
		grpc-message: OK
2023-05-18T16:53:01.586Z | subchannel_call | [43] received status code 0 from server
2023-05-18T16:53:01.586Z | subchannel_call | [43] received status details string "OK" from server
2023-05-18T16:53:01.586Z | subchannel_call | [43] close http2 stream with code 0
2023-05-18T16:53:01.586Z | subchannel_call | [42] ended with status: code=0 details="OK"
2023-05-18T16:53:01.586Z | subchannel_call | [43] ended with status: code=0 details="OK"
2023-05-18T16:53:01.586Z | load_balancing_call | [39] Received status
2023-05-18T16:53:01.586Z | load_balancing_call | [39] ended with status: code=0 details="OK"
2023-05-18T16:53:01.586Z | retrying_call | [38] Received status from child [39]
2023-05-18T16:53:01.587Z | retrying_call | [38] state=COMMITTED handling status with progress PROCESSED from child [39] in state ACTIVE
2023-05-18T16:53:01.587Z | retrying_call | [38] ended with status: code=0 details="OK"
2023-05-18T16:53:01.587Z | load_balancing_call | [41] Received status
2023-05-18T16:53:01.587Z | load_balancing_call | [41] ended with status: code=0 details="OK"
2023-05-18T16:53:01.587Z | retrying_call | [40] Received status from child [41]
2023-05-18T16:53:01.587Z | retrying_call | [40] state=COMMITTED handling status with progress PROCESSED from child [41] in state ACTIVE
2023-05-18T16:53:01.587Z | retrying_call | [40] ended with status: code=0 details="OK"
2023-05-18T16:53:01.587Z | resolving_call | [36] Received status
2023-05-18T16:53:01.587Z | resolving_call | [36] ended with status: code=0 details="OK"
2023-05-18T16:53:01.587Z | subchannel_call | [42] HTTP/2 stream closed with code 0
2023-05-18T16:53:01.587Z | resolving_call | [37] Received status
2023-05-18T16:53:01.587Z | resolving_call | [37] ended with status: code=0 details="OK"
2023-05-18T16:53:01.587Z | subchannel_call | [43] HTTP/2 stream closed with code 0

ravicious avatar May 18 '23 17:05 ravicious

There are a couple of weird things in the "Logs when the client hangs":

  • The server log shows the server receiving two requests, but only responding to one of them. But the client receives two responses.
  • The headers and trailers are unusual for one of the calls in that log:
    2023-05-18T16:48:41.820Z | subchannel_call | [35] Received server headers:
     	:status: 200
     	content-type: application/grpc
    2023-05-18T16:48:41.821Z | subchannel_call | [35] Received server trailers:
     	grpc-status: 0
     	grpc-message: 
    
    vs all of the other logged calls:
    2023-05-18T16:48:41.875Z | subchannel_call | [39] Received server headers:
     	:status: 200
     	content-type: application/grpc+proto
     	grpc-accept-encoding: identity,deflate,gzip
     	grpc-encoding: identity
     	date: Thu, 18 May 2023 16:48:41 GMT
    2023-05-18T16:48:41.876Z | subchannel_call | [39] Received server trailers:
     	grpc-status: 0
     	grpc-message: OK
    
    However, the call with the strange headers is the one that completed. The other one didn't.

I've never seen anything like this before, so I don't know what could cause it. Have you tried running this in the main Electron process instead of the renderer process, and if so, do you still see the same hang? You didn't say what Electron version you're using, but if it's not the latest then upgrading it might help. We had a hang in our internal tests that was resolved by upgrading from Node 16 to 18.

If you can reproduce the hang again, it might be useful to attach a JS debugger to it, to see what is running at that time. If you can't attach a JS debugger, you should be able to attach gdb.

murgatroid99 avatar May 18 '23 20:05 murgatroid99

Thanks for the insight, I really appreciate it!

A couple of things I forgot to note yesterday:

  • Each CreatePtyProcess call sends a big google.protobuf.Struct with process.env. Due to some additional values added by yarn/npm, in my case it ends up having over 150 values in development.
  • I forgot that the app uses the Node.js version shipped with Electron, not the one I have installed locally. So it's Node.js v18.14.0 from Electron 24.3.0.

I also found a way to reliably reproduce it by pressing a keyboard shortcut a couple of times which causes the app to repeatedly send that RPC in quick succesion. Once I have time to come back to this, I should be able to write a simple Electron app which reproduces this behavior.

Have you tried running this in the main Electron process instead of the renderer process, and if so, do you still see the same hang?

I didn't but I'll try once I create that simple Electron app to reproduce the issue.

If you can reproduce the hang again, it might be useful to attach a JS debugger to it, to see what is running at that time.

How would I go about doing this? Since this is in the renderer process, I have access to the regular Chromium debugger. But I'm not sure how to use it when the process itself hangs. If I pause script execution while the process hangs, nothing really happens i.e. the debugger doesn't dump any information regarding the call stack.

ravicious avatar May 19 '23 08:05 ravicious

Since this is in the renderer process, I have access to the regular Chromium debugger. But I'm not sure how to use it when the process itself hangs. If I pause script execution while the process hangs, nothing really happens i.e. the debugger doesn't dump any information regarding the call stack.

In general, when a process "hangs", it is usually either deadlocked (all threads paused waiting on each other) or livelocked (some threads are working, but are stuck in a loop). Deadlocks are unusual in JavaScript because you rarely do cross-thread synchronization, so a livelock is more likely. In other words, some code is probably running. If the JS debugger isn't telling you anything, that might mean that it's not stuck in any JS code, but rather in some code in the runtime itself. That's what gdb might be able to help with: it should be able to get a call stack for that code.

murgatroid99 avatar May 19 '23 17:05 murgatroid99