visibility into client-side queuing and flow control
Outgoing messages in a call can be internally queued by gRPC due to HTTP2 flow control. Client calls can additionally be delayed from even starting due to the underlying HTTP2 connection hitting its MAX_CONCURRENT_STREAMS limit. There should be a public API that can observe gRPC-internal queuing of outgoing messages. This could be useful when debugging a DEADLINE_EXCEEDED RPC; it's very informative to know if the request never left the client's queues.
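For streaming calls, the closest signal available today is the transport's `onReady` notification (discussed in the findings below). A minimal sketch of observing it with a `ClientResponseObserver`, assuming the observer is passed as the response observer of an async streaming stub call:

```java
import io.grpc.stub.ClientCallStreamObserver;
import io.grpc.stub.ClientResponseObserver;

/**
 * Timestamps onReady notifications. onReady fires when the transport is
 * willing to accept more outbound data, which correlates with queued messages
 * draining toward the network; unary calls never receive it.
 */
final class ReadinessLoggingObserver<ReqT, RespT>
    implements ClientResponseObserver<ReqT, RespT> {
  @Override
  public void beforeStart(ClientCallStreamObserver<ReqT> requestStream) {
    requestStream.setOnReadyHandler(
        () -> System.out.println("onReady at +" + System.nanoTime() + "ns"));
  }

  @Override public void onNext(RespT response) { /* handle the response */ }
  @Override public void onError(Throwable t) { /* handle the failure */ }
  @Override public void onCompleted() { /* stream finished */ }
}
```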
Basically, I'd like to revive https://github.com/grpc/grpc-java/issues/8945. To recapitulate the findings on that issue:
- `StreamTracer.outboundMessageSent` is called after a message is sent to the transport, not when the message hits the network.
- `onReady` callbacks are strongly correlated with messages being released onto the network. However, unary calls do not send `onReady` callbacks.
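To make the first finding concrete, here is a minimal sketch of attaching a per-call `ClientStreamTracer` through an interceptor so the existing callbacks can be timestamped. The class names are illustrative; the gRPC APIs used (`CallOptions.withStreamTracerFactory`, `outboundMessage`, `outboundMessageSent`) exist today, though `withStreamTracerFactory` is an experimental API:

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.ClientStreamTracer;
import io.grpc.Metadata;
import io.grpc.MethodDescriptor;

/** Attaches a tracer to every call so outbound-message callbacks can be timestamped. */
final class OutboundQueueTracingInterceptor implements ClientInterceptor {
  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    return next.newCall(method, callOptions.withStreamTracerFactory(new TracerFactory()));
  }

  private static final class TracerFactory extends ClientStreamTracer.Factory {
    @Override
    public ClientStreamTracer newClientStreamTracer(
        ClientStreamTracer.StreamInfo info, Metadata headers) {
      long startNanos = System.nanoTime();
      return new ClientStreamTracer() {
        @Override
        public void outboundMessage(int seqNo) {
          // The application handed message seqNo to gRPC.
          log("outboundMessage", seqNo, startNanos);
        }

        @Override
        public void outboundMessageSent(
            int seqNo, long optionalWireSize, long optionalUncompressedSize) {
          // Today this fires when the message is passed to the transport,
          // not when it actually reaches the network.
          log("outboundMessageSent", seqNo, startNanos);
        }
      };
    }
  }

  private static void log(String event, int seqNo, long startNanos) {
    System.out.printf("%s seq=%d +%dus%n",
        event, seqNo, (System.nanoTime() - startNanos) / 1000);
  }
}
```

Register it with `ManagedChannelBuilder.intercept(new OutboundQueueTracingInterceptor())` or `stub.withInterceptors(...)`. The gap between the two log lines is the queuing visible today, which still stops short of the network.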
One way to resolve this issue would be to change `StreamTracer.outboundMessageSent` to be called after the message is actually sent to the network. Alternatively, a new `StreamTracer` callback (`outboundMessageTransmitted`?) could serve that purpose.
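A rough sketch of what the second option could look like; the name, placement, and javadoc are purely illustrative, not an agreed-upon API:

```java
public abstract class StreamTracer {
  // ... existing callbacks such as outboundMessage()/outboundMessageSent() ...

  /**
   * Hypothetical: a message has actually been written to the network, i.e.
   * after HTTP/2 flow control (and, for the first message of a call, after a
   * stream slot was available under MAX_CONCURRENT_STREAMS).
   *
   * @param seqNo the sequence number of the message within the stream
   */
  public void outboundMessageTransmitted(int seqNo) {}
}
```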
> This could be useful when debugging a DEADLINE_EXCEEDED RPC
We already package up that information into the DEADLINE_EXCEEDED status message. And we have infrastructure where we could add more, if something is missing.
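For reference, that information surfaces in the status description of the failed RPC. A minimal sketch of pulling it out, where `blockingStub`, `someRpc`, and `request` are hypothetical stand-ins for a generated blocking stub and its request message:

```java
import io.grpc.Status;
import io.grpc.StatusRuntimeException;
import java.util.concurrent.TimeUnit;

// blockingStub, someRpc, and request are hypothetical placeholders.
try {
  blockingStub.withDeadlineAfter(200, TimeUnit.MILLISECONDS).someRpc(request);
} catch (StatusRuntimeException e) {
  if (e.getStatus().getCode() == Status.Code.DEADLINE_EXCEEDED) {
    // The description can say where the call was stuck, e.g. still buffered
    // in the delayed transport waiting for a usable connection.
    System.err.println("DEADLINE_EXCEEDED: " + e.getStatus().getDescription());
  }
}
```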
What are you trying to use this information for? Are you wanting code that reacts dynamically based on the results or are you wanting to gather more details for a human to debug?
> This could be useful when debugging a DEADLINE_EXCEEDED RPC

> We already package up that information into the DEADLINE_EXCEEDED status message.
Where in DEADLINE_EXCEEDED? I see the "delayed" transport can add information to a DEADLINE_EXCEEDED. But the flow control and MAX_CONCURRENT_STREAMS buffering I'm worried about happens within netty.
Improving the DEADLINE_EXCEEDED message is of course good. I'm also interested in obtaining this information for successful but "slow" calls and observing the buffering of outbound messages in streaming calls. I want to build a trace of all the queues a message goes through.
> What are you trying to use this information for? Are you wanting code that reacts dynamically based on the results or are you wanting to gather more details for a human to debug?
At the moment, debugging and understanding performance.
(I suppose in the future, it's possible to explore automatically opening another channel if calls are blocked too long on MAX_CONCURRENT_STREAMS.)
Oh, I missed that you wanted messages. There are some bits you can infer from traces, but in general, talking about when a message is sent is complicated.
MAX_CONCURRENT_STREAMS would fit into our current `stream.appendTimeoutInsight()` flow for DEADLINE_EXCEEDED, and Netty can supplement it. It might be a bit hard to plumb this into Netty, and even harder to support this in the new HTTP/2 Netty API. But if we figure out the Netty part, the rest of it is easy.
We have support for OpenCensus tracing today, and we're adding OpenTelemetry tracing. There's also Perfmark tracing, which shows more about thread interactions and wouldn't be as good for watching a single RPC you were suspicious about after-the-fact. The current stuff is lacking MAX_CONCURRENT_STREAMS visibility, and flow control is hard to express.
There has been talk (and a private design) of opening new connections when hitting MAX_CONCURRENT_STREAMS.
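As a point of reference, a common application-level workaround (not a grpc-java feature) is to spread calls over several channels so that no single connection's MAX_CONCURRENT_STREAMS limit stalls new calls. A minimal sketch, with the target, pool size, and plaintext setting as illustrative choices:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Round-robins calls across several channels, i.e. several HTTP/2 connections. */
final class ChannelPool {
  private final List<ManagedChannel> channels = new ArrayList<>();
  private final AtomicInteger next = new AtomicInteger();

  ChannelPool(String target, int size) {
    for (int i = 0; i < size; i++) {
      // Plaintext only for illustration; each ManagedChannel owns its own connection(s).
      channels.add(ManagedChannelBuilder.forTarget(target).usePlaintext().build());
    }
  }

  ManagedChannel pick() {
    // floorMod keeps the index valid even if the counter wraps negative.
    return channels.get(Math.floorMod(next.getAndIncrement(), channels.size()));
  }

  void shutdown() {
    for (ManagedChannel c : channels) {
      c.shutdown();
    }
  }
}
```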
I realize the most detailed efforts would require Netty plumbing or forking. But what about my suggestion in the OP to notify the stream tracer in `NettyClientHandler.TransportState.onWriteFrameData`? That at least would provide better information than today.
CC @YifeiZhuang
I think adding a `StreamTracer.outboundMessageTransmitted` callback after flow control and adding it to OTel tracing seems like a good idea and low-hanging fruit. We probably need consensus from other languages. And when to support it is another question. @ejona86
We talked about this in the cross-language meeting, and we had consensus that having a notification when the last bytes are written to the wire sounds fair. @benjaminp do you want to send a PR for this?
cc @ejona