grpc-go
grpc-go copied to clipboard
Document that `ClientStream.SendMsg()` returns `nil` on success for unary/server-streaming RPCs
Use case(s) - what problem will this feature solve?
When a unary RPC is called, this code gets executed:
https://github.com/grpc/grpc-go/blob/59954c80165855ecd478a79e75bcc7d64f629ded/call.go#L65-L74
SendMsg() docs say this:
https://github.com/grpc/grpc-go/blob/59954c80165855ecd478a79e75bcc7d64f629ded/stream.go#L112-L115
I have a test for something unrelated where the server immediately returns an error. Very rarely, depending on the timing my test gets EOF instead of the error that the server returns. This is because server is occasionally faster than the client. It occurred to me that this is an issue for all unary RPCs - there is no way to get the actual error as the stream is not exposed to the user.
Proposed Solution
Check for io.EOF from SendMsg() and call RecvMsg() to get the actual error, return it.
Alternatives Considered
Leave it as is. This might be a requirement to be consistent with other implementations (I have not checked). Perhaps gRPC spec needs to be adjusted/clarified to change the behavior in this case.
Additional Context
gRPC v1.63.2. Go 1.22.3
Yes, I think this is just an oversight in our code here. If SendMsg returns io.EOF, we should call RecvMsg and use the error there. I believe it's possible for the RPC to end successfully even if the message can't be sent, so if RecvMsg returns nil, then the RPC should end with a success. There should definitely be a test for this. The server handler needs to be a streaming RPC handler that immediately ends the RPC (both success & failure should be tested). I'm not sure if we can inject something into the connection to make sends from the client to the server slower in order to stimulate the race more reliably, but that's one possible option.
To be clear, I don't have bandwidth to work on the fix myself.
No problem! This is rare but probably important enough that we should prioritize this internally. Also the fix is probably quite easy.
@dfawley Hi, is it possible for me pick this one up? Looks like we need to get the following done (please correct me if I'm wrong):
-
create a test that can reliably reproduce the scenario where
SendMsgreturnsio.EOF(server closes connection before client actually sends?) and the client doesn't get the expected actual error -
implement the proposed solution and make sure it make the test pass
@Kailun2047 Sure, that sounds good to me, as long as you think you can work on it pretty soon.
For the repro, I wonder if we can reproduce it by making a large request, and configure the server with a very small initial window size. That way the client won't be able to send the full request as long as the server handler treats the method as streaming. That should make it 100% reproducible, and be fairly simple to do.
@dfawley Got it. Will try to delve in and make an update here within a day or two.
@dfawley Hi, I tried creating a test per earlier discussion to reproduce but had no luck :sweat_smile: On StubServer, SendMsg always succeeds even when the request payload size is increased to the max allowable (4194304).
But upon some initial inspection, looks like the EOF reported might not originate from SendMsg. Concretely, for invocation of an unary RPC,
-
client stream is created with
unaryStreamDesc, which hasClientStreamsset tofalse -
when sending message with the client stream, the stream will not return
EOFfor non-client-streaming RPCs
Instead, it appears to me that the source of the EOF might be a different issue in RecvMsg. Concretely,
-
recvMsgmight returnEOFto indicate end of stream -
the upper level
RecvMsggets theEOFreturned and callsfinishwith it. Infinishwe convertEOFtonil, but the conversion isn't likely to be reflected inRecvbecause the original error is passed by value instead of by reference. Therefore it's possible forRecvMsgto returnEOF.
I'm definitely willing to put more investigation into this, but let me know if you are concerned about the timing and need to prioritize internally :smile:
Oh interesting. I thought it was something simpler, but you're right - some of the streaming code actually checks the type of the RPC and adjusts its behavior accordingly. If you're able to get to the bottom of this, we would definitely appreciate it. No worries on the timing at all - I don't think it's as urgent if it's a pretty rare occurrence.
@ash2k Hi - since we now might have a bigger space to search for the potential source of the EOF, is it possible to share more info about the test mentioned in OP? Specifically, are there any ServerOption/DialOption/CallOption being used there? Any other info will also be appreciated. Thanks!
The existing gRPC tests around Invoke (in which server also return error immediately) don't seem to help reproduce so I figure there could be something missing.
Without going into details, my code emulates a unary RPC over a streaming RPC. That's probably why you cannot reproduce the issue.
when sending message with the client stream, the stream will not return EOF for non-client-streaming RPCs
This means code in invoke() relies on an undocumented behavior that EOF is not returned from SendMsg() if it's a unary RPC. I see two options:
- This behavior should be documented as part of the
SendMsg()doc. - Special behavior that you linked to can be removed and code in
invoke()should handle the general documented behavior ofSendMsg(). It's a matter of moving theif.
@dfawley Whoops, turned out initially I had some misunderstanding on the issue :sweat_smile: But I think I get it now (thanks to @ash2k 's response):
- emulating the unary RPC client's
SendMsg&RecvMsgsequence on a streaming RPC client doesn't quite work (repro), because an additional check forEOFfromSendMsgis needed - the
SendMsg&RecvMsgsequence does work for unary RPC (i.e.invoke) though, because of the behaviorSendMsghas on non-streaming RPC type - we might want to adjust documentation/implementation of
SendMsgper suggested by @ash2k to avoid potential confusion (in our case it makes user suspect thatinvokedoesn't handle return value ofSendMsgproperly)
Sounds like documentation update is an easier option, but let me know if there's more to considerate :smiley:
I'm leaning towards getting rid of the special handling of non-streaming RPCs in sendMsg and handling the io.EOF in invoke. This would unify the behaviour for both kinds of RPCs and make the function easier to reason about.
The special handling in sendMsg has been there for for quite some time and the comment was changed 6 years ago in https://github.com/grpc/grpc-go/pull/1854
So I'm concerned about accidentally changing any user facing behaviour. Let me discuss with the team and share an update.
Assigning to @dfawley to comment on the preferred fix. Since he's presently on vacation, expect a reply by end of the week.
Returning io.EOF here would break existing code quite badly - our generated code relies upon SendMsg not returning an error on success. So we should update the docs to say something like: for ClientStreams=false streams, nil is returned unconditionally.