Bidi/server-streaming calls cause a leak if the response isn't read or disposed
(I haven't tested with a server-only streaming call, but I'd expect it to have the same behavior.)
This occurs in both Grpc.Core and Grpc.Net.Client as far as I can see. Tested with Grpc.Net.Client 2.51.0; I don't have the Grpc.Core version handy, but I can sort that out another time if it's wanted.
A repro is currently available in the "Issue10034" directory of my https://github.com/jskeet/google-cloud-dotnet/tree/issue10034-repro branch - but it's a large repo to pull for just this, so I'd recommend browsing.
The use of the BigQuery API is purely as a simple way of investigating our GAPIC library wrapper behavior as well - the sample doesn't talk to an actual BigQuery API. (You'll need credentials of some form to repro the GAPIC code; let me know if you want the code that just uses a bogus access token for that instead. This issue is focused on the "raw" gRPC layer though.)
Basically, code like this is broken:
var channel = GrpcChannel.ForAddress("https://localhost:5001");
var client = new BigQueryWrite.BigQueryWriteClient(channel);
var request = new AppendRowsRequest { WriteStream = "abc" };
var stream = client.AppendRows();
await stream.RequestStream.WriteAsync(request);
await stream.RequestStream.CompleteAsync();
// Assume desired behavior is fire-and-forget - we don't care about the responses
The code above, as written, will leak memory (permanently as far as I can see).
It can be fixed by either disposing of stream, or by reading all the responses from stream.ResponseStream. I'm perfectly fine with documenting that it's best practice to do both of those - but I don't think we should actually leak memory if the user fails to do so.
The current behavior is correct. CompleteAsync just completes the request stream. The response stream is still active until the server ends the response stream and the client reads to the end, or the client disposes the call.
There are docs around calling a bidirectional stream here: https://learn.microsoft.com/en-us/aspnet/core/grpc/client?view=aspnetcore-7.0#bi-directional-streaming-call
Do you think it's sufficient or would you like a section added?
The response stream is still active until the server ends the response stream and the client reads to the end, or the client disposes the call.
And that's fine in terms of CompleteAsync - I wasn't proposing that anything should happen then. But my normal experience as a .NET developer is that if I forget to clean something up and have no more references to it, eventually it will get released automatically. It's better to be explicit about closing connections etc, but I can't think of other situations where it causes a memory leak.
There are docs around calling a bidirectional stream here: https://learn.microsoft.com/en-us/aspnet/core/grpc/client?view=aspnetcore-7.0#bi-directional-streaming-call
Do you think it's sufficient or would you like a section added?
If we keep the current behavior, I think we need much more strongly worded documentation.
The current docs say: "For best performance, and to avoid unnecessary errors in the client and service, try to complete bi-directional streaming calls gracefully."
"For best performance" is a very long way from "In order to avoid permanently leaking memory from your process". In many cases I write code where I don't care about best performance - I'm happy enough with "reasonable" performance, and I'll make do with that if getting "best" performance would mean writing more complex code. But I'd definitely care about memory leaks in most cases. I think it needs a really prominent warning. It should also mention disposal, I'd say - because even if it's awkward to drain all the responses (especially when considering a potentially poorly-behaving server that never completes its side of things), disposing of the call will at least fix the memory leak (even though it's not as graceful).
But personally I'd prefer the use of weak references or similar in order to automatically clean up eventually if the caller neither disposes nor reads the responses.
The bidi stream docs isn't considering a memory leak from not disposing the call. The context behind encouraging people to gracefully complete a request is compared to the alternative of the client or server aborting one side. For example, the client disposing the gRPC call. Aborting can cause performance issues, and even cause the HTTP/2 connection to go into a bad state and be aborted when combined with heavy load.
I think adding information about disposing streams would be a good addition. I haven't heard people raise this as a problem before, but I can imagine newer devs making this mistake. If you don't know to dispose the disposable gRPC calls after you stop using it then you could get yourself into trouble.
I don't want to add using a weak reference or finalizer to automatically end streaming calls. Firstly, HttpClient + HttpRequestMessage doesn't do this. It's your responsibility to clean up the request there, and I'd like gRPC client to be consistent. The bigger reason is there is a huge side effect to ending a call on the client: the call is also ended on the server. Some apps probably start a fire-and-forget gRPC call on the client, and expect the server call to run to completion. Imagine a long running SQL query that has the request aborted token. Aborting the gRPC call on the client based on a timer or the GC executing the finalizer would abort the server and the SQL query.
Fire-and-forget calls are precisely the problem here - because it sounds like that's basically not supported without a leak. The client needs to "fire, wait, and tidy-up" otherwise they'll get a leak. I suspect there are several use cases we'd need to consider - including aborting the gRPC call on the client based on a timer because the call should be aborted after a certain period (if the server is basically stuck for some reason).
I suspect more clients would want "abort the call if I've lost all references to it" than "keep the call running just in case there's a side-effect I want", but I'm not going to argue for it. (I would point out that if the client process terminates - as is entirely reasonable in a fire-and-forget scenario then presumably the call would be aborted at that point too, so the issue is still there.)
Fundamentally I suspect that streaming calls are always going to be tricky to use robustly. I'll look at changes in our (Google Cloud .NET client libraries) documentation and abstraction to make it easier for customers to do the right thing. (We wrap the gRPC call, and at the moment that wrapper doesn't implement IDisposable, so that's probably the first thing to change...)
Fire-and-forget calls are precisely the problem here - because it sounds like that's basically not supported without a leak. The client needs to "fire, wait, and tidy-up" otherwise they'll get a leak.
If you want to have fire-and-forget, and the server could run for a very long time, and you want to guarantee there isn't a leak, the solution is to specify a deadline: https://learn.microsoft.com/en-us/aspnet/core/grpc/deadlines-cancellation?view=aspnetcore-7.0#deadlines
If you want to have fire-and-forget, and the server could run for a very long time, and you want to guarantee there isn't a leak, the solution is to specify a deadline
That's really helpful - thank you so much. Obviously it's only an extra line of defense, but it's a really useful one - I'll mention it explicitly when talking about streaming calls in our docs.
(I've just tried with my repro code, and adding a deadline to the call does indeed fix the leak - with a much higher peak memory usage than graceful consumption+disposal, but that's to be expected.)
FYI - Just noting that this is also the behaviour seen with grpc-core client - resource leaks if call not gracefully completed. When looking at this I did note that Grpc.Core.Api doesn't implement finializers that call Dispose() and wondered if that was a bug - but the above from James has convinced me that it is the correct behaviour.
So, I'm in this situation where I have to add this after bidi.RequestStream.CompleteAsync otherwise server throws a lot of client has reset connection errors because I dispose the bidi call when I leave this scope.
await foreach (var _ in bidi.ResponseStream.ReadAllAsync(cancellationToken: cancellationToken)) { }
return; // early out
There's no more responses coming from the server. Even if I log something in that loop. Nothing ever arrives. Without that loop I get the reset connection errors.
Is this really necessary? It seems odd to me.