AspNetCore.Docs icon indicating copy to clipboard operation
AspNetCore.Docs copied to clipboard

Streaming response from middleware does not abort when client disconnects

Open logiclrd opened this issue 5 years ago • 15 comments
trafficstars

Describe the bug

I have a third-party server that is extremely picky about its requests, and a certain social platform that shall remain nameless insists on injecting extra query string arguments into URLs shared on it. The third-party server should be ignoring these query string arguments, but it isn't, and I don't have control over it, so I created a simple reverse proxy in ASP.NET Core to forward requests on to the third-party server, but altered to remove the unwanted arguments.

This third-party server has an endpoint that produces a dynamically-generated MJPEG stream. As such, this endpoint has no specific Content-Length and streams data as it arrives. In order to accommodate this, my reverse proxy is implemented as middleware that ends with the following loop:

while (true)
{
	int bytesRead = await responseStream.ReadAsync(buffer, 0, buffer.Length);

	if (bytesRead < 0)
		throw new Exception("I/O error");
	if (bytesRead == 0)
		break;

	await context.Response.Body.WriteAsync(buffer, 0, bytesRead);

	_statusConsoleSender.NotifyRequestProgress(requestID, bytesRead);
}

The third-party server will continue sending data as long as the client continues to receive it. As such, my loop must behave the same way: the only way that it knows that it should terminate the loop and stop sending data is if the client disconnects, so that context.Response.Body.WriteAsync() is no longer possible.

What I am observing, though, is that the loop keeps running, pumping data from the third-party service and sending it ... nowhere? I am able to see this because of the "Status Console" tied in by the last line in the loop body. After closing a browser tab that is receiving this MJPEG stream, "Status Console" shows bytes continuing to be pumped indefinitely. I left it running yesterday and it had gotten to over 6 GB before I terminated the server. But, netstat -an showed no connection from the browser to the proxy any more, and the proxy's process memory footprint had not grown, so clearly the data is just going nowhere.

I have only had the opportunity to test this with Kestrel.

To Reproduce

I have created a new project that minimally exercises precisely the functionality where I encountered this problem, and it does indeed reproduce it:

https://github.com/logiclrd/ReproduceMiddlewareStreamBug

I noticed that when browsing to the stream with Chrome, Chrome takes some time (10-ish seconds?) before it decides to show the text/plain data as it arrives.

EDIT by @Rick-Anderson add <!-- comment out version info -->

logiclrd avatar May 22 '20 19:05 logiclrd

The servers suppress write failures by default in order to reduce exception and log spam triggered by client behavior the server does not control. The best way to be notified of the client disconnecting is to check HttpContext.RequestAborted, and/or pass that token into your write operations.

I suspect this is not well documented. Please confirm that addresses your issue, and if so I'll transfer this to a doc bug.

Tratcher avatar May 22 '20 21:05 Tratcher

You may be interested in this AspNetCore based reverse proxy library we're working on. https://github.com/microsoft/reverse-proxy

Tratcher avatar May 22 '20 21:05 Tratcher

Thanks! I updated my data pump accordingly:

while (true)
{
	int bytesRead = await responseStream.ReadAsync(buffer, 0, buffer.Length, context.RequestAborted);

	if (bytesRead < 0)
		throw new Exception("I/O error");
	if (bytesRead == 0)
		break;

	await context.Response.Body.WriteAsync(buffer, 0, bytesRead);

	if (context.RequestAborted.IsCancellationRequested)
		break;

	_statusConsoleSender.NotifyRequestProgress(requestID, bytesRead);
}

With these changes, the browser tab closing is immediately detected.

logiclrd avatar May 22 '20 22:05 logiclrd

@logiclrd please suggest where this should be documented. Can you provide a draft of the update?

Rick-Anderson avatar May 23 '20 01:05 Rick-Anderson

Hmm, that's a really good question. I'll have to go and take a look at how things are currently documented. Without having yet done that, off the top of my head, it feels like maybe a small, separate page dedicated to what you need to do when streaming responses might be in order.

logiclrd avatar May 23 '20 04:05 logiclrd

I have put some time into producing this documentation. It is correct to the best of my knowledge and the linked-to sample project should compile and run out-of-the-box.

https://github.com/logiclrd/ASPNetCoreDocumentation/blob/master/ASPNetCoreStreaming.md

logiclrd avatar May 23 '20 07:05 logiclrd

Nice work @logiclrd! 🎷

guardrex avatar May 23 '20 12:05 guardrex

Okay, I have just learned additional facts that affect the accuracy of the documentation:

  • IEnumerable results directly from controllers are processed in essentially this sort of loop by ObjectResult within the ASP.NET Core infrastructure. They are not buffered as I had previously thought.

  • IAsyncEnumerable results are buffered as a work-around for System.Text.Json not supporting IAsyncEnumerable. I actually think this is a poor solution and plan to propose an alternative work-around. I also plan to propose updating System.Text.Json to support IAsyncEnumerable in the first place.

  • The loop in the ASP.NET Core infrastructure has exactly the bug I ran into here. It does not pay any attention to HttpContext.RequestAborted, and passes the IEnumerable<T> directly into JsonSerializer.Serialize, and that continues pulling records and serializing them after the client disconnects. I plan to propose a change to ASP.NET Core to address this.

I also need to update my documentation. At this time, that update would state that you can return an IEnumerable<T> from a controller directly, but a) you have no control over the buffering, b) it will be read through to the end every time even if the client disconnects, and c) you'd better make sure it does end. I'm wondering whether I should hold off on this, though, and instead wait and see whether the above proposals gain any traction. If they do, then the update here changes simply to state that you can return an IEnumerable<T> (or IAsyncEnumerable<T>), but you have no control over the buffering.

logiclrd avatar May 23 '20 16:05 logiclrd

@logiclrd it would be a good idea to add something about pipelines, using the BodyWriter instead of the body to avoid extra copies.

davidfowl avatar May 24 '20 04:05 davidfowl

This is something I am unfamiliar with. I'll look into doing some reading. I also have half a mind to see if I can't suggest a fix for the IAsyncEnumerable issue upstream, so that ASP.NET Core doesn't need to worry about the type at all.

logiclrd avatar May 24 '20 16:05 logiclrd

I did in fact propose a fix upstream: https://github.com/dotnet/runtime/issues/38055

logiclrd avatar Jun 24 '20 05:06 logiclrd

@logiclrd can you add your sample code to https://github.com/dotnet/AspNetCore.Docs.Samples/tree/main/fundamentals/streaming Use .NET 7 Once that's done I'll help you get your content migrated from https://github.com/logiclrd/ASPNetCoreDocumentation/blob/master/ASPNetCoreStreaming.md to this repo.

Rick-Anderson avatar Oct 01 '22 02:10 Rick-Anderson

I wrote a reply to your comment that made no sense. I have subsequently deleted it. The reason for that was that on my phone, the notification of your comment was routed to the GitHub app, and the GitHub app showed me no context at all. The UI indicated that your comment was the only comment on the issue, and as some time has passed I didn't have the entire issue paged into memory. :-P

I have checked my article write-up to ensure that it is fully up-to-date, and have discovered that System.Text.Json does in fact now know how to handle IAsyncEnumerable directly. ASP.NET Core still has the type AsyncEnumerableReader, but I think it is only used for serializers with no async support. I need to verify this to ensure that the article is accurate, but it is possible that the note about IAsyncEnumerable is entirely unnecessary at this point.

logiclrd avatar Oct 01 '22 09:10 logiclrd

In fact, I think I need to restructure this article a bit, because streaming of structured data now simply works. You don't need to write your own loop, you can simply return an IEnumerable or IAsyncEnumerable and the stack does the right thing. Nice :-)

The article needs to make that a separate top-level point of discussion, as distinct from streaming raw data.

logiclrd avatar Oct 01 '22 09:10 logiclrd

I have reworked the article so that it corresponds, to the best of my understanding (and based on a number of practical tests), with advancements in the ASP.NET implementation. I have not yet investigated the referenced repository of samples, but it will need updates to correspond with the article.

logiclrd avatar Oct 01 '22 11:10 logiclrd