
Enable kernel response buffering for HTTP.sys server

Open wdls22 opened this issue 6 years ago • 22 comments

I deploy an ASP.NET Core service to Service Fabric in Azure. I tried both Kestrel and Http.sys as the server, with response compression enabled. I found that if I use Http.sys as the server together with response compression, static file downloads are very slow: it takes more than 1 minute to download a 1.6 MB JS file. With Kestrel this issue does not happen, and with Http.sys without response compression it also does not happen. So maybe something is wrong with the combination of Http.sys and response compression?

I tried it on ASP.NET Core 2.2 and 3.0; it reproduces in both environments.

To Reproduce

Build an ASP.NET Core service with Http.sys as the server (using HTTPS), using "app.UseResponseCompression();" to compress the static file responses. Then deploy it to Azure Service Fabric with 5 nodes.

Use a browser to open the website. It will take a long time to download static files. With Kestrel, or with Http.sys without compression, this does not happen. And if I run this service on my local machine, it never happens, whatever the combination.
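
The combination described can be sketched as a minimal host (my own sketch of the reported setup, not the actual project; it requires Windows and the `Microsoft.AspNetCore.Server.HttpSys` package):

```csharp
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.DependencyInjection;

// Minimal host combining the reported ingredients: HTTP.sys as the server,
// response compression, and static file serving.
public class Program
{
    public static void Main(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            .UseHttpSys()              // swap for Kestrel to compare behavior
            .UseStartup<Startup>()
            .Build()
            .Run();
}

public class Startup
{
    public void ConfigureServices(IServiceCollection services) =>
        services.AddResponseCompression(options => options.EnableForHttps = true);

    public void Configure(IApplicationBuilder app)
    {
        app.UseResponseCompression(); // the problematic combination
        app.UseStaticFiles();         // slow downloads show up on these files
    }
}
```

`EnableForHttps` is set because the report uses HTTPS; compression over HTTPS is off by default.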

Screenshots

(screenshot attached in the original issue)

wdls22 avatar Sep 26 '19 06:09 wdls22

Does it make a difference if you use http vs https?

This is most likely an issue with buffering (or the lack thereof) between response compression and HttpSys. Kestrel has what we call a write-behind buffer for the response body, so most write operations complete immediately and then the server aggregates them on a background thread when writing to the network. HttpSys doesn't do this: every write waits for the data to be fully flushed to the network layer before unblocking the app for the next write. This is mainly a problem if you're doing multiple small writes; you would see lots of small TCP packets on the wire. StaticFiles should be doing fairly large writes though, unless response compression is breaking them up?

Using Wireshark to capture the TCP traffic for HttpSys and Kestrel would be a good place to start. Then we can see how efficiently they're flushing to the network.

Tratcher avatar Oct 01 '19 21:10 Tratcher

any update on this problem?

udlose avatar Jan 24 '20 19:01 udlose

We've had few reports and are not currently investigating this. If you can provide additional details, that would help us prioritize it.

Using Wireshark to capture the TCP traffic for HttpSys and Kestrel would be a good place to start. Then we can see how efficiently they're flushing to the network.

Tratcher avatar Jan 24 '20 20:01 Tratcher

A 16 KB JS file download takes 25 seconds to get a response from the server!

dalir-bajelani avatar Sep 06 '20 10:09 dalir-bajelani

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to set better expectations regarding different types of issues, you can read about our Triage Process.

ghost avatar Sep 06 '20 10:09 ghost

I saw similar behavior a few days ago on Windows Server. Serving static files to multiple users became very slow. It seems to me that this is due to threadpool starvation (on .NET Framework 4.8). The number of used threads in the threadpool increased by approximately 1 per second with the server idle, and after some time there were enough threads that the server started working again.

When using .NET Core 3.1 I found that requesting 12 static files shows about 70 items completed in the ThreadPool when using no compression, but more than 3,500 items completed when using compression.

t-auer avatar Sep 24 '20 09:09 t-auer

Are any active threads blocked on sync IO? That's the usual cause of threadpool starvation.
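
A minimal monitor for this, sketched with the `ThreadPool` diagnostic properties available on .NET Core 3.0+ (the class name and output format are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class ThreadPoolMonitor
{
    // Prints thread pool statistics every 5 seconds. Under starvation the
    // thread count climbs by roughly one per second while throughput stalls.
    public static Task Start(CancellationToken token) =>
        Task.Run(async () =>
        {
            while (!token.IsCancellationRequested)
            {
                ThreadPool.GetAvailableThreads(out int worker, out int io);
                Console.WriteLine(
                    $"threads: {ThreadPool.ThreadCount}, " +
                    $"available worker: {worker}, available IO: {io}, " +
                    $"completed items: {ThreadPool.CompletedWorkItemCount}");
                await Task.Delay(TimeSpan.FromSeconds(5), token);
            }
        }, token);
}
```

If the thread count keeps growing while requests are stalled, something is blocking pool threads.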

Tratcher avatar Sep 24 '20 12:09 Tratcher

Threads blocked on I/O: not that I am aware of (I did not see anything in the VS debugger). I reduced my test program to the point that it does nothing except start Http.sys and serve static files (load is generated with VS load testing). In addition it has a thread that writes the available threadpool threads to the console every 5 seconds. The test program has NLog configured for logging.

Maybe the following code segments are helpful:

    private static IWebHost BuildWebHost(string serverUrl, string[] args) =>
        new WebHostBuilder()
            .ConfigureAppConfiguration((hostingContext, config) =>
            {
                config
                    .SetBasePath(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location))
                    .AddInMemoryCollection(new Dictionary<string, string>
                    {
                        { "WebConfig:JwtBearerIssuer", ConfigurationManager.AppSettings["jwtBearerIssuer"] },
                        { "WebConfig:JwtBearerIssuerKey", ConfigurationManager.AppSettings["jwtBearerIssuerKey"] }
                    })
                    .AddXmlFile("MyConfig.xml", false, false)
                    .AddCommandLine(args);
            })
            .ConfigureServices(services =>
            {
                services.AddHttpClient();
                services.AddResponseCompression();
                services.AddCors();
            })
            .ConfigureLogging((hostingContext, logging) =>
            {
                logging.ClearProviders();
            })
            .UseStartup<StartupFW>()
            .UseNLog()
            .UseContentRoot(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location))
            .UseWebRoot(Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "MyClientDir"))
            .UseUrls(serverUrl)
#if DEBUG
            .UseEnvironment("Development")
#endif
            .UseHttpSys(options =>
            {
                options.Authentication.AllowAnonymous = true;
                options.Authentication.Schemes =
                    Microsoft.AspNetCore.Server.HttpSys.AuthenticationSchemes.NTLM |
                    Microsoft.AspNetCore.Server.HttpSys.AuthenticationSchemes.Negotiate;
                options.MaxRequestBodySize = null;
            })
            .Build();

and:

    public void Configure(IApplicationBuilder app, IHostingEnvironment env)
    {
        if (env.IsDevelopment())
        {
            app.UseDeveloperExceptionPage();
        }

        app
            .UseCors()
            .UseStatusCodePages()
            .UseFileServer()
            .UseResponseCompression() // COMMENTING THIS LINE ELIMINATES THE PERFORMANCE PROBLEM
            .UseAuthentication()
            .UseWebSockets(new WebSocketOptions
            {
                KeepAliveInterval = TimeSpan.FromHours(24)
            });
    }

and:

            var config = new LoggingConfiguration();

            // targets
            // Console
            var consoleTarget = new ColoredConsoleTarget("console")
            {
                Layout = "${message}"
            };
            consoleTarget.RowHighlightingRules.Add(new ConsoleRowHighlightingRule
            {
                Condition = ConditionParser.ParseExpression("level == LogLevel.Error"),
                ForegroundColor = ConsoleOutputColor.Red
            });
            consoleTarget.RowHighlightingRules.Add(new ConsoleRowHighlightingRule
            {
                Condition = ConditionParser.ParseExpression("level == LogLevel.Warn"),
                ForegroundColor = ConsoleOutputColor.Yellow
            });
            config.AddTarget(consoleTarget);

            var generalFileTarget = new FileTarget("generalfile")
            {
                Layout = "${message}",
                FileName = "${var:logdir}TEST_LOG.txt",
                ArchiveDateFormat = "yyyyMMddHHmmss",
                ArchiveNumbering = ArchiveNumberingMode.DateAndSequence,
                CreateDirs = true,
                EnableArchiveFileCompression = true,
                KeepFileOpen = true,
                ConcurrentWrites = true,
                OpenFileCacheTimeout = 30
            };
            config.AddTarget(generalFileTarget);
            // rules            
            config.AddRuleForAllLevels(consoleTarget, "consolelogger", true);
            config.AddRuleForAllLevels(generalFileTarget, "generalfilelogger", true);
            config.AddRule(LogLevel.Info, LogLevel.Fatal, consoleTarget, "*");
            config.AddRule(LogLevel.Info, LogLevel.Fatal, generalFileTarget, "*");
            // variables
            config.Variables.Add("logdir", new SimpleLayout(""));
            LogManager.Configuration = config;

t-auer avatar Sep 24 '20 14:09 t-auer

The next thing I wanted to try was counting the number and size of write operations with and without compression. My theory is that compression does a lot of small writes. This could be measured either with a Wireshark trace looking at the HTTP chunk sizes, or by wrapping the response body stream to add per-write logs.

Want to give that a try?
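
For example, the body stream could be wrapped in a small pass-through stream that logs each write (a sketch; `CountingStream` is an illustrative name, and the async overrides are omitted for brevity):

```csharp
using System;
using System.IO;

// Delegates writes to the inner response body stream while recording the
// size of each one, so compressed vs. uncompressed write patterns can be
// compared.
public class CountingStream : Stream
{
    private readonly Stream _inner;
    public int WriteCount { get; private set; }
    public long BytesWritten { get; private set; }

    public CountingStream(Stream inner) => _inner = inner;

    public override void Write(byte[] buffer, int offset, int count)
    {
        WriteCount++;
        BytesWritten += count;
        Console.WriteLine($"write #{WriteCount}: {count} bytes");
        _inner.Write(buffer, offset, count);
    }

    public override void Flush() => _inner.Flush();
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

It could be hooked in ahead of the compression middleware with something like `app.Use(async (context, next) => { context.Response.Body = new CountingStream(context.Response.Body); await next(); });`.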

Tratcher avatar Sep 24 '20 14:09 Tratcher

It took me some time to get back to this topic, but in the meantime I have tested using a wrapper on the response body stream: Without UseResponseCompression I do not see any write on the body stream, but with UseResponseCompression I see 3 writes when requesting a static file.

t-auer avatar Oct 09 '20 06:10 t-auer

Thanks @t-auer.

Without UseResponseCompression I do not see any write on the body stream

Because the stream is bypassed for the SendFileAsync API. The data gets sent as a single write.

but with UseResponseCompression I see 3 writes when requesting a static file.

That's not as noisy as I was expecting. How big is each write? And how big is the total file before and after compression?

A few other things that might make this combination slower:

  • Chunked is used instead of Content-Length, and the chunk terminator isn't sent until the end of the app pipeline so anything that delays pipeline cleanup could affect it.
  • Compression similarly needs to flush some data as the middleware pipeline unwinds.

There's not much you can do about either of those, they're part of the basic design.
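
The flush-on-unwind behavior is easy to observe in isolation: `GZipStream` performs a series of writes as data is compressed, then a small trailing write when it is disposed. A standalone sketch counting writes through an in-memory sink (illustrative names; this models the write pattern, not HttpSys itself):

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Counts how many distinct writes the compression stream performs against
// its output, mimicking the per-write pattern a server would see.
class WriteCountingStream : MemoryStream
{
    public int Writes { get; private set; }

    public override void Write(byte[] buffer, int offset, int count)
    {
        Writes++;
        Console.WriteLine($"write #{Writes}: {count} bytes");
        base.Write(buffer, offset, count);
    }

    public override void Write(ReadOnlySpan<byte> buffer)
    {
        Writes++;
        Console.WriteLine($"write #{Writes}: {buffer.Length} bytes");
        base.Write(buffer);
    }
}

class Program
{
    static void Main()
    {
        var sink = new WriteCountingStream();
        var payload = new byte[200_000];   // stand-in for a static file
        new Random(42).NextBytes(payload); // random data compresses poorly

        using (var gzip = new GZipStream(sink, CompressionLevel.Fastest, leaveOpen: true))
        {
            gzip.Write(payload, 0, payload.Length); // one app-level write...
        } // ...but Dispose flushes the remaining data plus the gzip trailer

        Console.WriteLine($"total writes reaching the sink: {sink.Writes}");
    }
}
```

One application-level write fans out into several smaller writes at the sink, and on a server where each write is a synchronous round trip to the kernel, that multiplication is what hurts.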

Tratcher avatar Oct 09 '20 14:10 Tratcher

Hello Tratcher,

thanks for your detailed explanation. But I still think that the design is not OK. I completely understand that compression takes some time and thus makes the server slower.

However, if there is load on the server, threadpool starvation occurs. This should not happen and, in my opinion, is a major design issue.

You should be able to reproduce this behaviour by creating a simple server and then putting load on it using Visual Studio load testing.

t-auer avatar Oct 13 '20 06:10 t-auer

Threadpool starvation? Load should not cause starvation, only blocking operations should do that. Do you see any stacks that indicate blocking operations?

Tratcher avatar Oct 13 '20 14:10 Tratcher

FYI I do think there's room for improvement here since we don't see these perf issues with Kestrel (or at least not that anyone has complained about?).

Tratcher avatar Oct 13 '20 14:10 Tratcher

Thanks for contacting us. We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

ghost avatar Nov 02 '20 22:11 ghost

I've managed to reproduce this problem. It affected our middleware in NancyFX as well, so it was familiar.

The problem occurs only if you use HttpSys (i.e Windows). We need HttpSys as we share ports with other applications.

I'm pretty sure fronting it with IIS or a reverse proxy makes this problem go away. I've seen this on both Azure and AWS. I'm not sure about other networks.

Reproduction

I've put up a reproduction app here.

The problem is only noticeable if there is high network latency. For my tests I went from a VM in Azure West Europe to my Home (Australia). Latency is about 250-300ms. The VM is brand new, 2 CPU 8GB memory. The application was deployed as a self contained netcoreapp3.1 application.

I set the test app up and ran it in Chrome using the debugging tools and a plugin that allowed me to modify the headers that get sent (i.e. remove the Accept-Encoding header). To double-check, I used the following code and got the same results:

using var client = new HttpClient();
await Download();
client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
await Download();

async Task Download()
{
    var sw = Stopwatch.StartNew();
    await client.GetStringAsync("http://20.56.121.94/vendor-1606368060156.min.js");
    sw.ElapsedMilliseconds.Dump(); // Dump() is a LINQPad extension method
}

Results

Compressed: ~8 seconds
Uncompressed: ~0.5 seconds

Workaround

The test app allows the setting of an environment variable BUFFER_SIZE that will switch on an alternate implementation of the GzipCompressionProvider. This implementation wraps a BufferedStream around the output that the GZipStream writes to. A buffer size of 200,000 seems to work well.

This reduced the time to ~0.8 sec.

Note: The workaround is probably leaky and not very memory friendly.
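
The approach can be sketched as a custom `ICompressionProvider` that interposes a `BufferedStream` between the `GZipStream` and the response body, so the compressor's many small writes are coalesced before reaching HttpSys (my own sketch of the described workaround, not the test app's exact code; the 200,000-byte buffer size matches the value reported above):

```csharp
using System.IO;
using System.IO.Compression;
using Microsoft.AspNetCore.ResponseCompression;

// Wraps the output in a BufferedStream so the many small writes produced by
// GZipStream are coalesced into large ones before reaching the server.
public class BufferedGzipCompressionProvider : ICompressionProvider
{
    public string EncodingName => "gzip";
    public bool SupportsFlush => true;

    public Stream CreateStream(Stream outputStream) =>
        new GZipStream(
            new BufferedStream(outputStream, bufferSize: 200_000),
            CompressionLevel.Fastest);
}
```

It would be registered via `services.AddResponseCompression(options => options.Providers.Add<BufferedGzipCompressionProvider>());`. As the note above says, the extra buffer trades memory and flush latency for fewer kernel writes.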

Network

I also did some Wireshark captures, and the number of packets sent by both is the same. The only difference I saw was that the window size changed from 2051 to 524288, and that the slow one had lots of PSH flags in the TCP stream. I've attached the captures for the compressed, uncompressed and workaround requests: Wireshark.zip.

droyad avatar Dec 16 '20 10:12 droyad

@droyad thanks for the details. The PSHs are interesting; I can see a 0.3s pause after each one, which would explain why the download takes so long. You've filtered out the packets going the other direction: does it wait for an ACK from the client before continuing after each pause?

I don't see any changes in the window size in any of these traces, where are those?

Here's what I see about the structure of the different responses:

  • Uncompressed: It's a Content-Length response of 794414 bytes and appears to have been sent as one single application-level write, probably from the SendFile API.
  • Compressed: It's a chunked response with many small chunks ranging from 2 to 8192 bytes. The chunks coincide with the PSHs, and should correlate one-to-one with the application-level writes through the compression stream.
  • Workaround: It's also a chunked response, but there's only one chunk of 200000 bytes. That matches the buffer you added; you got it to buffer all of the compression and then write the results in one go at the end. There is still one PSH here on the final write, and the same pause afterward, but just the one, so it works pretty well overall.

Open questions:

  • Why does HttpSys send a PSH after each app write operation? (And wait for an ACK?)
  • Verify the size of buffers being used to write from the file to the compression stream (64 KB?).
  • It doesn't look like GZipStream allows you to control its internal buffer size to limit how often it flushes?
  • What does the compressed wireshark trace look like for Kestrel?

Possible mitigations:

  • Call the Http.Sys send-body APIs differently to avoid the PSH/ACK? HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA?
  • Add write-behind buffering to HttpSys to aggregate writes (similar to Kestrel) and overlapped writes.
  • Add incremental buffering to response compression.

Tratcher avatar Dec 18 '20 23:12 Tratcher

Possibly related: https://github.com/aspnet/HttpSysServer/pull/418

davidni avatar Jan 15 '21 19:01 davidni

Hello Tratcher,

thanks for your detailed explanation. But I still think that the design is not OK. I completely understand that compression takes some time and thus makes the server slower.

However, if there is load on the server, threadpool starvation occurs. This should not happen and, in my opinion, is a major design issue.

You should be able to reproduce this behaviour by creating a simple server and then putting load on it using Visual Studio load testing.

@t-auer did you find a solution for the problem? I have the same problem on Kestrel. The funny thing is that somehow it is connected with buffers, because I added compression middleware and the response times are identical! So this is not a traffic issue, only something with buffers in .NET Core MVC. Any suggestions?

hsd99 avatar Jan 25 '21 22:01 hsd99

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to set better expectations regarding different types of issues, you can read about our Triage Process.

ghost avatar Aug 11 '21 20:08 ghost

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to set better expectations regarding different types of issues, you can read about our Triage Process.

ghost avatar Jul 11 '22 22:07 ghost

Triage: we need a good test for this that shows the impact of the change.

adityamandaleeka avatar Aug 10 '22 22:08 adityamandaleeka

+@avparuch who might be interested in perf improvements in http.sys

abatishchev avatar Nov 02 '22 01:11 abatishchev

Crosspost from grpc/grpc-dotnet#1971: A small project demonstrating this issue can be found here: https://github.com/LaurensVergote/HttpSysLaggyPerformance

LaurensVergote avatar Dec 08 '22 08:12 LaurensVergote

This has now made it to 6.0 and 7.0 for June servicing (#48072 and #48073).

adityamandaleeka avatar May 08 '23 23:05 adityamandaleeka

This has been merged for full release in .NET 8.

mgravell avatar Jun 16 '23 13:06 mgravell