
Why does the plaintext test example on the website show a Content-Length of '15' in the response?

Open AkazawaYun opened this issue 2 months ago • 37 comments

HTTP/1.1 200 OK
Content-Length: 15
Content-Type: text/plain; charset=UTF-8
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

Hello, World!

"Hello, World!" is actually 13 length. Some framword can not go pass the plaintext test, is it the reason ?

https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview

AkazawaYun avatar Oct 19 '25 11:10 AkazawaYun

Content-Length: 13 works fine. The example is probably assuming a \r\n termination.

MDA2AV avatar Oct 19 '25 15:10 MDA2AV

OK, I see that ASP.NET Core uses 'Transfer-Encoding: chunked' instead of 'Content-Length: 13' in its test. Both versions of the code are written, but the first one was selected as the test API... I am confused about that. Also, my framework "akazawayun.pro" always gets a result of 1090 RPS in every weekly test. When I test it with JMeter on my PC it works fine; its result is close to ASP.NET Core's RPS. I am very, very confused about this.

AkazawaYun avatar Oct 23 '25 20:10 AkazawaYun

I tried running your benchmark locally but it is not working. Is the plaintext request a GET to http://localhost:2022/post/plaintext?

Edit: Also, I just checked the ASP.NET platform plaintext test and it returns Content-Length: 13.

MDA2AV avatar Oct 23 '25 21:10 MDA2AV

I changed it to GET http://localhost:8080/plaintext 2 hours ago and opened a pull request just now, but the build/verify failed... so the code you tried to run may not be the newest version. I will download the old code from TechEmpower to my computer and run my framework to confirm. Wait a minute please, 3q.

AkazawaYun avatar Oct 23 '25 21:10 AkazawaYun

The old version of akazawayun.pro also works fine. I ran it with Visual Studio 2019 on Windows 10 and used an ApiTestor tool to request http://localhost:2022/post/plaintext; it got the response successfully.

https://akazawayun.cn/screenshot.jpg https://akazawayun.cn/screenshot2.jpg

AkazawaYun avatar Oct 23 '25 21:10 AkazawaYun

Oh, I don't have Windows or Visual Studio; I am running it on a Linux machine using curl and wrk (the same tool TechEmpower uses as its load generator). Can you try running

for the GET case: curl http://localhost:2022/post/plaintext

for the POST case: curl -X POST http://localhost:2022/post/json

or just use Postman

in your command line on Windows with the server running? It should work if the web server is correctly configured. I'm not sure what the ApiTestor tool is. I'll try to run from your PR.

MDA2AV avatar Oct 23 '25 22:10 MDA2AV

OK, it is working with your changes.

I built using your Dockerfile and got:

JSON test: 300k RPS; plaintext: 200k RPS

I ran both with wrk at 512 concurrency.

For comparison, with the ASP.NET platform test I get:

JSON test: 2.6 million RPS; plaintext: 26 million RPS

Your framework probably does not support HTTP pipelining, which is used in the plaintext test.

For reference, my machine is linux-x64 with an i9-14900K and 64 GB of 6400 MHz RAM.

MDA2AV avatar Oct 23 '25 22:10 MDA2AV

Oh yes, it does not support HTTP pipelining; the next request can only be accepted after the current request's response is sent to the client. But the JSON test is also so slow... does the JSON test use HTTP pipelining?

AkazawaYun avatar Oct 23 '25 22:10 AkazawaYun


No, the JSON test does not use pipelining; one request at a time.

/Desktop/Repos/wrk/scripts$ wrk -t32 -c512 -d15s http://localhost:8080/json
Running 15s test @ http://localhost:8080/json
  32 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.92ms    1.98ms  68.32ms   99.37%
    Req/Sec     8.80k   757.72    15.64k    97.18%
  4227161 requests in 15.10s, 697.42MB read
Requests/sec: 279957.34
Transfer/sec:     46.19MB

For the ASP.NET platform benchmark (AOT):

 wrk -t32 -c512 -d15s http://localhost:8080/json
Running 15s test @ http://localhost:8080/json
  32 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.13ms    4.99ms 252.27ms   96.85%
    Req/Sec    79.69k    26.72k  239.64k    77.87%
  38061403 requests in 15.10s, 5.18GB read
Requests/sec: 2521190.73
Transfer/sec:    351.04MB

MDA2AV avatar Oct 23 '25 22:10 MDA2AV

The difference is so huge!! ...cry... Can you copy my framework's launch log for me? There is a value named 'concurrent' in the ISemaphore component; is its value 28?

AkazawaYun avatar Oct 23 '25 22:10 AkazawaYun

Docker logs for /json

akaza_json_log.txt

Edit: 300k isn't so bad.. better than java spring :D

MDA2AV avatar Oct 23 '25 22:10 MDA2AV

Thank you very much for the help and the comfort, but I'm still so depressed. I wrote this framework because I felt ASP.NET was complex and difficult to configure and code when I first studied how to develop a web server app, so I wanted to build an easy and happy framework that lets partners develop web servers happily and quickly.

I used JMeter to test it for a long time, and the JMeter results show that 'Throughput per second' is nearly the same as ASP.NET, and even better at 10000 "Number of Threads (users)" with the "Same user on each iteration" setting unselected in the Thread Group panel. (ASP.NET showed failed connections in both cases.)

wrk shows a different result from JMeter. I need to find out the reason and upgrade my framework. Fighting!

AkazawaYun avatar Oct 23 '25 23:10 AkazawaYun

I am happy to help, but don't feel depressed or discouraged; this is just one benchmark on a simple test, and it may not be the best measure of the framework you built! 300k is a good result. Also, I compared against the ASP.NET platform benchmark, which isn't a real framework; if we compare with ASP.NET Core MVC (controllers) the value goes down to 1.3 million. Plus, these /json and /plaintext tests are far from a real-world load; your framework may perform better in those cases.

Nonetheless, a tip: if you want to participate in this kind of benchmark you should test under similar circumstances. Since you have a Windows machine, try using Docker, though I am not sure how limited Docker is on Windows; the best is to run on Linux and not use Docker Desktop.

Another thing is that I am using only 512 concurrency, and these tests create just 512 connections and bombard them with keep-alive requests. Maybe in JMeter you are constantly creating new connections, which is closer to a real-world scenario. It could be that your web server is not optimized for high-load keep-alive connections.

Edit: Plus, the results I showed you are just from my machine. Wait for TechEmpower data to make a real comparison, since its hardware is more stable, or even better, run the benchmarks yourself on your machine.

MDA2AV avatar Oct 23 '25 23:10 MDA2AV

I ran ASP.NET minimal API and my framework both on a Linux cloud machine; it has only 2 physical cores and 2 GB of memory, but the result is this... https://akazawayun.cn/wrk-linux.png

By the way, request-response over keep-alive TCP connections is about 10 times faster than creating a new TCP connection each time. For example, RPS with keep-alive is about 40k, and it drops to 4k when each request creates a new TCP connection in my JMeter test of my framework.

AkazawaYun avatar Oct 24 '25 02:10 AkazawaYun


Ahh, I see the issue!

When I run with 2 wrk threads, your framework has the same performance on my 16-core, 32-thread machine. You get closer to ASP.NET Core's performance when the thread count is low, which shows that your framework does not scale with CPU cores.

Desktop/Repos/wrk/scripts$ wrk -t2 -c512 http://localhost:8080/json
Running 10s test @ http://localhost:8080/json
  2 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.95ms  429.39us   9.56ms   90.56%
    Req/Sec   131.94k     8.87k  142.58k    84.00%
  2624208 requests in 10.04s, 432.96MB read
Requests/sec: 261449.18
Transfer/sec:     43.14MB
/Desktop/Repos/wrk/scripts$ wrk -t32 -c512 http://localhost:8080/json
Running 10s test @ http://localhost:8080/json
  32 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.54ms    5.09ms 116.56ms   99.02%
    Req/Sec     5.09k     0.96k   15.02k    81.02%
  1630763 requests in 10.10s, 269.05MB read
Requests/sec: 161470.64
Transfer/sec:     26.64MB

And sometimes with a high thread count it even gets worse performance. You never noticed because you run it on a 2-core system.

For comparison, the ASP.NET platform with 2 threads:

~/Desktop/Repos/wrk/scripts$ wrk -t2 -c512 -d5s http://localhost:8080/json
Running 5s test @ http://localhost:8080/json
  2 threads and 512 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   434.37us    1.06ms  52.57ms   99.67%
    Req/Sec   338.50k    18.44k  361.74k    62.00%
  3368028 requests in 5.05s, 468.95MB read
Requests/sec: 666877.74
Transfer/sec:     92.85MB

With minimal APIs this value would probably be around 450-500k, still a lot closer to yours.

Also, if your machine has 2 cores, running wrk with -t32 is not going to give you relevant data; as you can see in your image, performance is even lower due to the overhead of switching between threads that cannot actually run in parallel.

As for why you still outscore minimal APIs on your system, my guess is that you optimized your framework while testing on that system. This happened to me too: when I started developing my framework I was working on a Windows machine and testing with bombardier, which yields different results than running wrk on Linux.

MDA2AV avatar Oct 24 '25 09:10 MDA2AV

In fact, I don't know how the -t parameter of wrk affects server performance.

When I set -t2 -c512, it tells wrk (the HTTP client manager) to use 2 threads to manage 512 connections (the sending and receiving of 512 HTTP clients). That works worse than -t32 -c512 on a 32-core machine because 30 cores sit idle, and it may work better than -t32 -c512 on a 2-core machine, because 32 threads > 2 cores causes thread contention.

But all of that is about the client side: whether -t2 or -t32, it is a parameter of wrk, not of the server. The server has its own logic for choosing thread counts for accepting clients and listening on the network stream. So what confuses me is why -t affects the server's performance. (And why does wrk let the user set the thread count at all? It could pick a proper value automatically based on the machine's hardware; the user should only need to set how many clients connect concurrently and how many requests, or for how long, each connection sends.)

By the way, I updated the NuGet package version to 25.1027 to fix some bugs, and the PR seems to have passed just now. Can you please test my framework again when you are free? ^ ^

AkazawaYun avatar Oct 25 '25 18:10 AkazawaYun

I did test your latest branch; it is working fine, with more or less the same performance.

About wrk: yes, the number of concurrent connections depends only on -c, but the higher the -t, the "harder" those connections are driven, meaning more requests can be executed per connection. Throughput typically plateaus somewhere close to the number of cores/threads, but not necessarily. Objectively, the fact that your performance plateaus at -t2 instead of around -t16 like ASP.NET means your framework doesn't scale with wrk's -t. And what does that mean exactly? To help you understand, I made a table with multiple -t and -c values.

i9-14900K, 128 GB RAM @ 6000 MHz, using your Docker image on Docker Engine on linux-x64. Load generation: wrk -tx -cy http://localhost:8080/json

Image

As we can see, it currently doesn't scale with connections, so that could be it.

MDA2AV avatar Oct 25 '25 19:10 MDA2AV

The words "meaning more requests can be executed per connection" woke me up! I think I understand now. As -t grows from 1 to the core count, the send/receive speed of each connection also increases. Consider the server's ability to process a request-response as fixed; assume it can process 100 req/s:

  1. When -t is small, one thread has to manage many clients, so one client (per connection) can't send 100 requests per second; it may send only 50 req/s, which is smaller than 100 (the server's capacity). The bottleneck is on the client side, so increasing -t improves the result.
  2. When -t equals the core count, each client (per connection) now has enough power to send 100 or even 200 req/s. If the result is still bad, the bottleneck is on the server side: the server can only process 100 req/s, so it cannot handle all 200 requests from a client in one second, and the result stops increasing with -t.
  3. When -t is bigger than the core count, a client may lose the ability to send 100 requests per second because of thread contention, so a bad result may come from either the client or the server, and we may not be able to tell which.

Is my understanding right?

AkazawaYun avatar Oct 25 '25 19:10 AkazawaYun


Yes. And you need to figure out why your framework bottlenecks at low -c; maybe it is not taking advantage of the thread pool properly with async/await.

MDA2AV avatar Oct 25 '25 19:10 MDA2AV

I think it's necessary for me to set up a Linux environment on my i5-8300H computer (4 cores, 8 logical threads); the 2-core cloud machine hides the potential problems. One guess: last year I found that JMeter hit Connection Refused errors when a large number (>5000) of connections connected to the server in a short time, so I limited the number of requests processed at one time to solve it. However many connections are kept open, I only process a limited number of them at the same time. ...MASAKA, does that affect the result?!

AkazawaYun avatar Oct 25 '25 20:10 AkazawaYun


Could be:)

Nothing like running it yourself on a local benchmark to fix/improve performance!

MDA2AV avatar Oct 25 '25 20:10 MDA2AV

Hello friend, long time no see. In recent days I updated my version and passed the plaintext pipelining test successfully, but the rank is still low... orz

I also installed an Ubuntu system via WSL on my computer (4 cores, 8 logical threads), ran the server on Windows and ran wrk in Ubuntu; the best result is only 30,000, the same as ASP.NET Core. Yes, the ASP.NET result is also 30,000.

wrk-asp

wrk-aky

I also wrote a benchmark tool in C# to replace JMeter (I found that JMeter has its own plateau, which is below the C# tool's), and the result is very high:

  • in pipelining mode, /plaintext test: ak.pro 3,000,000 and asp.net 6,000,000;
  • in request-response mode, /json test: ak.pro 60,000 and asp.net 60,000.

I also tested them on my friend's computer, a powerful 20-core, 28-logical-thread machine; my framework reached 20,000,000 RPS and ASP.NET 30,000,000 RPS with my benchmark tool, but it runs Windows and I am not allowed to install Ubuntu on it.

wrk-asp

wrk-aky

Things have become obscure; I cannot figure it out via contrast experiments.

Could you help me test the new version of ak.pro on your machine with wrk? Both the /plaintext and /json tests.

By the way, here is my small benchmark tool; I don't know if its logic is similar to wrk's:

AkazawaYun.Benchmark

AkazawaYun avatar Nov 06 '25 20:11 AkazawaYun

Hey, hope you're doing fine.

Given that wrk is the load tool used on the benchmark platform, let's simplify the result analysis by focusing only on wrk results; otherwise we can't reach any conclusions.

I tried to run your newly merged TechEmpower version but I am getting the following log:

akazalog.txt

Do you have any idea why it is failing to launch?

MDA2AV avatar Nov 07 '25 11:11 MDA2AV

The log seems fine. According to the last lines, the API self-test succeeded:

[inf][2025-11-07 11:27:12.883] [API SELF-TEST]
[inf][2025-11-07 11:27:12.883]  REQ URL :http://localhost:8080/plaintext
[inf][2025-11-07 11:27:12.924]  RES LEN :13
[inf][2025-11-07 11:27:12.924]  RES BODY:Hello, World!
[inf][2025-11-07 11:27:12.924] [OK, I WORK FINE]

By the way, I installed the .NET runtime in my WSL and ran both wrk and the server in WSL Ubuntu (8 logical cores):

  • the server runs in platform mode;
  • wrk uses pipelining mode via a Lua script like the following, which has the same logic as TechEmpower's pipeline.lua:
-- wrk -c500 -t8 -d30 -s /home/test/pp.lua http://localhost:8080/plaintext

local pipeline_depth = 1000  -- number of requests concatenated per write to simulate pipelining
local pipeline_request

init = function(args)
    -- pre-build the concatenation of pipeline_depth requests into one payload
    local requests = {}
    for i = 1, pipeline_depth do
        requests[i] = wrk.format(wrk.method, wrk.path, wrk.headers, wrk.body)
    end
    pipeline_request = table.concat(requests)
end

request = function()
    return pipeline_request
end

response = function(status, headers, body)
end

done = function(summary, latency, requests)
    io.write("Pipeline simulation completed\n")
    io.write(string.format("Pipeline depth: %d\n", pipeline_depth))
end

and the result is 2,100,000 RPS:

wsl-aky

and the result of the ASP.NET platform is 3,500,000 RPS:

wsl-aky

AkazawaYun avatar Nov 07 '25 23:11 AkazawaYun

I installed the .NET SDK on Linux and built both Asp.MinimalAPI and AkazawaYun.WebAPI there; the pipelining-mode results follow.

As we can see:

  • AkazawaYun.PRO is higher than ASP.NET when the connection count is far more than the CPU's logical cores. (The third one, the NuGet build, is low because the nupkg is obfuscated, which decreases performance, but it is still 50% of ASP.NET's RPS, not as low as the TFB results show.)
  • When the connection count is less than or equal to the CPU's logical core count:
    • AkazawaYun.PRO's RPS increases proportionally with the connection count (about 22k per logical core; but 22k is still strange, since it should be 1 million (the -c500 RPS) divided by 8 cores, i.e. 125k per core; maybe because WSL is a virtual machine, not a real OS).
    • Asp.MinimalAPI does not follow the proportional relationship: its c8 result is 2 times bigger than c1 rather than 8 times. That's strange; maybe it does something special to improve RPS when the connection count is small.
    • The gap between the AkazawaYun.PRO builds from NuGet and from source disappears; maybe when the connection count is low, thread contention is not serious, so the extra complexity of executing obfuscated code is not a serious problem for the CPU.

c500

c18

AkazawaYun avatar Nov 08 '25 03:11 AkazawaYun

Ran the current tfb setup on my machine

Image

Using pipeline depth 16.

MDA2AV avatar Nov 08 '25 15:11 MDA2AV

Thanks for your help, but what confuses me is why the result on your Linux i9-14900K (32 logical cores) is 383k, which is lower than the 540k result on my Linux machine with 2 logical cores.

Even assuming Akazawa does not scale with core count, the result using only 2 cores is still very illogical between our two real Linux OSes (not WSL on Windows).

linux2core

AkazawaYun avatar Nov 08 '25 23:11 AkazawaYun

I found the difference: the pipeline depth I configured is 1000, which is bigger than yours. Maybe you can also set a bigger value; 16 is too small.

A small pipeline depth makes the pipelining test result no different from the non-pipelining test.

If the pipeline depth is set to 1, it is equal to the non-pipelining test.

AkazawaYun avatar Nov 08 '25 23:11 AkazawaYun


TechEmpower uses a pipeline depth of 16; that's why I use it too, for consistency between tests.

With a pipeline depth of 1000, results are difficult to evaluate because so many requests are being sent by so few connections. If you want to understand your framework's performance, don't even use pipelined HTTP; it's a meaningless metric, since nobody uses it in the real world. Also, the /json test is not pipelined.

MDA2AV avatar Nov 08 '25 23:11 MDA2AV

Understood... let me focus on the non-pipelining test... however...

Considering the theory: one connection will not send the next request until it receives the previous request's response. But each IO write is slow, which leads to poor performance. It's a general problem no matter the framework; even with ET (edge-triggered) technology, the server must wait for the next request after the IO write and flush complete:

  • RPS can only increase as the connection count increases, and stops increasing (plateaus) when -c reaches the core count.
  • So a framework's best non-pipelined RPS depends only on the core count of the server machine.

If that is so, this should be Asp.MinimalAPI's result (and in fact it is):

https://akazawayun.cn/asp8c.png

https://akazawayun.cn/asp2c.png


AkazawaYun avatar Nov 09 '25 00:11 AkazawaYun