POST body processing is 10x slower in Cowboy 2.10.0 than in 1.1.2

Open EzoeRyou opened this issue 1 year ago • 3 comments

We're porting Erlang software that depends on the now-deprecated Cowboy 1.1.2 to the recent Cowboy 2.10.0.

During the porting process, we found that when processing POST bodies, Cowboy 2.10.0 is 10x slower in terms of bandwidth than Cowboy 1.1.2 with the JIT disabled. It's still 8.4x slower with the JIT enabled.

This performance regression prevents us from updating Cowboy in our software.

Here is minimal benchmark code to reproduce the issue, along with a summary of the benchmark results.

https://github.com/AoiMoe/cowboy_post_bench

EzoeRyou avatar Jun 28 '23 07:06 EzoeRyou

You may want to tweak the read_body options or the HTTP/1.1 option active_n, maybe others.
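A minimal sketch of where these knobs live, assuming a plain handler on an HTTP/1.1 clear listener. The module name `upload_h`, the listener name, the route, the port, and the specific values are hypothetical and only illustrate the shape of the options, not recommended settings:

```erlang
-module(upload_h).
-behaviour(cowboy_handler).
-export([start/0, init/2]).

%% Start a clear (non-TLS) listener. active_n is a protocol option;
%% the value 1000 is an arbitrary illustration, not a recommendation.
start() ->
    {ok, _} = application:ensure_all_started(cowboy),
    Dispatch = cowboy_router:compile([{'_', [{"/upload", ?MODULE, []}]}]),
    {ok, _} = cowboy:start_clear(upload_listener,
        [{port, 8080}],
        #{env => #{dispatch => Dispatch}, active_n => 1000}),
    ok.

%% Read the whole body, then reply with the number of bytes received.
init(Req0, State) ->
    {ok, Body, Req1} = read_all(Req0, []),
    Req = cowboy_req:reply(200,
        #{<<"content-type">> => <<"text/plain">>},
        integer_to_binary(iolist_size(Body)), Req1),
    {ok, Req, State}.

%% length and period control how much data each read_body call
%% returns and how long it waits; these values are assumptions.
read_all(Req0, Acc) ->
    Opts = #{length => 1048576, period => 5000},
    case cowboy_req:read_body(Req0, Opts) of
        {ok, Data, Req} -> {ok, lists:reverse([Data | Acc]), Req};
        {more, Data, Req} -> read_all(Req, [Data | Acc])
    end.
```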

essen avatar Jun 28 '23 14:06 essen

Thanks for the suggestions.

We tweaked various options, but changing active_n and length doesn't fix the performance regression.

We found that changing the buffer size via socket setopts was effective, but it's still 10-20% slower than cowboy1. buffer needs to be set to a really huge value to compensate for the regression introduced in cowboy2.

The detailed micro benchmark code and results are here; see Test 3.

https://github.com/AoiMoe/cowboy_post_bench

To summarise the buffer size tweaking: cowboy2 with the default buffer size of 1460 is 10x slower than cowboy1. Performance improves as we increase the buffer size. We saw dramatic improvement (or I'd rather call it compensation) up to a buffer size of 32768. After that, returns appear to diminish, but we still see some improvement up to 262144. A buffer size of 524288 was worse than 262144. It never reaches the performance of cowboy1.

Increasing the buffer size somewhat mitigated the performance regression in cowboy2, and the micro benchmark was performed on a loopback device rather than over a real Internet route, so it's not a real-world scenario. Even so, we still think a 10-20% performance regression is too much to risk the upgrade. We also think the default behaviour should be sane.

Is there any way to completely fix the performance regression introduced in cowboy2?
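One way the buffer tweak could be applied is at listener start, assuming the option is passed as a regular socket option in the transport options rather than set later with `inet:setopts/2`. The module and listener names, the port, and the empty route table are hypothetical; 32768 is simply the size after which the gains flattened in the numbers above:

```erlang
-module(bench_listener).
-export([start/0]).

start() ->
    {ok, _} = application:ensure_all_started(cowboy),
    %% Placeholder dispatch with no routes; a real setup lists its handlers here.
    Dispatch = cowboy_router:compile([{'_', []}]),
    %% buffer is a standard inet socket option; it sits next to port
    %% in the transport options given to the listener.
    {ok, _} = cowboy:start_clear(bench_listener,
        [{port, 8080}, {buffer, 32768}],
        #{env => #{dispatch => Dispatch}}),
    ok.
```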

EzoeRyou avatar Jul 13 '23 08:07 EzoeRyou

The changes that result in a performance drop are related to the support for HTTP/2, which performs better than HTTP/1.1 in real use cases. In the future Cowboy will also support HTTP/3, which performs even better (the http3 branch is a work in progress).

There's likely room for improvement for HTTP/1.1 still, I'll take a look when time allows. But right now my priority is HTTP/3.

For what it's worth, there's not much point measuring performance over loopback, although I'm sure the code performs worse in Cowboy 2 due to how it is structured. One thing you can do with Cowboy 2, however, is write your own stream handler to handle these requests, as stream handlers execute in the connection process and have the same performance properties as Cowboy 1 had.
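A rough, untested sketch of what such a stream handler might look like, assuming the `cowboy_stream` behaviour callbacks as documented for Cowboy 2. The module name `body_count_h` is hypothetical, and the handler only counts body bytes and replies with the total; it stands in for whatever processing the real application does:

```erlang
-module(body_count_h).
-behaviour(cowboy_stream).

-export([init/3, data/4, info/3, terminate/3, early_error/5]).

%% Runs in the connection process once the request headers are in.
%% Requests without a body are answered immediately.
init(_StreamID, #{has_body := false}, _Opts) ->
    {[reply(0), stop], 0};
init(_StreamID, _Req, _Opts) ->
    %% Ask the connection to start delivering body data.
    %% The initial amount is an arbitrary assumption.
    {[{flow, 65536}], 0}.

%% Body chunks arrive here; the state is the running byte count.
data(_StreamID, nofin, <<>>, Count) ->
    {[], Count};
data(_StreamID, nofin, Data, Count) ->
    Size = byte_size(Data),
    %% Request as much again as was just consumed.
    {[{flow, Size}], Count + Size};
data(_StreamID, fin, Data, Count0) ->
    Count = Count0 + byte_size(Data),
    {[reply(Count), stop], Count}.

info(_StreamID, _Info, State) ->
    {[], State}.

terminate(_StreamID, _Reason, _State) ->
    ok.

%% Leave Cowboy's own error responses untouched.
early_error(_StreamID, _Reason, _PartialReq, Resp, _Opts) ->
    Resp.

%% Build a response command; content-length is set explicitly here
%% because no request process is involved to add it.
reply(Count) ->
    Body = integer_to_binary(Count),
    {response, 200, #{
        <<"content-type">> => <<"text/plain">>,
        <<"content-length">> => integer_to_binary(byte_size(Body))
    }, Body}.
```

A handler like this would be enabled through the `stream_handlers` protocol option, for example `#{stream_handlers => [body_count_h]}`, in place of the default `cowboy_stream_h`. With that setup no request process is spawned and routing is bypassed, so any dispatch logic has to live in the handler itself.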

essen avatar Jul 14 '23 07:07 essen