performance

Open manast opened this issue 9 years ago • 49 comments

I would like to open a thread regarding performance, because I think it requires a bit more attention than it has actually received lately.

Although this project does not aim to be as performant as leading proxies such as HAProxy or nginx, I think most of its users would certainly be happy if the proxy did not degrade the performance of a Node.js server by a factor of 10x or 15x.

This closed issue: https://github.com/nodejitsu/node-http-proxy/issues/929 shows that it is very easy to verify the performance degradation produced by the proxy. The workaround is to pass an http.Agent with keepAlive: true, and maybe other fine-tuning. The issue refers to a FAQ, but I cannot find any FAQ. It would also be great to clarify a couple of things regarding http.Agent: 1) What is the tradeoff of using it? (if none, why is it not enabled by default?) 2) Which is the optimal set of options for it? Changing options arbitrarily and re-testing does not seem like a good approach, and a dangerous one I may add, since the user most probably does not know what they are doing. 3) Why is http.Agent required to begin with?
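
For reference, a minimal sketch of that workaround (the target and ports are placeholders):

var http = require('http');
var httpProxy = require('http-proxy');

// Reuse upstream TCP connections instead of opening a new one per request.
var keepAliveAgent = new http.Agent({ keepAlive: true });

httpProxy.createProxyServer({
  target: 'http://localhost:3000', // placeholder upstream
  agent: keepAliveAgent
}).listen(8080);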

One thing that may leave many wondering is how it is possible that, if a dummy HTTP server in Node.js is capable of delivering responses on the order of magnitude of 10k requests/sec, a simple proxy in front of it, which should, in its most basic conceptual form, just pipe the data from source to target, reduces the performance to the order of magnitude of 0.5k. One could accept 50% degradation, but not this much. The thing is, this may not be related to http-proxy at all; for instance, I wrote this super simple example:

var http = require('http');
var request = require('request');

// Proxy: forward every request to the upstream server on port 3000.
http.createServer(function(req, res){
  request.get('http://127.0.0.1:3000').pipe(res);
}).listen(8080);

// Upstream: a dummy server returning a static string.
http.createServer(function(req, res){
  res.writeHead(200);
  res.write('HELLO WORLD');
  res.end();
}).listen(3000);

And got these results (on Node 4.5):

$ wrk -c 64 -d 15s http://localhost:3000
Running 15s test @ http://localhost:3000
  2 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.85ms    3.41ms 307.55ms   99.38%
    Req/Sec     6.73k   505.00     7.10k    93.38%
  202128 requests in 15.10s, 24.87MB read
Requests/sec:  13384.99
Transfer/sec:      1.65MB
$ wrk -c 64 -d 15s http://localhost:8080
Running 15s test @ http://localhost:8080
  2 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    55.45ms   12.50ms 158.42ms   88.76%
    Req/Sec   569.33    128.19   760.00     81.94%
  8265 requests in 15.07s, 0.98MB read
Requests/sec:    548.31
Transfer/sec:     66.40KB

So can it really be that Node's streams are so amazingly slow? It's a bit worrisome, I must admit. Does anybody have any insights they wouldn't mind sharing?

manast avatar Sep 02 '16 11:09 manast

I did two more experiments that I find somewhat interesting.

Using a minimal http-proxy setup (I already got the same figures using https://github.com/OptimalBits/redbird, which is based on http-proxy):

var httpProxy = require('http-proxy');
httpProxy.createProxyServer({target:'http://localhost:3000'}).listen(8080);

This gives basically the same result as the request-based streaming:

$ wrk -c 64 -d 15s http://localhost:8080
Running 15s test @ http://localhost:8080
  2 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    52.03ms   10.90ms 119.16ms   87.05%
    Req/Sec   607.21    104.95   747.00     79.41%
  8257 requests in 15.04s, 0.98MB read
Requests/sec:    548.97
Transfer/sec:     66.48KB

On the other hand, using req-fast (https://github.com/Tjatse/req-fast) I consistently get this:

var reqFast = require('req-fast');
var http = require('http');

http.createServer(function(req, res){
  reqFast('http://127.0.0.1:3000').pipe(res);
}).listen(8080);

$ wrk -c 64 -d 15s http://localhost:8080
Running 15s test @ http://localhost:8080
  2 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    49.73ms    9.96ms 142.05ms   89.36%
    Req/Sec   648.36    136.26     0.97k    72.80%
  16320 requests in 15.03s, 2.01MB read
Requests/sec:   1086.16
Transfer/sec:    136.83KB

So it is about twice as fast as request and http-proxy, meaning that it is possible to implement a streaming proxy inside http-proxy that is at least that fast.

manast avatar Sep 03 '16 07:09 manast

Using keepAlive: true on the http.Agent with http-proxy:
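
That is, the minimal proxy from before but with an explicit agent passed in (essentially the same pattern as the test code further down):

var http = require('http');
var httpProxy = require('http-proxy');

// Same one-liner proxy, now with a keep-alive agent for the upstream connections.
var keepAliveAgent = new http.Agent({ keepAlive: true });
httpProxy.createProxyServer({ target: 'http://localhost:3000', agent: keepAliveAgent }).listen(8080);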

$ wrk -c 64 -d 15s http://localhost:8080
Running 15s test @ http://localhost:8080
  2 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    27.49ms    4.07ms 118.95ms   94.56%
    Req/Sec     1.17k   126.61     1.43k    88.67%
  35017 requests in 15.03s, 4.31MB read
Requests/sec:   2330.04
Transfer/sec:    293.53KB

Using keepAlive with req-fast:

var keepAliveAgent = new http.Agent({ keepAlive: true });
// Replace Node's default agent, so requests made without an explicit agent reuse sockets.
http.globalAgent = keepAliveAgent;

$ wrk -c 64 -d 15s http://localhost:8080
Running 15s test @ http://localhost:8080
  2 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    48.62ms    7.44ms 102.17ms   85.51%
    Req/Sec   656.12     98.75     0.94k    79.84%
  16294 requests in 15.08s, 2.00MB read
Requests/sec:   1080.32
Transfer/sec:    136.09KB

So http-proxy gets a huge 4x performance improvement with keepAlive, while req-fast stays the same.

manast avatar Sep 03 '16 08:09 manast

The previous experiments were aimed at just checking the overhead for a minimal web server; now check these results when serving strings of different sizes, from 32 bytes to 256 KB.

Test code:

var http = require('http');

var httpProxy = require('http-proxy');
var keepAliveAgent = new http.Agent({ keepAlive: true, maxSockets: 1000 });

var randomstring = require("randomstring");
var msg = randomstring.generate(2*1024); // size varied per run, from 32 bytes up to 256 KB

httpProxy.createProxyServer({target:'http://localhost:3000', agent: keepAliveAgent}).listen(8080);

http.createServer(function(req, res){
  res.writeHead(200);
  res.write(msg);
  res.end();
}).listen(3000);

In this case I used needle to implement the comparison proxy, since it gave better performance than req-fast and request:

var needle = require('needle');

http.createServer(function(req, res){
  needle.request('get', 'http://127.0.0.1:3000', null, {agent: keepAliveAgent, connection: 'keep-alive'}).pipe(res);
}).listen(8080);

And the results: image

For me, there are basically 3 interesting things here:

  • There is a lot of overhead in setting up the streaming.
  • For strings of size 32 and above, the proxy overhead is negligible.
  • Streaming gives the best performance with strings of 32Kb; after that it starts degrading, which is strange. I was expecting less and less overhead, and thus more and more raw throughput, with larger strings.

manast avatar Sep 04 '16 08:09 manast

I filled in the gaps to get a more linear chart: image

manast avatar Sep 04 '16 09:09 manast

I would definitely like to see this addressed as well. I've been using Nginx as a dynamic reverse proxy for some time now. It's worked well, but the Nginx configuration spec is massive and confusing, and the dynamic part requires scripting in Lua, which isn't the easiest. I'd like to add some more sophisticated features to my proxy, but that's painful in Nginx/Lua. It would be much simpler to do in Node.js, but I'm running into the same performance issues as manast with this module.

I did my own benchmarking. I first spun up a Docker container with a simple Nginx server serving static text (because I knew it would be lightning fast). I spun up a second Nginx container set up as a reverse proxy for the first, to use as a comparison. I then spun up a Node.js 6.3 container testing proxying with this module and with the built-in Node HTTP client, as well as serving static content for comparison. I benchmarked with wrk using 10 and 100 connections. Then I repeated the whole process with a Node.js source (instead of the Nginx one) introducing a 100 ms delay before serving content.

Here is my proxy code:

const http = require('http');
const httpProxy = require('http-proxy');
const keepAliveAgent = new http.Agent({ keepAlive: true });

// Plain HTTP server w/ static content.
http.createServer((req, res) => {
    res.statusCode = 200;
    res.setHeader('Content-Type', 'text/plain');
    res.end('Hello World\n');
}).listen(8000);

// http-proxy w/ default global agent
const defaultAgentProxy = httpProxy.createProxy();
http.createServer((req, res) => {
    defaultAgentProxy.web(req, res, {
        target: "http://10.224.51.210:8080"
    });
}).listen(9000);

// http-proxy w/ keep-alive agent
const keepAliveAgentProxy = httpProxy.createProxy({ agent: keepAliveAgent });
http.createServer((req, res) => {
    keepAliveAgentProxy.web(req, res, {
        target: "http://10.224.51.210:8080"
    });
}).listen(9001);

// http-proxy w/ no agent
const noAgentProxy = httpProxy.createProxy({ agent: false });
http.createServer((req, res) => {
    noAgentProxy.web(req, res, {
        target: "http://10.224.51.210:8080"
    });
}).listen(9002);

// Node http client w/ default global agent
http.createServer((req, res) => {
    http.get({
        hostname: '10.224.51.210',
        port: 8080,
        path: '/'
    }, upstreamRes => {
        upstreamRes.pipe(res);
    });
}).listen(10000);

// Node http client w/ keep-alive agent
http.createServer((req, res) => {
    http.get({
        hostname: '10.224.51.210',
        port: 8080,
        path: '/',
        agent: keepAliveAgent
    }, upstreamRes => {
        upstreamRes.pipe(res);
    });
}).listen(10001);

// Node http client w/ no agent
http.createServer((req, res) => {
    http.get({
        hostname: '10.224.51.210',
        port: 8080,
        path: '/',
        agent: false
    }, upstreamRes => {
        upstreamRes.pipe(res);
    });
}).listen(10002);

And here were the results: image

The Nginx and Node.js static results are just there as a theoretical floor: the proxy server can't possibly be faster than just serving static content. Nginx was definitely faster, but not quite as fast as I thought. It does perform better with 100 connections, but that's probably just because Node is using a single thread and Nginx is using 2 (one per CPU on my server).

The Nginx proxy results are the real target. If we can get close to the performance of Nginx we're in great shape.

With the node-http-proxy module, we see 2x the latency and half the requests compared to Nginx. I didn't see any difference between the default global agent (no keep-alive and infinite sockets) and no agent at all. Unexpectedly, when I enabled keep-alive the latency quadrupled for 10 connections. For 100 connections it performed much better.

I also proxied using the built-in HTTP client to make a request to the upstream source and piped the result into the response. I did this with and without keep-alive. The results with keep-alive were fantastic! Even with a single thread the response times were lightning fast, even faster than the Nginx proxy, in all scenarios.

So what am I missing? Why is there such a huge latency added by node-http-proxy?

dtjohnson avatar Sep 04 '16 18:09 dtjohnson

I just did a little more benchmarking. This time I put the source Node.js container on one server, the proxy containers on a second server, and the wrk container on a third, to get a more representative benchmark and avoid any competition for resources. I also added HAProxy to the mix and went up to 1000 connections. I also added a test with the Node.js HTTP client using cluster, so there are 2 Node.js workers (one per CPU). These are all with Node.js 6.5. Here were the results:

image

Here "Direct" means direct access to the source server with no proxy. There's a lot of network fluctuation, but it looks like the direction connection, HAProxy, and the Node.JS HTTP client piping with cluster are all about equally performant. node-http-proxy, by comparison, performs horribly. :( I was also surprised to see how poorly Nginx performed.

Here's my nginx.conf

worker_processes auto;

events {
    worker_connections 1024;
}

http {
    server {
        listen 8080;
        location / {
            proxy_pass http://source.mydomain.com:8080;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}

Here's my haproxy.cfg

global
    maxconn 1024

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:8080
    default_backend default-server

backend default-server
    option http-keep-alive
    server s0 source.mydomain.com:8080

Here's my node-http-proxy JS code

const http = require('http');
const httpProxy = require('http-proxy');
const proxy = httpProxy.createProxy();
http.createServer((req, res) => {
    proxy.web(req, res, {
        target: "http://source.mydomain.com:8080"
    });
}).listen(8080);

Finally, here is my custom Node.js HTTP client piping proxy with cluster (not sure if I got the error/abort handling right though):

const cluster = require('cluster');
const http = require('http');

const numCPUs = require('os').cpus().length;
const keepAliveAgent = new http.Agent({ keepAlive: true });

if (cluster.isMaster) {
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
} else {
    http.createServer((req, res) => {
        const proxyReq = http.request({
            method: req.method,
            path: req.url,
            headers: req.headers,
            hostname: 'source.mydomain.com',
            port: 8080,
            agent: keepAliveAgent
        }, proxyRes => {
            res.writeHead(proxyRes.statusCode);
            proxyRes.pipe(res);
        });

        req.pipe(proxyReq);

        proxyReq.on("error", e => {
            res.write(e.message);
            res.end();
        });

        req.on("error", e => {
            proxyReq.abort();
            res.write(e.message);
            res.end();
        });

        req.on('aborted', function () {
            proxyReq.abort();
        });
    }).listen(8080);
}

dtjohnson avatar Sep 05 '16 12:09 dtjohnson

Quite interesting results. But how large is the data being proxied? As you can see in my results above, it has a big impact on the numbers. In any case, it would be interesting to know why http-proxy is performing so badly. I have to check the sources, but I guess it is using http.request internally...

manast avatar Sep 05 '16 13:09 manast

Following your suggestion, I ran the benchmarks with a variety of message lengths (from 1 B to 1 MB) using the randomstring module like you did. I didn't see the same issue as you; I wonder if you are seeing some strange behavior from running your proxy server and your upstream source on the same single Node.js thread. The results were very interesting though. Here are the results with the same 3-server configuration but with an even faster upstream source.

Here is 10 connections: image And zoomed in on the <= 1kB messages: image

Now 100 connections: image And zoomed in on the <= 1kB messages: image

Now 1000 connections: image And zoomed in on the <= 1kB messages: image

HAProxy performs very well, adding only a small overhead in all scenarios. Node.js client piping makes a decent showing. Nginx performs surprisingly poorly, and node-http-proxy lags far behind; it failed completely in most of the 1000-connection runs.

dtjohnson avatar Sep 06 '16 00:09 dtjohnson

I guess more people are needed to verify these results, but if they hold, it is not unrealistic to say that we could have a Node.js-based proxy that is competitive enough to be used for high-traffic sites. It's still quite amazing that nginx performs so badly in these tests; I wonder if there is something in the configuration or the setup that makes it perform so badly (disclaimer: I am an nginx novice).

manast avatar Sep 06 '16 07:09 manast

Btw, it would also be highly relevant to test with HTTPS. We should agree that any serious site should be served over HTTPS anyway, so that is the performance we should care about :)
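
For the http-proxy side, HTTPS termination would presumably look something like this (a sketch based on the ssl option in the http-proxy README; the key/cert paths and ports are placeholders):

var fs = require('fs');
var httpProxy = require('http-proxy');

// Terminate TLS at the proxy and forward plain HTTP to the upstream.
httpProxy.createProxyServer({
  ssl: {
    key: fs.readFileSync('server-key.pem', 'utf8'),
    cert: fs.readFileSync('server-cert.pem', 'utf8')
  },
  target: 'http://localhost:3000' // placeholder upstream
}).listen(8443);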

manast avatar Sep 06 '16 07:09 manast

I came to the same conclusion. Despite whatever is going on with node-http-proxy and Nginx, it certainly seems a Node.js-based proxy is a realistic option.

I'm stumped as to what is wrong with Nginx. I have to believe it is some configuration issue. I tried disabling proxy buffering and caches: no help. I tried using an IP address instead of DNS for the target: no difference. I disabled keep-alive (which I need to support server-sent events) and that did improve performance, but still well below the performance of Node.js.

HTTPS is a good idea to test too. I tend not to think about that, as in my setup my proxies are fronted by an AWS Elastic Load Balancer that does SSL termination. It would be interesting to see any differences there though. Gzip on the proxy would be interesting too.

One other point of consideration is that Nginx and HAProxy in my tests aren't set up with scripting; they are just doing fixed proxying. In the Node.js proxy we have access to the entire JS language and its libraries. For a fair comparison with Nginx and HAProxy, we'd have to test dipping into a Lua script execution on each request.

I'd love to script the entire benchmarking workflow so we can iterate on this more quickly and consistently. So dynamically spin up 3 servers, start the proxy servers, benchmark them, and then terminate the servers.

dtjohnson avatar Sep 06 '16 11:09 dtjohnson

As I mentioned above, I am the author of https://github.com/OptimalBits/redbird, and I am using http-proxy for the actual proxy work, but maybe it is not so difficult to implement an http.request-based proxy instead. The risk is that there are probably a lot of cases, common as well as edge, regarding the handling of headers and other things, that are not so easy to get right without the long period of battle testing that http-proxy has already had.

manast avatar Sep 06 '16 12:09 manast

Agreed. It's the variety of cases I worry about too, but I'm afraid it's a bridge I may have to cross. I don't think I'll be able to evolve my current dynamic Nginx proxy easily enough (especially given the Nginx performance issues from the above benchmarking). I really think I need to switch to a Node.js-based one.

Would love to see the authors of node-http-proxy chime in. I feel like I must be doing something majorly incorrect that would explain the performance issues...

dtjohnson avatar Sep 06 '16 12:09 dtjohnson

@manast @dtjohnson Have you taken a look at #614? Providing node-http-proxy an HTTPS agent and setting maxSockets may help. There also appears to be a potential slowdown from DNS when setting a target other than an IP address. It may not make a difference in Node v4+.

eezing avatar Sep 10 '16 19:09 eezing

@eezing, yeah, I tried a variety of options. See my first post in this thread. Keep-alive performance was definitely the worst option, but I need it for server-sent events (though I do have an alternative idea for that).

So I went ahead and automated the full benchmark process: https://github.com/dtjohnson/proxy-benchmark The code spins up 3 AWS EC2 servers on-demand (one for the upstream, one for the proxies, and one for wrk). It then runs through a suite of benchmarks. The results are viewable here: https://dtjohnson.github.io/proxy-benchmark/

Here's an image, which is fairly consistent with the one above: image

This tool should make it easy to iterate on the Node proxy and see relatively quickly the performance implications of various configurations.

A number of next steps to try:

  • Proxying headers (the current Node piping just sends the status code, no headers). I expect this will hurt performance.
  • SSL
  • Gzip
  • No keep-alive
  • Piping the sockets. There is an intriguing-looking piece of code showing piping of the underlying sockets here: https://nodejs.org/api/http.html#http_event_connect. I'm curious how that will perform (a rough sketch follows below).
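
For that last point, a rough sketch of what raw socket piping could look like with the net module (hostname and ports are placeholders). Note that this is a plain TCP relay, not an HTTP proxy: it cannot inspect or rewrite anything.

const net = require('net');

net.createServer(clientSocket => {
  const upstream = net.connect(8080, 'source.mydomain.com', () => {
    // Shuttle bytes in both directions with no HTTP parsing at all.
    clientSocket.pipe(upstream);
    upstream.pipe(clientSocket);
  });
  upstream.on('error', () => clientSocket.destroy());
  clientSocket.on('error', () => upstream.destroy());
}).listen(9000);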

dtjohnson avatar Sep 12 '16 12:09 dtjohnson

Hmmm... Still no response?

I played some more with the Node proxy. In my examples before I wasn't sending the proxied response headers back. When I added it with:

res.writeHead(proxyRes.statusCode, proxyRes.headers);

The performance dropped dramatically, from 5 ms to 40 ms latency. I figured out it was because the upstream server sent a Content-Length header that was passed through in the response. That prevented Node from using Transfer-Encoding: chunked. Deleting that header (and the Connection header) before sending the response restored the performance (code here).
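
For reference, a sketch of that deletion in the client-piping proxy (the linked code is the authoritative version):

const headers = Object.assign({}, proxyRes.headers);
// Without Content-Length, Node falls back to Transfer-Encoding: chunked.
delete headers['content-length'];
delete headers['connection'];
res.writeHead(proxyRes.statusCode, headers);
proxyRes.pipe(res);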

I looked at the proxy with node-http-proxy and it wasn't using Transfer-Encoding: chunked. I wonder if that might be responsible for the performance hit we're seeing. Unfortunately, I couldn't figure out from the docs how to enable chunked transfer or how to modify the response headers.

dtjohnson avatar Oct 18 '16 11:10 dtjohnson

@dtjohnson strange that having Content-Length prevented Node from using chunked transfer; this needs to be verified somewhere, it does not make complete sense to me :/

manast avatar Oct 18 '16 11:10 manast

ok, you are right: https://en.wikipedia.org/wiki/Chunked_transfer_encoding But still, the content should be streamed in chunks; there should not be any major performance difference.

manast avatar Oct 18 '16 11:10 manast

Hmm, that's a good point; this requires a bit more investigation into how Node core behaves. It's not a bad idea to have certain headers get stripped, but we may want to make that opt-in or it would be a breaking change. Thoughts?
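
In the meantime, something like this from user land might work (a sketch, assuming the proxyRes event fires before the headers are copied onto the outgoing response, which the current source suggests):

const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer({ target: 'http://localhost:3000' });
proxy.on('proxyRes', (proxyRes, req, res) => {
  // Mutate the upstream headers before http-proxy writes them to the response.
  delete proxyRes.headers['content-length'];
  delete proxyRes.headers['connection'];
});
proxy.listen(8080);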

jcrugzz avatar Oct 18 '16 16:10 jcrugzz

In regards to the overall discussion here, I'm +1 on having performance optimizations considered and implemented. My approach to starting this was to extract some of the core logic out of http-proxy into http-proxy-stream.

In reality we shouldn't expect to beat the performance of nginx or haproxy, but we should do the best we can while maintaining 100% correctness, given the foundation of Node that we build upon.

jcrugzz avatar Oct 18 '16 17:10 jcrugzz

Unless my nginx configuration is completely wrong (which it may be), my benchmarks show that a Node proxy could absolutely outperform nginx. I'm not sure what I'm missing, but it seems node-http-proxy is way slower than it should be...

dtjohnson avatar Oct 19 '16 14:10 dtjohnson

@dtjohnson It does not seem that the http-proxy team has done any serious performance benchmarks; they may have just assumed it is not possible to compete with other established solutions such as nginx or haproxy. I think benchmarking should be part of the development process of this module. It is paramount. Let's not give up on being faster than nginx until it is proven impossible.

manast avatar Oct 19 '16 18:10 manast

Picking this up again after a while away. I did some more experimenting, and spent more time being careful about upstream keep-alives in my proxy benchmarking suite. This time I ran node-http-proxy with a keep-alive agent and the results were actually pretty good: image The results in light green are comparable with the results of my simple Node.js proxy in dark green.

When running without keep-alives on any of the proxies, the results were not as good, but still decent.

When enabling gzip the performance drops: image I'm guessing this is due to performance issues with the zlib module, but it's still not terrible.

The results are pretty encouraging. I wish the gzip performance was better, but I'm much more comfortable using node-http-proxy now.

dtjohnson avatar Dec 20 '16 01:12 dtjohnson

@dtjohnson thanks for the results. We can then conclude that node-http-proxy is about as fast as is currently possible with Node. The dev team should use a test like this to verify that the proxy's performance has not degraded between releases, and that it is always kept at the level of the maximum throughput plain Node.js can offer.
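
A regression guard could be as simple as wrapping wrk (a sketch; the baseline value is hypothetical and machine-dependent):

var execSync = require('child_process').execSync;

// Run wrk against the proxy under test and parse its throughput figure.
var out = execSync('wrk -c 64 -d 15s http://localhost:8080').toString();
var match = out.match(/Requests\/sec:\s+([\d.]+)/);
if (!match) throw new Error('could not parse wrk output');

var rps = parseFloat(match[1]);
var BASELINE = 2300; // hypothetical figure recorded from the previous release on the same box
if (rps < BASELINE * 0.9) {
  console.error('Throughput regression: ' + rps + ' req/s is below 90% of ' + BASELINE);
  process.exit(1);
}
console.log('OK: ' + rps + ' req/s');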

manast avatar Dec 22 '16 10:12 manast

btw, what version of nodejs did you use? (this is pretty relevant, since improvements in the http module as well as in streams will have a huge impact on the benchmarks)

manast avatar Dec 22 '16 10:12 manast

another remark: what about HTTPS support? As the internet is moving, HTTPS is almost the standard now, so any relevant benchmark should include it.

manast avatar Dec 22 '16 10:12 manast

Certainly seems to be about as fast as possible, with the keep-alive agent!

I used the latest node Docker image, which is v7.3. I suspect the Node version is going to be the biggest driver of performance too.

I didn't bother with HTTPS as I use an AWS ELB for SSL termination in my use case. I also didn't want to figure out how to configure SSL certs for Apache, Nginx, and HAProxy.

dtjohnson avatar Dec 31 '16 20:12 dtjohnson

Even if you use AWS for HTTPS, let's say the performance drop from HTTPS is 10x; then the Node.js part is just 10% of the total, which makes it even less relevant compared to the other contenders... (I am somehow trying to reach the conclusion that a Node proxy is just as good as any other proxy) :)

manast avatar Dec 31 '16 22:12 manast

Any comparison with the Netflix Zuul proxy?

acanimal avatar Jan 01 '17 17:01 acanimal

Apologies for the late response. Things have been hectic.

@manast, fair point about SSL. I'll work on getting that benchmark in place. I just need to chase down all of the SSL configs for the various proxies.

@acanimal, nope, but I'm happy to add it if you want to give me a Docker container and config. Pull requests are welcome: https://github.com/dtjohnson/proxy-benchmark

dtjohnson avatar Jan 12 '17 12:01 dtjohnson