Investigate increasing open file limit
As a follow-on from https://github.com/nodejs/nodejs.org/issues/5184 and https://github.com/nodejs/nodejs.org/issues/5149, we should look at increasing the number of files that NGINX can have open at any one time.
This could either be done by increasing the system ulimit, or by setting worker_rlimit_nofile in NGINX.
Are we allowed to do that without increasing the system limit?
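Some background that may help with this question: on Linux, a process can change its own soft open-file limit anywhere up to the hard limit without extra privileges, but raising the hard limit itself needs privilege (CAP_SYS_RESOURCE). A minimal Python sketch of those semantics, using the standard `resource` module. It does not involve NGINX at all, and `try_set_nofile` is a hypothetical helper name; how `worker_rlimit_nofile` behaves on a given host is something to verify separately.

```python
import resource


def try_set_nofile(soft, hard):
    """Try to set RLIMIT_NOFILE to (soft, hard); return True if the OS accepted it."""
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        return True
    except (ValueError, OSError):
        # Python raises ValueError when the soft limit exceeds the hard limit
        # or when the caller is not allowed to raise the hard limit.
        return False


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"current soft={soft} hard={hard}")

    # Re-applying the current limits is always allowed.
    print("keep current limits:", try_set_nofile(soft, hard))

    # A soft limit above the hard limit is rejected, privileged or not.
    if hard != resource.RLIM_INFINITY:
        print("soft above hard:", try_set_nofile(hard + 1, hard))
```

If that carries over to NGINX, the value given to `worker_rlimit_nofile` would matter relative to the hard limit, not only the soft limit.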
We're starting to see a lot more errors that look like this today, which I think is related.
HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR
I found this ticket from https://github.com/nodejs/nodejs.org/issues/5184
> Are we allowed to do that without increasing the system limit?
I've been testing this locally with NGINX in Docker Compose...
Soft limit set low, no NGINX limit: errors ❌
docker-compose.yml:
ulimits:
  nofile:
    soft: 50
    hard: 2048
nginx.conf:
# worker_rlimit_nofile 65535;
Getting back errors:
ab -n 100 -c 50 -v 2 localhost/en/blog.html | grep "Non-2xx"
Non-2xx responses: 2
Soft limit set low, NGINX limit set high: errors ❌
docker-compose.yml:
ulimits:
  nofile:
    soft: 50
    hard: 2048
nginx.conf:
worker_rlimit_nofile 65535;
Getting back errors:
ab -n 100 -c 50 -v 2 localhost/en/blog.html | grep "Non-2xx"
Non-2xx responses: 2
Soft limit set high, no NGINX limit set: works ✅
docker-compose.yml:
ulimits:
  nofile:
    soft: 200
    hard: 2048
nginx.conf:
# worker_rlimit_nofile 65535;
No errors:
ab -n 100 -c 50 -v 2 localhost/en/blog.html | grep "Non-2xx"
[no output]
Soft limit set high, NGINX limit set high: works ✅
docker-compose.yml:
ulimits:
  nofile:
    soft: 200
    hard: 2048
nginx.conf:
worker_rlimit_nofile 65535;
No errors:
ab -n 100 -c 50 -v 2 localhost/en/blog.html | grep "Non-2xx"
[no output]
Soft limit set high, NGINX limit set low: errors ❌
docker-compose.yml:
ulimits:
  nofile:
    soft: 200
    hard: 2048
nginx.conf:
worker_rlimit_nofile 50;
Getting back errors:
ab -n 100 -c 50 -v 2 localhost/en/blog.html | grep "Non-2xx"
Non-2xx responses: 2
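For context on what a low limit does to a process in general: once the soft limit is reached, further attempts to open a descriptor fail with EMFILE ("Too many open files"). The sketch below shows that failure mode in plain Python. It is not taken from the NGINX logs of the tests above, and `count_opens_until_emfile` is a hypothetical helper name.

```python
import errno
import os
import resource


def count_opens_until_emfile(soft_limit):
    """Lower the soft RLIMIT_NOFILE, open descriptors until the OS refuses, then restore."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            try:
                fds.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                # The process has hit its soft limit.
                return len(fds)
    finally:
        for fd in fds:
            os.close(fd)
        # Raising the soft limit back up to its old value is allowed
        # because it does not exceed the hard limit.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    opened = count_opens_until_emfile(50)
    print(f"opened {opened} extra descriptors before EMFILE with a soft limit of 50")
```

The count comes out below the limit because descriptors that are already open (stdin, stdout, stderr and so on) count towards it too.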
So it seems we would need to raise the system soft limit for the change to have any effect on NGINX?
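One hypothesis that fits all five results, offered as a guess rather than something confirmed here: a `worker_rlimit_nofile` value above the hard limit (65535 against 2048) is refused and the inherited soft limit stays in effect, while a value at or below the hard limit is applied. The sketch below encodes that assumed model and checks it against the observations. `ASSUMED_FDS_NEEDED` is a placeholder; the results only show that the test needs more than 50 and no more than 200 descriptors.

```python
from typing import Optional

# Placeholder threshold (assumption): somewhere above 50 and at most 200.
ASSUMED_FDS_NEEDED = 51


def effective_limit(soft: int, hard: int, nginx_limit: Optional[int]) -> int:
    """Assumed model of the limit the NGINX workers end up with."""
    if nginx_limit is None:
        return soft  # directive commented out: inherited soft limit applies
    if nginx_limit > hard:
        return soft  # assumed: request above the hard limit is refused
    return nginx_limit  # assumed: request within the hard limit is applied


def predicts_errors(soft: int, hard: int, nginx_limit: Optional[int]) -> bool:
    return effective_limit(soft, hard, nginx_limit) < ASSUMED_FDS_NEEDED


# (soft, hard, worker_rlimit_nofile or None, errors observed?) from the tests above
OBSERVED = [
    (50, 2048, None, True),
    (50, 2048, 65535, True),
    (200, 2048, None, False),
    (200, 2048, 65535, False),
    (200, 2048, 50, True),
]

if __name__ == "__main__":
    for soft, hard, nginx_limit, errors in OBSERVED:
        agrees = predicts_errors(soft, hard, nginx_limit) == errors
        print(soft, hard, nginx_limit, "model agrees" if agrees else "model differs")
```

If the model holds, the untested combination of soft 50 with `worker_rlimit_nofile 2048` should work without touching the system soft limit, so that case seems worth running before drawing a conclusion.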
I've raised the file limit in nginx.conf. I've attempted to Ansible the task in https://github.com/nodejs/build/pull/3406 (although we don't run the Ansible scripts for the web server against the live server).
Since raising the open file limit we've not had any load balancing alerts from Cloudflare -- the last event was from before the limit was raised:
We have, however, since had several reports of slow downloads leading to timeouts:
- https://github.com/nodejs/build/issues/3408
- https://github.com/nodejs/nodejs.org/issues/5472
- https://github.com/nodejs/nodejs.org/issues/5471
so we may have just swapped one issue for another.
@richardlau if it's helpful input, my colleagues and I are experiencing the issue described in https://github.com/nodejs/nodejs.org/issues/5472. We just started noticing it this week. It seemed better for us even just last week.
I've reverted the file limit raising from https://github.com/nodejs/build/issues/3259#issuecomment-1618883633.
~~With that file limit increase in place, were any stats recorded from the server that'd indicate what the new bottleneck was? It seems, given this fixed Cloudflare failing over, that the bottleneck before was definitely the file limit, but what is it now? Perhaps CPU?~~
Edit: From discussion in the OpenJSF Slack, the bottleneck appeared to be network throughput saturation with the file limit increased.
It seems like this fixed things 🤔 Thank you!
We are experiencing issues that I'm guessing are related to this. We get "curl: (18) HTTP/2 stream 1 was reset" when attempting to download Node.
Are we still going to investigate this given that we intend to move to Cloudflare R2?
While I doubt this will ever get investigated, it's probably worth keeping open until NGINX is no longer in the path of any request? At the point NGINX is not serving anything, we can probably close this and a few other issues.