NGINX: Expects 8 html5-frontend instances which leads to long startup/connection time when starting a meeting
Problem
The service nginx expects 8 instances of html5-frontend for load-balancing in /etc/nginx/conf.d/default.conf:
upstream poolhtml5servers {
zone poolhtml5servers 32k;
least_conn;
server 10.7.7.200:4100 fail_timeout=10s max_fails=4 backup;
server 10.7.7.201:4101 fail_timeout=120s max_fails=1;
server 10.7.7.202:4102 fail_timeout=120s max_fails=1;
server 10.7.7.203:4103 fail_timeout=120s max_fails=1;
server 10.7.7.204:4104 fail_timeout=120s max_fails=1;
server 10.7.7.205:4105 fail_timeout=120s max_fails=1;
server 10.7.7.206:4106 fail_timeout=120s max_fails=1;
server 10.7.7.207:4107 fail_timeout=120s max_fails=1;
}
However, the default env file .env.sample enables only one instance of html5-frontend (the one declared as backup in the nginx config). Since the other seven instances are never started and therefore unreachable, nginx cycles through all of them first; only after connecting to every regular upstream has failed does it fall back to the (only working) backup instance.
This behavior adds at least 10 seconds of loading/connection time, as the log excerpt below shows (roughly 3 seconds per failed connect attempt):
nginx_1 | 2021/05/15 10:39:48 [error] 34#34: *2056589 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.203:4103/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1 | 2021/05/15 10:39:48 [warn] 34#34: *2056589 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.203:4103/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1 | 2021/05/15 10:39:52 [error] 34#34: *2056589 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.207:4107/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1 | 2021/05/15 10:39:52 [warn] 34#34: *2056589 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.207:4107/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1 | 2021/05/15 10:39:55 [error] 34#34: *2056589 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.204:4104/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1 | 2021/05/15 10:39:55 [warn] 34#34: *2056589 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.204:4104/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
bbb-web_1 | 2021-05-15T10:39:55.698Z DEBUG o.b.web.controllers.ApiController - ApiController#index
kurento_1 | 14:50:00.247595291 1 0x7f0250001400 INFO KurentoWebSocketTransport WebSocketTransport.cpp:346:keepAliveSessions: Keep-Alive for session 'b3fc38ca-2843-4bff-a3aa-789686ec996c'
nginx_1 | 2021/05/15 10:39:58 [error] 34#34: *2056589 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.201:4101/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1 | 2021/05/15 10:39:58 [warn] 34#34: *2056589 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.201:4101/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1 | 127.0.0.1 - - [15/May/2021:10:39:58 +0000] "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1" 200 5096 "https://xxx/b/cer-ihk-odp-sql" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
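To verify which upstream instances are actually reachable, the expected addresses can be probed from inside the nginx container. This is only a diagnostic sketch; it assumes nc (netcat) is available in the container image, otherwise it has to be installed first or replaced with another connectivity check:
docker-compose exec nginx /bin/ash -c '
  # probe all eight expected html5-frontend addresses (10.7.7.200-207, ports 4100-4107)
  for i in 0 1 2 3 4 5 6 7; do
    nc -z -w1 10.7.7.20$i 410$i && echo "10.7.7.20$i:410$i reachable" || echo "10.7.7.20$i:410$i unreachable"
  done'
With the default .env.sample, only the backup instance 10.7.7.200:4100 should report as reachable.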
Workaround
Method 1
Change NUMBER_OF_FRONTEND_NODEJS_PROCESSES in .env from 1 to 8, then rebuild and restart the whole stack.
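For example (a sketch; the exact rebuild steps may differ depending on how the compose stack is managed in your setup):
# .env
NUMBER_OF_FRONTEND_NODEJS_PROCESSES=8
# then rebuild and restart the containers
docker-compose down
docker-compose up -d --build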
Method 2
Comment out the seven unavailable frontend instances in the nginx config and keep only the current backup instance:
- docker-compose exec nginx /bin/ash
- vi /etc/nginx/conf.d/default.conf
- Edit the file as follows:
map $remote_addr $freeswitch_addr {
"~:" [::1];
default 10.7.7.1;
}
upstream poolhtml5servers {
zone poolhtml5servers 32k;
least_conn;
server 10.7.7.200:4100 fail_timeout=10s; # max_fails=4 backup;
# server 10.7.7.201:4101 fail_timeout=120s max_fails=1;
# server 10.7.7.202:4102 fail_timeout=120s max_fails=1;
# server 10.7.7.203:4103 fail_timeout=120s max_fails=1;
# server 10.7.7.204:4104 fail_timeout=120s max_fails=1;
# server 10.7.7.205:4105 fail_timeout=120s max_fails=1;
# server 10.7.7.206:4106 fail_timeout=120s max_fails=1;
# server 10.7.7.207:4107 fail_timeout=120s max_fails=1;
}
server {
listen 8080 default_server;
listen [::]:8080 default_server;
server_name _;
access_log /dev/stdout;
absolute_redirect off;
root /www/;
# opt-out of google's floc tracking
# https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea
add_header Permissions-Policy "interest-cohort=()";
# redirect to greenlight
location = / {
return 302 /b;
}
# Include specific rules for record and playback
include /etc/nginx/bbb/*.nginx;
}
- exit
- docker-compose restart nginx
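Instead of restarting, the edited config can also be syntax-checked and reloaded in place with the standard nginx commands:
docker-compose exec nginx nginx -t        # validate the edited config
docker-compose exec nginx nginx -s reload # apply it without dropping connections
Keep in mind that edits made inside a running container may not survive a container recreation, so this is a temporary workaround.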
Solution
The nginx config should only expect as many instances of html5-frontend as specified in .env.
Related bigbluebutton/bigbluebutton issue: https://github.com/bigbluebutton/bigbluebutton/issues/12291
We just templated that bug out with Ansible:
upstream poolhtml5servers {
zone poolhtml5servers 32k;
least_conn;
server 10.7.7.200:4100 fail_timeout=10s max_fails=4 backup;
{% for n in range(vars.meteor_backend_processes + vars.meteor_frontend_processes|int)%}
server {{ '10.7.7.201' | ipmath(n) }}:{{4101 + n}} fail_timeout=120s max_fails=1;
{% endfor %}
}
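For illustration, with hypothetical values meteor_backend_processes=2 and meteor_frontend_processes=2, this template would render to:
upstream poolhtml5servers {
zone poolhtml5servers 32k;
least_conn;
server 10.7.7.200:4100 fail_timeout=10s max_fails=4 backup;
server 10.7.7.201:4101 fail_timeout=120s max_fails=1;
server 10.7.7.202:4102 fail_timeout=120s max_fails=1;
server 10.7.7.203:4103 fail_timeout=120s max_fails=1;
server 10.7.7.204:4104 fail_timeout=120s max_fails=1;
}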
Hi @chfxr, you only need the number of meteor_frontend_processes here, don't you? At least that's how I understood it after reading the bigbluebutton issue. I came up with the following template:
upstream poolhtml5servers {
zone poolhtml5servers 32k;
least_conn;
{% for i in range(bbb_html5_frontend_processes | default(2) | int(2)) %}
server 127.0.0.1:410{{ i }} fail_timeout=5s max_fails=3;
{% endfor %}
}
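With the default of 2 frontend processes (and assuming, as in this template, that the frontend instances listen on 127.0.0.1 rather than on the per-container addresses used above), this renders to:
upstream poolhtml5servers {
zone poolhtml5servers 32k;
least_conn;
server 127.0.0.1:4100 fail_timeout=5s max_fails=3;
server 127.0.0.1:4101 fail_timeout=5s max_fails=3;
}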