
Unsock'ed NGINX reverse proxy server issue

Open Bert-Proesmans opened this issue 1 year ago • 6 comments

Hi, inspiring (great) tool!

I've been using the library with nginx to redirect upstream connections into VSOCK (starting from both AF_INET and AF_UNIX). It mostly works, but not without issues.

Setup;

  • Create the VSOCK redirect config file in the expected location for the AF_INET redirect
  • Run nginx as an HTTP reverse proxy for the configured IP+PORT, or point it directly at the config file using the unix:/ prefix
  • Unsock nginx with LD_PRELOAD

To clarify, I validated that my proxy and upstream VMs communicate properly with each other; socat VSOCK-LISTEN and VSOCK-CONNECT allow me to communicate bidirectionally. My nginx configuration also validates without any issues (nginx -t).

Observed behaviour; nginx does proxy the request to the upstream. The upstream receives the request and responds with data, but nginx never forwards that data to the client. Nginx terminates the client connection with a proxy timeout after the configured number of seconds (60s in my case).

The weird thing is that nginx sometimes logs a special client disconnect message (I can't reliably reproduce it). This message includes how many bytes were sent to and received from the client and the upstream, and it shows a fairly large number of bytes received from the upstream (corresponding to the actual response data of the upstream).

I tested;

  • 127.0.0.1:443 -> NGINX Server -> 127.175.0.0:8000 - [UNSOCK] -> VSOCK:10:8000
  • 127.0.0.1:443 -> NGINX Server -> unix:/run/nginx-vsock/upstream.vsock - [UNSOCK] -> VSOCK:10:8000
  • 127.0.0.1:443 -> NGINX Stream -> unix:/run/nginx/frontend.sock -> NGINX Server -> unix:/run/nginx-vsock/upstream.vsock - [UNSOCK] -> VSOCK:10:8000

Every scenario has the same symptoms.

Excerpt of logs when symptoms occur

Server 1 - Proxy

[bert-proesmans@1-test:~]$ journalctl -efu nginx
Sep 25 21:51:16 1-test systemd[1]: Starting Nginx Web Server...
Sep 25 21:51:16 1-test nginx-pre-start[556]: nginx: the configuration file /nix/store/3sfxs2lyqyvavv7xgg2f107glqm8d9xz-nginx.conf syntax is ok
Sep 25 21:51:16 1-test nginx-pre-start[556]: nginx: configuration file /nix/store/3sfxs2lyqyvavv7xgg2f107glqm8d9xz-nginx.conf test is successful
[...]
Sep 25 21:51:16 1-test generate-vsock-config[571]: UNSOCK_FILE: /run/nginx-unsock/photos-upstream.vsock
Sep 25 21:51:16 1-test generate-vsock-config[571]: UNSOCK_VSOCK_PORT: 10000
Sep 25 21:51:16 1-test generate-vsock-config[571]: UNSOCK_VSOCK_CID: 90000
Sep 25 21:51:16 1-test generate-vsock-config[571]: UNSOCK_VSOCK_CONNECT_SIBLING: 1
Sep 25 21:51:16 1-test systemd[1]: Started Nginx Web Server.
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: using the "epoll" event method
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: nginx/1.26.2
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: built by gcc 13.3.0 (GCC)
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: OS: Linux 6.6.52
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: getrlimit(RLIMIT_NOFILE): 1024:524288
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: start worker processes
Sep 25 21:51:17 1-test nginx[574]: 2024/09/25 21:51:17 [notice] 574#574: start worker process 578
Sep 25 21:51:17 1-test systemd[1]: Reloading Nginx Web Server...
Sep 25 21:51:17 1-test nginx[585]: nginx: the configuration file /nix/store/3sfxs2lyqyvavv7xgg2f107glqm8d9xz-nginx.conf syntax is ok
Sep 25 21:51:17 1-test nginx[585]: nginx: configuration file /nix/store/3sfxs2lyqyvavv7xgg2f107glqm8d9xz-nginx.conf test is successful
[...]
Sep 25 21:54:32 1-test nginx[592]: 2024/09/25 21:54:32 [error] 592#592: *1 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 127.0.0.1, server: photos.alpha.proesmans.eu, request: "GET / HTTP/2.0", upstream: "http://unix:/run/nginx-unsock/photos-upstream.vsock/", host: "photos.alpha.proesmans.eu"

EDIT; I patched unsock with the patch linked below, which sets the VMADDR_FLAG_TO_HOST flag in the flags field of the VSOCK socket address struct. This instructs the driver to forward vsock data to the host (cid 2) even though the destination cid is another value.

https://github.com/Bert-Proesmans/nix/blob/cb77bf615af27c8413c419e57b46bbb802851834/packages/unsock/001-flag-to-host.patch
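
For readers unfamiliar with the flag, here is a rough illustration of where it sits in the address. It assumes the Linux `struct sockaddr_vm` layout from `<linux/vm_sockets.h>` (16 bytes, with `svm_flags` following `svm_cid`) and `VMADDR_FLAG_TO_HOST = 0x01`. It is not the content of the linked patch; the helper names `pack_sockaddr_vm` and `unpack_sockaddr_vm` are made up for this sketch, and the CID and port are the ones from the proxy logs above.

```python
import struct

AF_VSOCK = 40               # same value as "AF=40" in the socat logs
VMADDR_FLAG_TO_HOST = 0x01  # assumed from <linux/vm_sockets.h>

# Assumed layout of struct sockaddr_vm (16 bytes, native byte order):
#   u16 svm_family, u16 svm_reserved1, u32 svm_port, u32 svm_cid,
#   u8 svm_flags, u8 svm_zero[3]
SOCKADDR_VM = struct.Struct("=HHIIB3x")


def pack_sockaddr_vm(cid, port, flags=0):
    """Build the raw bytes of a sockaddr_vm (hypothetical helper)."""
    return SOCKADDR_VM.pack(AF_VSOCK, 0, port, cid, flags)


def unpack_sockaddr_vm(raw):
    """Decode raw sockaddr_vm bytes back into named fields."""
    family, _reserved, port, cid, flags = SOCKADDR_VM.unpack(raw)
    return {"family": family, "cid": cid, "port": port, "flags": flags}


# Destination is the sibling VM (CID 90000), but the flag asks the
# driver to route the traffic via the host.
addr = pack_sockaddr_vm(cid=90000, port=10000, flags=VMADDR_FLAG_TO_HOST)
print(len(addr), unpack_sockaddr_vm(addr))
```

The point is only that the destination CID stays 90000 while a single flag byte changes the routing decision; an unpatched connect would leave that byte at zero.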

Server 1 - Local curl test

[bert-proesmans@1-test:~]$ curl -vv --resolve *:80:127.0.0.1 --resolve *:443:127.0.0.1 --insecure https://photos.alpha.proesmans.eu
* Added *:80:127.0.0.1 to DNS cache
* RESOLVE *:80 using wildcard
* Added *:443:127.0.0.1 to DNS cache
* RESOLVE *:443 using wildcard
* Hostname photos.alpha.proesmans.eu was found in DNS cache
*   Trying 127.0.0.1:443...
* Connected to photos.alpha.proesmans.eu (127.0.0.1) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / X25519 / id-ecPublicKey
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=photos.alpha.proesmans.eu
*  start date: Sep 25 21:51:16 2024 GMT
*  expire date: Oct 25 21:51:16 2026 GMT
*  issuer: CN=minica root ca 3b3aca
*  SSL certificate verify result: self-signed certificate in certificate chain (19), continuing anyway.
*   Certificate level 0: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using ecdsa-with-SHA384
*   Certificate level 1: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using ecdsa-with-SHA384
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://photos.alpha.proesmans.eu/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: photos.alpha.proesmans.eu]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.9.1]
* [HTTP/2] [1] [accept: */*]
> GET / HTTP/2
> Host: photos.alpha.proesmans.eu
> User-Agent: curl/8.9.1
> Accept: */*
> 
* Request completely sent off
< HTTP/2 504 
< server: nginx
< date: Wed, 25 Sep 2024 21:54:32 GMT
< content-type: text/html
< content-length: 160
< 
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host photos.alpha.proesmans.eu left intact

The above log records a 504 Gateway Time-out on the client side. The expected response is an empty HTTP directory listing.

Server 2 - Upstream

[bert-proesmans@2-test:~]$ python3 -m http.server -b 127.0.0.1 8080 &
socat -d -d VSOCK-LISTEN:10000,fork TCP4-CONNECT:127.0.0.1:8080
[1] 624
2024/09/25 21:52:20 socat[625] N VSOCK CID=90000
2024/09/25 21:52:20 socat[625] N listening on AF=40 cid:4294967295 port:10000
Serving HTTP on 127.0.0.1 port 8080 (http://127.0.0.1:8080/) ...
2024/09/25 21:53:32 socat[625] N accepting connection from AF=40 cid:10000 port:2672637753 on AF=40 cid:90000 port:10000
2024/09/25 21:53:32 socat[625] N forked off child process 626
2024/09/25 21:53:32 socat[625] N listening on AF=40 cid:4294967295 port:10000
2024/09/25 21:53:32 socat[626] N opening connection to AF=2 127.0.0.1:8080
2024/09/25 21:53:32 socat[626] N successfully connected from local address AF=2 127.0.0.1:53622
2024/09/25 21:53:32 socat[626] N starting data transfer loop with FDs [6,6] and [5,5]
127.0.0.1 - - [25/Sep/2024 21:53:32] "GET / HTTP/1.1" 200 -
2024/09/25 21:53:32 socat[626] N write(5, 0x55d650742000, 275) completed
2024/09/25 21:53:32 socat[626] N write(6, 0x55d650742000, 342) completed
2024/09/25 21:53:32 socat[626] N socket 2 (fd 5) is at EOF
2024/09/25 21:53:33 socat[626] N exiting with status 0
2024/09/25 21:53:33 socat[625] N childdied(): handling signal 17

EDIT; As the end of this log shows, the upstream HTTP server replied instantly to the proxied request.

Nginx config
pid /run/nginx/nginx.pid;
error_log stderr debug;
daemon off;
events {
}
http {
	# Load mime types.
	include /nix/store/3zrkasqf3sqr9ff6sv5fddhlbf072a36-mailcap-2.1.54/etc/nginx/mime.types;
	# When recommendedOptimisation is disabled nginx fails to start because the mailmap mime.types database
	# contains 1026 entries and the default is only 1024. Setting to a higher number to remove the need to
	# overwrite it because nginx does not allow duplicated settings.
	types_hash_max_size 4096;
	include /nix/store/lf89iyal2p7jj26kngfmmias80aia2sc-nginx-1.26.2/conf/fastcgi.conf;
	include /nix/store/lf89iyal2p7jj26kngfmmias80aia2sc-nginx-1.26.2/conf/uwsgi_params;
	default_type application/octet-stream;
	upstream photos-upstream {
		server unix:/run/nginx-unsock/photos-upstream.vsock ;
	}
	# optimisation
	sendfile on;
	tcp_nopush on;
	tcp_nodelay on;
	keepalive_timeout 65;
	ssl_protocols TLSv1.2 TLSv1.3;
	ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-CHACHA20-POLY1305;
	# Keep in sync with https://ssl-config.mozilla.org/#server=nginx&config=intermediate
	ssl_session_timeout 1d;
	ssl_session_cache shared:SSL:10m;
	# Breaks forward secrecy: https://github.com/mozilla/server-side-tls/issues/135
	ssl_session_tickets off;
	# We don't enable insecure ciphers by default, so this allows
	# clients to pick the most performant, per https://github.com/mozilla/server-side-tls/issues/260
	ssl_prefer_server_ciphers off;
	# OCSP stapling
	ssl_stapling on;
	ssl_stapling_verify on;
	brotli on;
	brotli_static on;
	brotli_comp_level 5;
	brotli_window 512k;
	brotli_min_length 256;
	brotli_types application/atom+xml application/geo+json application/javascript application/json application/ld+json application/manifest+json application/rdf+xml application/vnd.ms-fontobject application/wasm application/x-rss+xml application/x-web-app-manifest+json application/xhtml+xml application/xliff+xml application/xml font/collection font/otf font/ttf image/bmp image/svg+xml image/vnd.microsoft.icon text/cache-manifest text/calendar text/css text/csv text/javascript text/markdown text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/xml;
	gzip on;
	gzip_static on;
	gzip_vary on;
	gzip_comp_level 5;
	gzip_min_length 256;
	gzip_proxied expired no-cache no-store private auth;
	gzip_types application/atom+xml application/geo+json application/javascript application/json application/ld+json application/manifest+json application/rdf+xml application/vnd.ms-fontobject application/wasm application/x-rss+xml application/x-web-app-manifest+json application/xhtml+xml application/xliff+xml application/xml font/collection font/otf font/ttf image/bmp image/svg+xml image/vnd.microsoft.icon text/cache-manifest text/calendar text/css text/csv text/javascript text/markdown text/plain text/vcard text/vnd.rim.location.xloc text/vtt text/x-component text/xml;
	proxy_redirect          off;
	proxy_connect_timeout   60s;
	proxy_send_timeout      60s;
	proxy_read_timeout      60s;
	proxy_http_version      1.1;
	# don't let clients close the keep-alive connection to upstream. See the nginx blog for details:
	# https://www.nginx.com/blog/avoiding-top-10-nginx-configuration-mistakes/#no-keepalives
	proxy_set_header        "Connection" "";
	include /nix/store/rmcl3ckxhx2q8sg32274y7rdf7y6ir61-nginx-recommended-proxy-headers.conf;
	# $connection_upgrade is used for websocket proxying
	map $http_upgrade $connection_upgrade {
		default upgrade;
		''      close;
	}
	client_max_body_size 10m;
	server_tokens off;
	[...]
	server {
		listen 0.0.0.0:443 ssl ;
		server_name photos.alpha.proesmans.eu ;
		http2 on;
		[...]
		location / {
			proxy_pass http://photos-upstream;
			include /nix/store/rmcl3ckxhx2q8sg32274y7rdf7y6ir61-nginx-recommended-proxy-headers.conf;
		}
	}
}

Expected behaviour; Nginx proxies upstream data correctly to the client. Upstream connections are transparently proxied through the VSOCK driver.

It seems like some signaling is not happening correctly. I've scoured the internet looking for similar symptoms, but after a few days I'm tired of seeing causes like too-low file-size limits or too-low timeout values... I'll have to pull out strace to debug this issue, but at this point I'm in a bit over my head. I'm hoping you have some ideas about which direction I should investigate.

As a kind of null hypothesis; running nginx without unsock and using plain unix sockets works completely as expected, so the cause of the symptoms is somehow the unsock library.
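
One possible way to narrow this down without nginx is a small probe that waits for readability the way an epoll-driven proxy would. The sketch below assumes Linux and Python's `select.epoll`; it sends a request on an already connected stream socket and reports whether an edge-triggered EPOLLIN ever arrives before a deadline. `probe_upstream` is a hypothetical helper, not part of unsock or nginx, and it only approximates what nginx does internally.

```python
import select
import socket
import time


def probe_upstream(sock, request, timeout=5.0):
    """Send `request`, then wait via epoll until response headers arrive.

    Returns (status, data) where status is "headers", "eof" or "timeout".
    """
    sock.setblocking(True)
    sock.sendall(request)
    sock.setblocking(False)

    ep = select.epoll()
    # Edge-triggered read interest, similar in spirit to nginx's epoll usage.
    ep.register(sock.fileno(),
                select.EPOLLIN | select.EPOLLRDHUP | select.EPOLLET)
    received = b""
    deadline = time.monotonic() + timeout
    try:
        while b"\r\n\r\n" not in received:
            remaining = deadline - time.monotonic()
            if remaining <= 0 or not ep.poll(remaining):
                # No readiness event at all: the symptom described above.
                return "timeout", received
            # Edge-triggered: drain the socket until it would block.
            while True:
                try:
                    chunk = sock.recv(4096)
                except BlockingIOError:
                    break
                if not chunk:
                    done = b"\r\n\r\n" in received
                    return ("headers" if done else "eof"), received
                received += chunk
        return "headers", received
    finally:
        ep.close()
```

On the proxy VM, `sock` could be a `socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)` connected to `(90000, 10000)`, the CID and port from the logs, or an AF_UNIX socket connected to the unsocked path while the probe itself runs under the same LD_PRELOAD. Comparing the two might show whether the readiness event goes missing only on the unsocked path; that comparison is a suggestion and has not been tried here.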

Bert-Proesmans, Sep 25 '24 22:09