vert.x icon indicating copy to clipboard operation
vert.x copied to clipboard

Websockets serving cached/wrong responses and potentially have incorrect headers in websocket

Open agapic opened this issue 3 years ago • 26 comments
trafficstars

Version

Vertx core 3.6.3 AND 3.7.0 NGINX 1.19.9

Context

Vertx 3.6.3

We use NGINX 1.19.9 to proxy requests to our services in Kubernetes. One such service is a client manager that creates web sockets. For paths where the user in unauthorized, we call client.reject(403).

Let's say client 1 connects to the service using wss://myuri.com/api and gets a valid 403. Great. Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. The same is true for client 3, 4, 5, etc. Now all of our clients are in a bad state.

We've determined that it seems to be an interplay between nginx and vertx/netty. When we change upstream-keepalive-timeout to 0, the issue goes away. Only client 1 gets the 403. When we get rid of nginx entirely, the issue is also gone.

https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#upstream-keepalive-timeout

Do you have a reproducer?

Vertx 3.7.0

Reproducer code: https://github.com/NguyenVoDev/pure-vertx-js/blob/main/src/main/java/org/example/Server.java

Note this won't work if you try and repro locally, only if using nginx with keepalive set to > 0 (we have 60s). It could actually just be the headers nginx is adding to the request though

Instead of clients receiving headers from the existing web socket connection (like in 3.6.3), now the connections just hang and the websocket handler isn't even called.

Extra

Are we supposed to be closing websockets after rejecting them?

  • Java 11, Adopt openJDK

agapic avatar Mar 05 '22 05:03 agapic

@vietj any ideas what might cause this? This is causing a super critical bug for our product (911 software), and upgrading Vertx with our current configuration introduces other issues related to connectivity (but that’s an aside)

agapic avatar Mar 07 '22 19:03 agapic

Added reproducer code for Vertx 3.7.0

agapic avatar Mar 08 '22 03:03 agapic

can you upgrade to 3.9 ? that's the only version of Vert.x 3 we are currently supporting

vietj avatar Mar 08 '22 13:03 vietj

@vietj I just tried with Vertx 4.2.3 - same issue as 3.7. We don't appear the get the issue with NodeJS websocket server "ws" library. One theory around that is the nodejs code explicitly sends the Connection: close header which might force nginx or the client to close the connection in its upstream cache.

agapic avatar Mar 08 '22 16:03 agapic

I think fundamentally the root cause here is that Vertx doesn't expect different clients to re-use connections. The interplay between Nginx caching connections and Vertx causes issues. When a socket is rejected and a new client connects, the response from the previous client is left hanging because it never ended. Vertx tries to re-use the existing connection and incoming requests "hang" because the previous request is not finished (but really.. it is.. it's been rejected)

For 3.6.3, I think it's because Vertx reuses the websocket in the Http1xServerConnection class (line 302), and it reuses the headers too unfortunately, even if the incoming request has different headers.

However, in Vertx 3.9.12 AND 4.2.3l, instead of reusing the websocket, the connection hangs. :(

I think it hangs because in the Http1xServerConnection class in 3.9.12, it thinks there is a response in progress and pauses the request probably until the old conneciton is killed. My theory is that Vertx doesn't handle connection reuse correctly, which is what Nginx does with upstream caching. Our proposed solution is to close websockets after calling reject.

      synchronized (this) {
        requestInProgress = req;
        if (responseInProgress != null) {
          // Deferred until the current response completion
          req.pause();
          responseInProgress.enqueue(req);
          return;
        }
        responseInProgress = requestInProgress;
        req.handleBegin();
      }

agapic avatar Mar 08 '22 17:03 agapic

I'll have a look at the reproducer, thanks

vietj avatar Mar 09 '22 07:03 vietj

It's also very important that you redirect requests to the nginx endpoint.

Step 1: get the correct IP: ifconfig en0 inet and use that as your upstream IP (mine was 192.168.88.38) Step 2: copy this nginx.conf file:

map $http_upgrade $connection_upgrade {
    default Upgrade;
    '' close;
}

upstream logger {
    keepalive 10;

    server 192.168.88.38:8080;
}

server {
    listen 80 default_server;

    server_name _;

    charset UTF-8;

    location /ws {
        client_max_body_size 33m;
        proxy_set_header Host $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Proxy "";

        proxy_http_version 1.1;
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 180s;
        proxy_buffering off;
        proxy_buffer_size 4k;
        proxy_buffers 4 4k;
        proxy_max_temp_file_size 1024m;
        proxy_request_buffering off;
        proxy_cookie_domain off;
        proxy_cookie_path off;

        proxy_pass http://logger;
    }

    location / {
        return 404;
    }

    access_log off;
    log_not_found off;
    error_log /var/log/nginx/error.log error;
}

Step 3: download nginx and run it - sample script:

-p 9090:80 \
-v $(pwd)/nginx.conf:/etc/nginx/conf.d/default.conf \
nginx:1.21.6

Step 4: Send requests to :9090, which will forward requests to nginx. Then nginx will reuse the connections between requests/clients

agapic avatar Mar 09 '22 18:03 agapic

can you provide a recap with all the precise intructions to reproduce it @agapic ?

vietj avatar Mar 10 '22 13:03 vietj

Yeah I will put together a new repo with complete code that includes NGINX and a README for you tomorrow

agapic avatar Mar 11 '22 06:03 agapic

thank you

On Fri, Mar 11, 2022 at 7:05 AM Andrew Gapic @.***> wrote:

Yeah I will put together a new repo with complete code that includes NGINX and a README for you tomorrow

— Reply to this email directly, view it on GitHub https://github.com/eclipse-vertx/vert.x/issues/4299#issuecomment-1064802671, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCTRP77QHSDOX7M4KOLU7LPB3ANCNFSM5P7GEK6Q . You are receiving this because you were assigned.Message ID: @.***>

vietj avatar Mar 11 '22 07:03 vietj

Here is the reproducer: https://github.com/agapic/vertx-nginx-websocket-repro

agapic avatar Mar 12 '22 00:03 agapic

"Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. " : can you elaborate why nginx will resend previous request headers ?

vietj avatar Mar 15 '22 09:03 vietj

when I try the reproducer I get:

julien@Juliens-MacBook-Pro vertx-nginx-websocket-repro % ./force-response.sh 200 ws://localhost:9090/ws
A848EC3A-9EF1-4A2A-8052-5EA8CEA9E386 looping 200 once-per-second
Connected (press CTRL+C to quit)

is that what is supposed to happen ?

I don't think so (the readme says it should hang)

vietj avatar Mar 15 '22 10:03 vietj

If you send a 403 first and then try to send a 200 it should hang. Keep-alive is set to 10s so you have a 10s window to do it. If you set keep-alive in the nginx.conf to say 60s it should be more easily reproducable

agapic avatar Mar 15 '22 17:03 agapic

"Now when client 2 connects to the service, the request handler appears to be re-using the request headers from client 1. " : can you elaborate why nginx will resend previous request headers ?

Nginx won't resend previous request headers, it sends the correct ones everytime. To rephrase - in 3.6.3 the websocket was re-using the old headers that were contained within the object rather than copying the new request headers over.

agapic avatar Mar 15 '22 17:03 agapic

the script will try to send several requests as I can see, so what would be the accurate way to reproduce the issue ?

vietj avatar Mar 15 '22 21:03 vietj

can you make a reproducer with a regular client like Vert.x instead of requiring two scripts to be executed ? it's not clear also how to observe the issue.

vietj avatar Mar 15 '22 22:03 vietj

Hi @vietj, I updated the code to connect to the websocket using the Httpclient class, everything is run on a periodic timer now. After running it you will notice you will stop receiving responses - i.e. logger.info("Received a 200 response"); in Client.java. If you change the port to 8081 (instead of 9090 in Client.java) you will always receive responses.

agapic avatar Mar 17 '22 01:03 agapic

@vietj were you able to get any more info from my update?

agapic avatar Mar 22 '22 19:03 agapic

not yet the time to get back to this

vietj avatar Mar 23 '22 08:03 vietj

@tobias I've run the reproducer for 3 minutes and I'm continusouly getting

14:50:41.993 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Attempting to connect to websocket and expecting a 403 response
14:50:41.993 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Attempting to connect to websocket and expecting a 200 response
14:50:41.996 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Received a 200 response
14:50:41.997 [vert.x-eventloop-thread-0] INFO com.agapic.vertx_reproducer.Client - Received a 403 successfully

is that expected ?

vietj avatar Mar 31 '22 12:03 vietj

Hi @vietj! I have no idea, I haven't looked at vert.x in > 5 years :) Is this related to some work I did back then?

tobias avatar Mar 31 '22 13:03 tobias

I meant that was I observe seems to work correctly, can you confirm it ?

vietj avatar Mar 31 '22 13:03 vietj

Well, I haven't touched vert.x in years, so I don't know what value there would be in me getting involved again.

tobias avatar Mar 31 '22 14:03 tobias

this was actually meant to @agapic sorry for the noise

vietj avatar Apr 01 '22 08:04 vietj

Ah, that makes sense @vietj. No worries!

tobias avatar Apr 01 '22 10:04 tobias