
Websocket Connection Freezes with some L7 TLS-Terminating Proxies

Open • programmerq opened this issue 6 months ago • 1 comment

Expected Behavior

The tsh client should successfully establish and maintain a websocket connection through any L7 TLS-terminating proxy when connecting to a Teleport Cluster with TLS multiplexing enabled. The connection should proceed without freezing or timing out.

Current Behavior

When using some L7 TLS-terminating proxies, such as Spring Cloud Gateway, the websocket connection initiated by a Teleport client (tsh, agent, plugin, etc.) freezes after the initial handshake. Specifically:

  1. The client initiates the websocket connection.
  2. The Teleport server responds with a 101 Switching Protocols message.
  3. The first websocket frame is passed.
  4. The connection freezes, and the client eventually times out and closes the connection after 20 seconds.

With Spring Cloud Gateway in debug/verbose mode, its logs show that it loads a websocket frame aggregator, a websocket encoder, and a websocket decoder. I suspect that something about how Teleport frames its websocket traffic makes the aggregator believe it still needs another frame to complete the message, so it holds the data instead of forwarding it.

While troubleshooting in my lab, I found that running tsh behind a mitmweb (mitmproxy web interface) instance triggers the same "freezing" behavior as Spring Cloud Gateway. mitmproxy is a forward proxy rather than an L7 reverse proxy, but it is still an L7 TLS-terminating proxy with special handling for websocket frames.

Bug Details

Teleport Version

  • Observed in Teleport versions 15.x and 16.x.

Recreation Steps

  1. Setup 1:

    • Deploy a self-hosted Teleport Cluster with TLS multiplexing enabled or use a Teleport Cloud tenant.
    • On the client side, use mitmweb to intercept the traffic.
    • Attempt to establish a connection using the tsh client.
    export HTTPS_PROXY=http://127.0.0.1:8080
    tsh logout
    tsh login -d --proxy teleport.example.com:443
    
  2. Setup 2:

    • Deploy a self-hosted Teleport Cluster with TLS multiplexing enabled.
    • Configure a Spring Cloud Gateway as an L7 reverse proxy between the tsh client and the Teleport Cluster, with TLS terminated by the gateway. (See the attached configuration snippet below.)
    • Attempt to establish a connection using the tsh client.
  3. Observe that the connection freezes after the initial websocket handshake and the tsh client times out.

Debug Logs

  • Debug logs from the tsh client indicate that the websocket connection is initiated, but the connection freezes after receiving the 101 Switching Protocols response.
% tsh login -d ...
<snip>
2024-08-21T16:03:42-05:00 DEBU             Performing ALPN WebSocket connection upgrade. hostname:teleport.example.com:443 client/alpn_conn_upgrade.go:278
2024-08-21T16:03:42-05:00 DEBU             Performing ALPN WebSocket connection upgrade. hostname:teleport.example.com:443 client/alpn_conn_upgrade.go:278
    ERROR REPORT:
        Original Error: trace.aggregate connection error: desc = "transport: authentication handshake failed: context deadline exceeded"

The error appears after approximately 20 seconds. The L7 proxy sees the client close the connection and then closes its own connection to the Teleport Proxy (as seen in packet captures taken at each hop).

  • Spring Cloud Gateway debug output shows that the proxy decodes and inspects the websocket frames, which appears to be what leads to the connection freezing.

Spring Cloud Gateway Configuration Snippet:

server:
  forward-headers-strategy: NATIVE
  error:
    whitelabel:
      enabled: false

teleport:
  enabled: true
  base-url: https://teleport-internal:3080 # this is the internal/upstream Proxy endpoint
  host: teleport.example.com:443 # this is the external facing DNS name that points to Spring Cloud Gateway

spring.cloud.gateway:
  x-forwarded.for-append: false

Spring Cloud Gateway DEBUG logs:


(These were taken from a screenshot and run through OCR.)

2024-08-26 17:36:07.896 DEBUG 18954 --- [ctor-http-nio-2] r.n.r.DefaultPooledConnectionProvider : [c2c0a3f5-1, L:/10.111.99.222:38858 - R:teleport.internal/10.111.22.99:3080] onStateChange(ws{uri=/webapi/connectionupgrade, connection=PooledConnection{channel=[id: 0xc2c0a3f5, L:/10.111.99.222:38858 - R:teleport.internal/10.111.22.99:3080]}}, [response_received])
2024-08-26 17:36:07.897 DEBUG 18954 --- [ctor-http-nio-2] o.s.w.r.s.a.ReactorNettyWebSocketSession : [24709cbd] Session id "24709cbd" for //teleport.internal:3080/webapi/connectionupgrade
2024-08-26 17:36:07.897 DEBUG 18954 --- [ctor-http-nio-2] o.s.w.r.s.c.ReactorNettyWebSocketClient : Started session '24709cbd' for [—M://teleport.internal:3080/webapi/connectionupgrade
2024-08-26 17:36:07.905 DEBUG 18954 --- [ctor-http-nio-2] reactor.netty.ReactorNetty : [6b0b18c1-4, L:/10.111.99.222:8080 - R:/10.222.0.55:42540] Added decoder [reactor.left.wsFrameAggregator] at the end of the user pipeline, full pipeline: [wsencoder, wsdecoder, HttpServerTracingHandler#0, MaybeBlock ResponseHandler#, reactor.left.wsFrameAggregator, reactor.right.reactiveBridge, DefaultChannelPipeline$TailContext#0
2024-08-26 17:36:07.907 DEBUG 18954 --- [ctor-http-nio-2] reactor.netty.ReactorNetty : [c2c0a3f5-1, L:/10.111.99.222:38858 - R:teleport.internal/10.111.22.99:3080] Added decoder [reactor.left.wsFrameAggregator] at the end of the user pipeline, full pipeline: [reactor.left.sslHandler, reactor.left.httpCodec, ws-decoder, ws-encoder, HttpClientTracingHandler#0, reactor.left.wsFrameAggregator, reactor.right.reactiveBridge, DefaultChannelPipeline$TailContext#0
2024-08-26 17:36:07.922 DEBUG 18954 --- [ctor-http-nio-2] reactor.netty.channel.FluxReceive : [6b0b18c1-4, L:/10.111.99.222:8080 - R:/10.222.0.55:42540] [terminated=false, cancelled=false, pending=1, error=null]: subscribing inbound receiver
2024-08-26 17:36:07.925 DEBUG 18954 --- [ctor-http-nio-2] reactor.netty.channel.FluxReceive : [c2c0a3f5-1, L:/10.111.99.222:38858 - R:teleport.internal:3080] [terminated=false, cancelled=false, pending=0, error=null]: subscribing inbound receiver

Specifically, the "full pipeline: [reactor.left.sslHandler, reactor.left.httpCodec, ws-decoder, ws-encoder, HttpClientTracingHandler#0, reactor.left.wsFrameAggregator, reactor.right.reactiveBridge, DefaultChannelPipeline$TailContext#0" portion of the log is what suggested to me that the L7 proxy decodes and re-encodes the websocket frames rather than passing the bytes through untouched. The "aggregator" component suggests that it collects all the fragments of a message before passing it along.

Wireshark Analysis


While troubleshooting, I decrypted in-flight TLS traffic from a working environment so I could view the websocket messages in Wireshark. Wireshark seemed to have trouble parsing the websocket frames: it reported several that used reserved opcodes. On further examination, I found an example of every possible opcode, 0x0-0xF (all 16 values the 4-bit opcode field can hold). I couldn't find any evidence in the Teleport code that we use any non-standard opcodes. This makes me wonder whether Wireshark's parser is running into a similar problem parsing the frames. A parsing error could explain a pause if the parser thinks there is more data to be read.

tshark -r decrypted.pcap -O websocket | grep -i ' = Opcode: ' | sort | uniq
    .... 0000 = Opcode: Continuation (0)
    .... 0001 = Opcode: Text (1)
    .... 0010 = Opcode: Binary (2)
    .... 0011 = Opcode: Unknown (3)
    .... 0100 = Opcode: Unknown (4)
    .... 0101 = Opcode: Unknown (5)
    .... 0110 = Opcode: Unknown (6)
    .... 0111 = Opcode: Unknown (7)
    .... 1000 = Opcode: Connection Close (8)
    .... 1001 = Opcode: Ping (9)
    .... 1010 = Opcode: Pong (10)
    .... 1011 = Opcode: Unknown (11)
    .... 1100 = Opcode: Unknown (12)
    .... 1101 = Opcode: Unknown (13)
    .... 1110 = Opcode: Unknown (14)
    .... 1111 = Opcode: Unknown (15)
Internet Protocol Version 4, Src: gamma.example.com (127.0.0.1), Dst: gamma.example.com (127.0.0.1)
Transmission Control Protocol, Src Port: 51156, Dst Port: 80, Seq: 2821, Ack: 4781, Len: 39
[3 Reassembled TCP Segments (45 bytes): #69(16), #73(4), #75(25)]
WebSocket
    1... .... = Fin: True
    .010 .... = Reserved: 0x2
    .... 1010 = Opcode: Pong (10)
    0... .... = Mask: False
    .010 1011 = Payload length: 43
    Payload
        Pong: 90273caf82245a84961b61e12fc9000000271703030022124d506f71927dbe59536142b6c1ab63455945f3
WebSocket
    1... .... = Fin: True
    .000 .... = Reserved: 0x0
    .... 1101 = Opcode: Unknown (13)
    0... .... = Mask: False
    .000 1000 = Payload length: 8
    Payload
        Unknown: 90f5be84cc43e59c
            [Expert Info (Note/Undecoded): Dissector for Websocket Opcode (13) code not implemented, Contact Wireshark developers if you want this supported]
                [Dissector for Websocket Opcode (13) code not implemented, Contact Wireshark developers if you want this supported]
                [Severity level: Note]
                [Group: Undecoded]

programmerq • Aug 27 '24 21:08