envoy
envoy copied to clipboard
Envoy running short lived WebSocket load starts returning codes 503
Envoy running short lived WebSocket load starts returning codes 503
Description: I have a WebSocket application using Envoy as the routing load balancer tool, and I'm doing some load tests to it to see how it behaves under load and what to expect on load scenarios from it. On high load with a persistent connection, I mean streaming messages through the same opened connection for 3 minutes I'm able to reach the 100% CPU in Envoy and measure the limits of requests I can handle but when I change to open the connection send a message and then after receiving an answer closing the connection, I start receiving errors 503 really fast instead of the expected 101 while trying to open new connections.
Repro steps: Using the following k6 script:
import ws from "k6/ws";
import { check } from "k6";
const msg = "test message";
export default function () {
const url = __ENV.WS_URL || "ws://localhost:8080/chat";
const params = { headers: { Host: "example.com" } };
const res = ws.connect(url, params, function (socket) {
socket.on("open", function open() {
socket.send(msg);
});
socket.on("message", (data) => {
socket.close();
});
socket.on("error", (e) => {
if (e.error() != "websocket: close sent") {
console.error("An unexpected error occurred: ", e.error());
}
});
});
const result = check(res, { "status is 101": (r) => r && r.status === 101 });
if (!result) {
console.error(`Check failed: status ${res.status}, body ${res.body}`);
}
}
Running it like this:
k6 run --vus 10 --duration 180s -e WS_URL=ws://mycompany/chat script.js
Starts returning 503 errors:
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: script-simple.js
output: -
scenarios: (100.00%) 1 scenario, 10 max VUs, 3m30s max duration (incl. graceful stop):
* default: 10 looping VUs for 3m0s (gracefulStop: 30s)
ERRO[0006] Check failed: status 503, body upstream connect error or disconnect/reset before headers. reset reason: connection timeout source=console
ERRO[0006] Check failed: status 503, body upstream connect error or disconnect/reset before headers. reset reason: connection timeout source=console
ERRO[0006] Check failed: status 503, body upstream connect error or disconnect/reset before headers. reset reason: connection timeout source=console
ERRO[0006] Check failed: status 503, body upstream connect error or disconnect/reset before headers. reset reason: connection timeout source=console
......
Looking at a tcp Dump in Envoy we can see this happening while there are some retransmission errors on closing the websocket connection:
Dump against the client
Dump against the backend
This is how a successful websocket call looks like against the back:
Config:
The config is just using the router filter and enables the websocket upgrade
....
"http_filters": [
{
"name": "envoy.filters.http.router",
"typed_config": {
"@type": "type.googleapis.com/envoy.extensions.filters.http.router.v3.Router"
}
}
],
....
....
"upgrade_configs": [
{
"upgrade_type": "websocket"
}
],
....
Logs:
[2024-09-19 20:58:36.656][3612][debug][conn_handler] [source/common/listener_manager/active_tcp_listener.cc:160] [Tags: "ConnectionId":"11628"] new connection from 168.63.129.16:56980
[2024-09-19 20:58:37.363][3614][debug][pool] [source/common/conn_pool/conn_pool_base.cc:803] [Tags: "ConnectionId":"10029"] connect timeout
[2024-09-19 20:58:37.363][3614][debug][connection] [source/common/network/connection_impl.cc:149] [Tags: "ConnectionId":"10029"] closing data_to_write=0 type=1
[2024-09-19 20:58:37.363][3614][debug][connection] [source/common/network/connection_impl.cc:281] [Tags: "ConnectionId":"10029"] closing socket: 1
[2024-09-19 20:58:37.364][3614][debug][client] [source/common/http/codec_client.cc:107] [Tags: "ConnectionId":"10029"] disconnect. resetting 0 pending requests
[2024-09-19 20:58:37.364][3614][debug][pool] [source/common/conn_pool/conn_pool_base.cc:495] [Tags: "ConnectionId":"10029"] client disconnected, failure reason:
[2024-09-19 20:58:37.364][3614][debug][router] [source/common/router/router.cc:1354] [Tags: "ConnectionId":"11612","StreamId":"8531618248070408344"] upstream reset: reset reason: connection timeout, transport failure reason:
[2024-09-19 20:58:37.364][3614][debug][http] [source/common/http/filter_manager.cc:1077] [Tags: "ConnectionId":"11612","StreamId":"8531618248070408344"] Sending local reply with details upstream_reset_before_response_started{connection_timeout}
[2024-09-19 20:58:37.364][3614][debug][http] [source/common/http/conn_manager_impl.cc:1838] [Tags: "ConnectionId":"11612","StreamId":"8531618248070408344"] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '91'
'content-type', 'text/plain'
'date', 'Thu, 19 Sep 2024 20:58:36 GMT'
'server', 'Microsoft-Azure-Application-LB/AGC'
'connection', 'close'
[2024-09-19 20:58:37.364][3614][debug][http] [source/common/http/conn_manager_impl.cc:1950] [Tags: "ConnectionId":"11612","StreamId":"8531618248070408344"] Codec completed encoding stream.
[2024-09-19 20:58:37.364][3614][debug][http] [source/common/http/conn_manager_impl.cc:252] [Tags: "ConnectionId":"11612","StreamId":"8531618248070408344"] doEndStream() resetting stream
[2024-09-19 20:58:37.364][3614][debug][http] [source/common/http/conn_manager_impl.cc:1907] [Tags: "ConnectionId":"11612","StreamId":"8531618248070408344"] stream reset: reset reason: local reset, response details: -
Is there something that I'm missing in my configuration? What can I do to prevent this from happening? Is this the normal expected behavior?
Thanks