
Timeout on many requests

Open sysroute0 opened this issue 1 year ago • 0 comments

Hello,

When we send thousands of KMS requests to the XKS proxy, requests start timing out with a message saying the request can't be executed, and the following entries start appearing in xks-proxy.log:

2024-02-23T11:06:54.285096Z DEBUG tokio-runtime-worker hyper::proto::h1::conn: 742: error shutting down IO: Transport endpoint is not connected (os error 107)
2024-02-23T11:06:54.285134Z DEBUG tokio-runtime-worker hyper::proto::h1::conn: 742: error shutting down IO: Transport endpoint is not connected (os error 107)
2024-02-23T11:06:54.285168Z DEBUG tokio-runtime-worker hyper::proto::h1::conn: 742: error shutting down IO: Transport endpoint is not connected (os error 107)
2024-02-23T11:06:54.285202Z DEBUG tokio-runtime-worker hyper::proto::h1::conn: 742: error shutting down IO: Transport endpoint is not connected (os error 107)
2024-02-23T11:06:54.285233Z DEBUG tokio-runtime-worker hyper::proto::h1::conn: 742: error shutting down IO: Transport endpoint is not connected (os error 107)
2024-02-23T11:06:54.285266Z DEBUG tokio-runtime-worker hyper::proto::h1::conn: 742: error shutting down IO: Transport endpoint is not connected (os error 107)
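For anyone triaging this, it may help to see how the `os error 107` entries are distributed over time rather than reading them line by line. The following is a minimal sketch, not part of xks-proxy: the names `LINE_RE`, `count_os_errors`, and `print_summary` are hypothetical, and it assumes every log line has the same shape as the entries quoted above (an RFC 3339 timestamp, a level, and a trailing `(os error N)`).

```python
import re
from collections import Counter

# Assumed line shape, taken from the entries quoted in this issue:
# <timestamp>Z <LEVEL> <thread> <target>: <line>: <message> (os error <N>)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+Z\s+"
    r"(?P<level>[A-Z]+)\s+.*\(os error (?P<errno>\d+)\)\s*$"
)


def count_os_errors(lines):
    """Count '(os error N)' log lines, keyed by (timestamp truncated to the second, N)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("ts"), int(match.group("errno")))] += 1
    return counts


def print_summary(path):
    """Print one line per (second, errno) pair found in the log file at `path`."""
    with open(path, encoding="utf-8") as handle:
        for (second, errno), n in sorted(count_os_errors(handle).items()):
            print(f"{second} errno={errno} count={n}")
```

Calling `print_summary` with the path to one of the rotated log files would show whether the errors arrive in bursts that line up with the request spikes.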

We are running xks-proxy on an EC2 instance that is powerful enough for this load, and monitoring shows it is barely loaded. When we put several xks-proxy instances behind an NLB and load balance the requests across them, the problem does not occur. Note that in that setup we run the xks-proxy instances as Docker containers on the same EC2 instance and load balance between them, which indicates the issue is not caused by the EC2 instance itself being overloaded.

Do you have any idea what could be causing this? Here is the configuration we use in settings.toml:

`[server]
ip = "0.0.0.0"
port = 8000
region = "eu-west-1"
service = "kms-xks-proxy"

[server.tcp_keepalive]
tcp_keepalive_secs = 60
tcp_keepalive_retries = 3
tcp_keepalive_interval_secs = 1

[tracing]
is_stdout_writer_enabled = true
is_file_writer_enabled = true
level = "DEBUG"
directory = "/var/local/xks-proxy/logs"
file_prefix = "xks-proxy.log"
rotation_kind = "HOURLY"

[security]
is_sigv4_auth_enabled = true
is_tls_enabled = false
is_mtls_enabled = false

[tls]
tls_cert_pem = "/var/local/xks-proxy/tls/server_cert.pem"
tls_key_pem = "/var/local/xks-proxy/tls/server_key.pem"

[[external_key_stores]]
uri_path_prefix = "/keys"
sigv4_access_key_id = "XXXXXXXXXXXXXX"
sigv4_secret_access_key = "XXXXXXXXXXX"
xks_key_id_set = ["abc123", "xyz123"]

[pkcs11]
session_pool_max_size = 300
session_pool_timeout_milli = 45
session_eager_close = false
user_pin = ""
PKCS11_HSM_MODULE = "/usr/local/lib/libvault-pkcs11.so"
context_read_timeout_milli = 100

[limits]
max_plaintext_in_base64 = 8192
max_aad_in_base64 = 16384

[hsm_capabilities]
can_generate_iv = false
is_zero_iv_required = false`

Please let me know if you need more information.

Thank you!

sysroute0, Mar 27 '24 10:03