emissary
TLS handshake error: connection reset by peer or EOF
Describe the bug During a performance test, I configured the Ambassador pods and my upstream service to scale up when they breach a 60% CPU threshold. When scale-up events occur in both the Ambassador and upstream pods at the same time, I start seeing 503 errors along with the log messages below in my upstream service (Go). This does not happen when either Ambassador or the upstream service is pre-scaled.
2020/10/05 14:02:56 http: TLS handshake error from 100.122.153.167:56140: read tcp 100.99.240.5:9098->100.122.153.167:56140: read: connection reset by peer
2020/10/05 14:02:56 http: TLS handshake error from 100.122.153.167:58258: EOF
To Reproduce
- Upstream service exposing an endpoint via HTTPS: a Go service with a single POST endpoint. Nothing fancy; it sleeps for a few seconds and returns an empty body.
- Install Ambassador as described in the documentation.
- The Ambassador Module and Mapping configuration is listed below.
Expected behavior Scale-up events complete without errors.
Versions (please complete the following information):
- Ambassador: [1.7.3]
- Kubernetes environment [in house]
- Version [1.16]
Additional context I have tested with different setups.
1. AWS ALB --> Ambassador NodePort --> Ambassador Pods --> Upstream NodePort --> Upstream Service
2. AWS NLB --> Ambassador Pods --> Upstream NodePort --> Upstream Service
3. AWS ALB --> Upstream NodePort --> Upstream Service (no Ambassador)
In setups 1 and 2, I see upwards of 10k 503 errors (proportional to the TPS) and the error messages below in the upstream logs. I don't see this issue when Ambassador is not in the path (setup 3).
2020/10/05 14:02:56 http: TLS handshake error from 100.122.153.167:56140: read tcp 100.99.240.5:9098->100.122.153.167:56140: read: connection reset by peer
2020/10/05 14:02:56 http: TLS handshake error from 100.122.153.167:58258: EOF
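To compare runs, it can help to count how many handshake failures are resets versus EOFs. The sketch below reads upstream log lines from stdin and tallies them. The labels (`reset`, `eof`, `handshake-other`, `other`) and the `classify` helper are made up for this example, and the matching assumes the log format shown above. Roughly speaking, "connection reset by peer" means the client side sent a TCP reset mid-handshake, while "EOF" means the connection was closed before the handshake completed.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// classify returns a coarse, hypothetical label for one upstream log line.
func classify(line string) string {
	if !strings.Contains(line, "http: TLS handshake error") {
		return "other"
	}
	switch {
	case strings.HasSuffix(line, "connection reset by peer"):
		return "reset"
	case strings.HasSuffix(line, ": EOF"):
		return "eof"
	default:
		return "handshake-other"
	}
}

func main() {
	counts := map[string]int{}
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		// Trim trailing whitespace so the suffix checks are reliable.
		counts[classify(strings.TrimSpace(sc.Text()))]++
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	fmt.Println(counts)
}
```

Usage, assuming the upstream logs were saved to a file named `upstream.log` and the program to `main.go`: `go run main.go < upstream.log`.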
- Module & mapping
---
apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
spec:
  config:
    use_proxy_proto: true
    diag_port: 8878
    diagnostics:
      enabled: true
    keepalive:
      time: 100
      interval: 10
      probes: 3
---
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  labels:
    app: jaeger
    env: dev
  name: otel-mapping
  namespace: otel
spec:
  circuit_breakers:
  - max_connections: 1000000000
    max_pending_requests: 1000000000
    max_requests: 1000000000
  cors:
    credentials: true
    headers:
    - Content-Type
    - Authorization
    - Accept
    - x-opentelemetry-outgoing-request
    max_age: "86400"
    methods:
    - POST
    - GET
    - OPTIONS
    origins:
    - '*'
  grpc: false
  load_balancer:
    header: sessionid
    policy: ring_hash
  retry_policy:
    retry_on: "5xx"
    num_retries: 2
  prefix: /v1/trace
  resolver: endpoint
  rewrite: /v1/trace
  service: https://otelsvc:9098