
TLS handshake error: connection reset by peer or EOF

Open nagarajatantry opened this issue 4 years ago • 6 comments

Describe the bug

During a performance test, I configured both the Ambassador pods and my upstream service to scale up when CPU usage breaches the 60% threshold. When scale-up events happen in the Ambassador and upstream pods at the same time, I start seeing 503 errors, along with the log messages below in my upstream service (Go). This does not happen when either Ambassador or the upstream service is pre-scaled.

2020/10/05 14:02:56 http: TLS handshake error from 100.122.153.167:56140: read tcp 100.99.240.5:9098->100.122.153.167:56140: read: connection reset by peer
2020/10/05 14:02:56 http: TLS handshake error from 100.122.153.167:58258: EOF

To Reproduce

  • Upstream service exposing an endpoint over HTTPS: a Go service with a single POST endpoint. Nothing fancy; it sleeps for a few seconds and returns an empty body.
  • Install Ambassador as described in the documentation.
  • The Ambassador Module and Mapping setup is shown below.

Expected behavior

Scale-up events complete without errors.

Versions (please complete the following information):

  • Ambassador: [1.7.3]
  • Kubernetes environment [in house]
  • Version [1.16]

Additional context

I have tested with different setups.

  1. AWS ALB --> Ambassador NodePort --> Ambassador Pods --> Upstream NodePort --> Upstream Service
  2. AWS NLB --> Ambassador Pods --> Upstream NodePort --> Upstream Service
  3. AWS ALB --> Upstream NodePort --> Upstream Service (no Ambassador)

In setups 1 and 2, I see upwards of 10k 503 errors (proportional to the TPS) and the error messages below in the upstream logs. I don't see this issue when Ambassador is not in the path (setup 3).

2020/10/05 14:02:56 http: TLS handshake error from 100.122.153.167:56140: read tcp 100.99.240.5:9098->100.122.153.167:56140: read: connection reset by peer
2020/10/05 14:02:56 http: TLS handshake error from 100.122.153.167:58258: EOF
  • Module & mapping
---
apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
spec:
  config:
    use_proxy_proto: true
    diag_port: 8878
    diagnostics:
      enabled: true  
    keepalive:
      time: 100
      interval: 10
      probes: 3    
---
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  labels:
    app: jaeger
    env: dev
  name: otel-mapping
  namespace: otel
spec:
  circuit_breakers:
  - max_connections: 1000000000
    max_pending_requests: 1000000000
    max_requests: 1000000000
  cors:
    credentials: true
    headers:
    - Content-Type
    - Authorization
    - Accept
    - x-opentelemetry-outgoing-request
    max_age: "86400"
    methods:
    - POST
    - GET
    - OPTIONS
    origins:
    - '*'
  grpc: false
  load_balancer:
    header: sessionid
    policy: ring_hash
  retry_policy:
    retry_on: "5xx"
    num_retries: 2
  prefix: /v1/trace
  resolver: endpoint
  rewrite: /v1/trace
  service: https://otelsvc:9098

nagarajatantry, Oct 05 '20 14:10