proxy-wasm-rust-sdk icon indicating copy to clipboard operation
proxy-wasm-rust-sdk copied to clipboard

Panic when there are no remote hosts

Open erikness-doordash opened this issue 2 years ago • 3 comments

We get a panic in our Rust Envoy filter when we try to open a new gRPC stream to any hostname that DNS can’t find any hosts for. Reproduced with Envoy versions 1.25.2 and 1.28.2, and the latest Rust proxy-wasm version 0.2.1.

Panic happens when:

  1. we point the cluster config at a fake hostname (dns domain is that not resolvable)
  2. we point the cluster config at a DNS hostname that returns 0 healthy hosts (e.g. service scaled down to zero)

There is NOT a panic if there is a server found and it returns gRPC UNAVAILABLE.

Detailed order of events:

  1. We call Context.open_grpc_stream() from this library
  2. Proxy WASM calls proxy_grpc_stream() on the Envoy host
  3. Envoy host receives gRPC unavailable
  4. Proxy WASM receives Status::InternalFailure (host side code / proxy-wasm-rust-sdk code)
    1. As a result the stream’s token ID is not saved to the Proxy WASM Dispatcher.grpc_streams HashMap
  5. Proxy WASM gets a call from host to its method Dispatcher.on_grpc_receive_initial_metadata() with token number of the stream that couldn’t get created
  6. Token ID not in hashmap, PANIC

Similar issue: https://github.com/proxy-wasm/proxy-wasm-rust-sdk/issues/153

erikness-doordash avatar Aug 21 '23 23:08 erikness-doordash

PANIC SCENARIO LOGS

starting main dispatch loop
async DNS resolution complete for [HOSTNAME]
DNS refresh rate reset for [HOSTNAME], (failure) refresh rate 5000 ms
initializing Primary cluster [SERVICE_NAME] completed
init manager Cluster [SERVICE_NAME] contains no targets
init manager Cluster [SERVICE_NAME] initialized, notifying ClusterImplBase
adding TLS cluster [SERVICE_NAME]
cm init: init complete: cluster=[SERVICE_NAME] primary=0 secondary=0

[vm->host] env.proxy_grpc_stream(1053140, 17, 1053157, 55, 1053212, 21, 1225984, 68, 1047956)
[C0][S5454292189521443685] cluster '[SERVICE_NAME]' match for URL '[PROTOBUF_API_NAME]'
no healthy host for HTTP connection pool
async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'no healthy upstream'
[host->vm] proxy_on_grpc_receive_initial_metadata(1, 2, 0)
[vm->host] env.proxy_log(5, 1226064, 141)
wasm log [VM_NAME]: panicked at 'invalid token_id', /Users/[USER]/.cargo/registry/src/github.com-1ecc6299db9ec823/proxy-wasm-0.2.1/src/dispatcher.rs:431:27
[vm<-host] env.proxy_log return: 0
Function: proxy_on_grpc_receive_initial_metadata failed: Uncaught RuntimeError: unreachable
Proxy-Wasm plugin in-VM backtrace:
  0:  0xd1339 - __rust_start_panic
  1:  0xd10a2 - rust_panic
  2:  0xd1071 - _ZN3std9panicking20rust_panic_with_hook17h911ba172265603dfE
  3:  0xd0137 - _ZN3std9panicking19begin_panic_handler28_$u7b$$u7b$closure$u7d$$u7d$17h7b07dd737a4d8757E
  4:  0xd0061 - _ZN3std10sys_common9backtrace26__rust_end_short_backtrace17ha54585d732ac30f6E
  5:  0xd08ed - rust_begin_unwind
  6:  0xd657a - _ZN4core9panicking9panic_fmt17h6958314cbbe045b2E
  7:  0xdca87 - _ZN4core9panicking13panic_display17hc4d1a67c733e6371E
  8:  0xdc6e7 - _ZN4core9panicking9panic_str17h49f73175e64c8cc3E
  9:  0xdc6b5 - _ZN4core6option13expect_failed17hafbfcb5da25318c1E
item added to deferred deletion list (size=1)
[vm<-host] env.proxy_grpc_stream return: 10
[vm->host] env.proxy_log(2, 1225984, 70)
wasm log [VM_NAME]: grpc stream failed to open, error InternalFailure

erikness-doordash avatar Aug 21 '23 23:08 erikness-doordash

GOOD SCENARIO LOGS (server returns gRPC UNAVAILABLE, no panic)

starting main dispatch loop
async DNS resolution complete for [HOSTNAME]
transport socket match, socket default selected for host with address [IP_ADDR]:[PORT]
DNS hosts have changed for [HOSTNAME]
initializing Primary cluster [SERVICE_NAME] completed
init manager Cluster [SERVICE_NAME] contains no targets
init manager Cluster [SERVICE_NAME] initialized, notifying ClusterImplBase
adding TLS cluster [SERVICE_NAME]
membership update for TLS cluster [SERVICE_NAME] added 1 removed 0
cm init: init complete: cluster=[SERVICE_NAME] primary=0 secondary=0

[vm->host] env.proxy_grpc_stream(1053140, 17, 1053157, 55, 1053212, 21, 1225984, 68, 1047956)
[C0][S5073346782053791857] cluster '[SERVICE_NAME]' match for URL '[PROTOBUF_API_NAME]'
[C0][S5073346782053791857] router decoding headers:
':method', 'POST'
':path', '[PROTOBUF_API_NAME]'
':authority', '[SERVICE_NAME]'
':scheme', 'http'
'te', 'trailers'
'content-type', 'application/grpc'
'x-envoy-internal', 'true'
'x-forwarded-for', '[IP_ADDR_2]'
queueing stream due to no available connections
trying to create new connection
ConnPoolImplBase 0x5d1be7fed80, ready_clients_.size(): 0, busy_clients_.size(): 0, connecting_clients_.size(): 0, connecting_stream_capacity_: 0, num_active_streams_: 0, pending_streams_.size(): 1 per upstream preconnect ratio: 1
creating a new connection
[C0] connecting
[C0] connecting to [IP_ADDR]:[PORT]
[C0] connection in progress
Codec does not have Metadata frame support.
[C0] updating connection-level initial window size to 268435456
not creating a new connection, shouldCreateNewConnection returned false.
[vm<-host] env.proxy_grpc_stream return: 0
[vm->host] env.proxy_log(2, 1225984, 45)
wasm log [VM_NAME]: grpc stream opened, stream_id=2

[vm->host] env.proxy_grpc_send(2, 1225312, 11, 0)
[C0][S5073346782053791857] buffering 16 bytes
[vm<-host] env.proxy_grpc_send return: 0

[C0][S5073346782053791857] upstream headers complete: end_stream=true
[C0][S5073346782053791857] resetting pool request
async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'not available'

wasm log [VM_NAME]: on_grpc_stream_close(): stream_id=2, status_code 14

erikness-doordash avatar Aug 21 '23 23:08 erikness-doordash

放出来代码,看看,有可能是代码写法的问题

pi-pi-miao avatar Aug 24 '24 06:08 pi-pi-miao