Title: WASM: Calling proxy_send_local_response twice hangs the remote HTTP client (e.g. curl) until it times out or is interrupted
Description:
I have a WASM plugin running on Envoy, and I found that HTTP requests get stuck if I call proxy_send_local_response in both on_http_request_headers and on_http_response_headers.
Repro steps: Here is the shortest WASM code that reproduces the issue:
use log::info;
use proxy_wasm::traits::*;
use proxy_wasm::types::*;

proxy_wasm::main! {{
    proxy_wasm::set_log_level(LogLevel::Debug);
    proxy_wasm::set_root_context(|_| -> Box<dyn RootContext> { Box::new(HttpAuthProxyRoot {}) });
}}

struct HttpAuthProxyRoot {}

impl RootContext for HttpAuthProxyRoot {
    fn create_http_context(&self, context_id: u32) -> Option<Box<dyn HttpContext>> {
        Some(Box::new(HttpAuthProxy::new(context_id)))
    }

    fn get_type(&self) -> Option<ContextType> {
        Some(ContextType::HttpContext)
    }
}

impl Context for HttpAuthProxyRoot {}

#[derive(Default)]
struct HttpAuthProxy {
    context_id: u32,
}

impl HttpAuthProxy {
    fn new(context_id: u32) -> HttpAuthProxy {
        HttpAuthProxy { context_id }
    }

    // Sends a local 401 reply (proxy_send_local_response under the hood).
    fn test_response(&self) -> Action {
        self.send_http_response(401, vec![], None);
        Action::Continue
    }
}

impl HttpContext for HttpAuthProxy {
    // Triggers a local reply on the request path...
    fn on_http_request_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        let ret = self.test_response();
        info!("on_http_request_headers done");
        ret
    }

    // ...and again on the response path, while the first local reply is
    // still traversing the filter chain.
    fn on_http_response_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        let ret = self.test_response();
        info!("on_http_response_headers done");
        ret
    }

    fn on_log(&mut self) {
        info!("#{} completed", self.context_id);
    }
}

impl Context for HttpAuthProxy {}
envoy docker-compose configuration:
services:
  envoy:
    image: envoyproxy/envoy:v1.24.0
    command: -c /etc/envoy/envoy.yaml -l debug
    hostname: envoy
    ports:
    - "10000:10000"
    volumes:
    - ./envoy.yaml:/etc/envoy/envoy.yaml
    - ./target/wasm32-wasi/release:/etc/envoy/proxy-wasm-plugins
    networks:
    - envoymesh
networks:
  envoymesh: {}
and envoy.yaml:
admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 15000
static_resources:
  clusters:
  - name: test_cluster
    type: STATIC
    connect_timeout: 0.250s
    load_assignment:
      cluster_name: test_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 15000
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          route_config:
            name: local_routes
            virtual_hosts:
            - name: local_service
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                direct_response:
                  status: 200
                  body:
                    inline_string: "Request /hello and be welcomed!\n"
          http_filters:
          - name: envoy.filters.http.wasm
            typed_config:
              "@type": type.googleapis.com/udpa.type.v1.TypedStruct
              type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
              value:
                config:
                  name: "http_headers"
                  vm_config:
                    runtime: "envoy.wasm.runtime.v8"
                    code:
                      local:
                        filename: "/etc/envoy/proxy-wasm-plugins/plugin.wasm"
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
Logs:
And here is the log on the client side:
I found out that it's caused by this PR: https://github.com/envoyproxy/envoy/pull/23049
It occurs since Envoy 1.24+ (and therefore Istio 1.16+). If I revert this change and rebuild, it works again.
I don't know whether this behaviour is expected or was changed by accident.
It only occurs in my QA cluster (which runs a newer Envoy, 1.24+).
@mpwarres
I can tackle this
/assign @keithmattix
/assign @krinkinmu
krinkinmu is not allowed to assign users.
/assign @krinkinmu
@krinkinmu cannot be assigned to this issue.
@alyssawilk any idea what the issue is here?
I was able to reproduce the issue on a recent head build and with a C++ based filter. Going through the issue with a debugger suggests that roughly the following is happening:
- onRequestHeaders triggers a local reply
- the local reply goes through the filter chain
- this triggers onResponseHeaders, which also attempts to send a local reply while we are still processing the previous one
- the second local reply gets ignored (as expected), but because the plugin tried to trigger a local reply, proxy-wasm pauses filter chain processing (because that is how it always reacts to local replies, see https://github.com/proxy-wasm/proxy-wasm-cpp-host/blob/3212034ee6e75d54cbf91b65cc659772166442f4/src/exports.cc#L158 and https://github.com/proxy-wasm/proxy-wasm-cpp-host/blob/3212034ee6e75d54cbf91b65cc659772166442f4/src/context.cc#L513-L519), and at that point Envoy gets stuck.
As for the fix, the filter itself can avoid triggering a second local reply if it is already processing one (a sketch of such a guard follows below). That said, I don't think Envoy hanging in this case is desirable behaviour regardless of the filter logic, so Envoy could do better here.
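For illustration, here is a minimal sketch of that guard on the plugin side, reusing the HttpAuthProxy type from the repro above (the root-context wiring stays the same). The local_reply_sent field is hypothetical, added here only to track whether this context has already triggered a local reply:

use proxy_wasm::traits::*;
use proxy_wasm::types::*;

#[derive(Default)]
struct HttpAuthProxy {
    context_id: u32,
    // Hypothetical guard: records that this context already triggered a
    // local reply, so the response-path callback can skip a second one.
    local_reply_sent: bool,
}

impl HttpContext for HttpAuthProxy {
    fn on_http_request_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        self.local_reply_sent = true;
        self.send_http_response(401, vec![], None);
        Action::Continue
    }

    fn on_http_response_headers(&mut self, _num_headers: usize, _end_of_stream: bool) -> Action {
        if self.local_reply_sent {
            // These response headers belong to our own local reply;
            // let them pass through instead of triggering another one.
            return Action::Continue;
        }
        self.local_reply_sent = true;
        self.send_http_response(401, vec![], None);
        Action::Continue
    }
}

impl Context for HttpAuthProxy {}

With this guard the response-path callback simply passes the plugin's own 401 through, so the host is never asked for a second local reply mid-flight.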
I see a couple of options:
- Given that the second local reply was ignored, maybe we should just avoid pausing filter chain processing in this case, since we didn't actually trigger a local reply.
- Actually trigger the second local reply instead of ignoring it.
NOTE (mostly for myself): a third option would be to stop sending local replies through the filter chain, but I don't think we should change behaviour that has been around for years, so I'm dismissing it right away.
I find option 2 rather confusing: what would the desired behaviour be? Should the second local reply overwrite the first one? Because of this I prefer option 1, and I'm starting on the code change to implement it.
One more thing to mention: I think the fix should be on the proxy-wasm-cpp-host side.
/assign @krinkinmu
krinkinmu is not allowed to assign users.
@alyssawilk any idea what the issue is here?
@keithmattix I thought it was a propagation delay of some sort, but that no longer seems to be the case. Looking at what repokitteh does, I think it is supposed to just call the GitHub API (see https://github.com/repokitteh/modules/blob/master/assign.star#L8-L10 and https://docs.github.com/en/rest/issues/assignees?apiVersion=2022-11-28#check-if-a-user-can-be-assigned).
However, running the API check locally works for me:
$ gh api \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
/repos/envoyproxy/envoy/assignees/krinkinmu
$ echo $?
0
and, for reference, this is what it looks like when I don't have permissions:
$ gh api \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
/repos/proxy-wasm/proxy-wasm-cpp-sdk/assignees/krinkinmu
{
"message": "Not Found",
"documentation_url": "https://docs.github.com/rest/issues/assignees#check-if-a-user-can-be-assigned",
"status": "404"
}
gh: Not Found (HTTP 404)
$ echo $?
1
I'm confused as to why it doesn't work, but maybe we should discuss the repo access issues outside of this bug?
I prepared a fix for proxy-wasm-cpp-host that resolves the issue: https://github.com/proxy-wasm/proxy-wasm-cpp-host/pull/423. We'll see if we can get consensus on the approach and get it merged there.
/assign @krinkinmu