envoy
envoy copied to clipboard
ext_authz(http): Envoy turns AuthorizationResponse status 500 into 403 and drops body
Title: ext_authz(http): Envoy turns AuthorizationResponse status 500 into 403 and drops body
Description:
- User makes an HTTP request against Envoy.
- Envoy makes AuthorizationRequest to ext_authz filter's http server.
- ext_authz server replies with 503 and a body.
- Envoy returns 403 and no body.
Repro steps: This came up as part of a project so it's easier to include instructions for that project:
# Clone and checkout branch made for reproducing this issue (see last commit)
git clone [email protected]:arrikto/oidc-authservice.git
cd oidc-authservice
git checkout feature-envoy-ext-authz-repo
# Run the E2E test to get a K3d cluster up with everything running.
# The test will fail, but it's expected.
make bin/deps
make e2e
# Port-forward the service locally
export KUBECONFIG=$HOME/.k3d/kubeconfig-e2e-test-cluster.yaml
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80
# Make a request and see that it returns a 403
curl -v http://localhost:8080
Config: Here is the config dump taken from port 15000 of the istio ingressgateway proxy: config_dump.log
Logs: access_log.log
I don't think this a bug, handling errors from the auth service service is either fail-open or fail-closed (default). https://github.com/envoyproxy/envoy/blob/2a252964ecd5e790aaac3a4d08d43bd3cad4b6c8/source/extensions/filters/http/ext_authz/ext_authz.cc#L292-L307
Proxying an auth service error to the client may just be a feature request. You can set the error status code with status_on_error
if you want to return a 500 instead
@derekargueta the documentation for status_on_error says:
Sets the HTTP status that is returned to the client when there is a network error between the filter and the authorization server. The default status is HTTP 403 Forbidden.
However, this is not the case. If I understand correctly, filter == Envoy and authorization server == my http authservice. In this case, there is no network error between them, but something went wrong in the authservice.
In addition, when I tried returning error code 400 Bad Request, the error code was not transformed into a 403 and the message was kept.
Which means this is a problem only with 500 (and maybe 5xx) http codes. In the ext_authz code, is the case of status 500 and network failure between envoy/authservice handled as the same case? If so, why? It seems weird to handle other codes (3xx, 4xx) correctly, but not 5xx codes.
kind ping @derekargueta @mattklein123
This issue https://github.com/envoyproxy/envoy/issues/4124 and the associated PR https://github.com/envoyproxy/envoy/pull/4199 introduced this behavior, I think.
Thanks @esmet! I see this is triaged as a question and I don't think it's correct. @mattklein123 @derekargueta would you agree this is a bug, as described here: https://github.com/envoyproxy/envoy/issues/14001#issuecomment-727187082
The ext_authz
filter needs to distinguish between a network error and getting back HTTP 500, in which case it could reach the AuthService successfully.
@yanniszark I agree. It sounds like we want to add configuration to set the definition of an error for the external authorization server. Similar to https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter#x-envoy-retry-on but of course we can tweak it so that it makes the most sense for the ext_authz use case. What do you think?
@alyssawilk feel free to assign this one to me
It sounds like we want to add configuration to set the definition of an error for the external authorization server
@esmet sorry, I'm not sure I understood that. IMO, all error returned by the AuthService should be surfaced to the downstream client. So if the AuthService returns 503 with a body, this response will be proxied back to the client, same as other non-2xx codes (3xx, 4xx).
By definition of error, I mean the case where a call to the authorization service is considered "failed" as opposed to "completed, but with an error".
Today, the ext_authz filter considers "completed, but with a 5xx error" to be "failed" though one could argue that the call succeeded, it just came back with a 5xx denial. This would fit the use case that you are describing.
We could add configuration to allow you to specify what you consider a failure. It could be just network errors, or network errors + 5xx errors, or something else, sort of like the 'retry-on' logic I linked above.
The default value for this configuration would be the current behavior, which is network errors and 5xx codes. For your use case, you'd configure it to only consider network errors as "failures" (for which the response code to the client is the preconfigured error response code) and everything else would be considered successful communication to the auth service, such that 2XX means pass through and everything else means deny with the code, headers, and body provided by the auth service.
@esmet got it! IMO the current behavior of
Today, the ext_authz filter considers "completed, but with a 5xx error" to be "failed"
is a bug, but if you feel it makes sense to keep as a feature then all the power to you!
I agree that probably making the "definition of errors" to be configurable seems useful.
@esmet got it! IMO the current behavior of
Today, the ext_authz filter considers "completed, but with a 5xx error" to be "failed"
is a bug, but if you feel it makes sense to keep as a feature then all the power to you!
I could see why someone would want 5XX errors to be considered failures and not legitimate responses to send downstream. For example, one might not want someone to know that their auth service is actually down (or dying) or if some proxy between Envoy and the auth service is unhealthy.
At any rate, the existing behavior is now canon so that's the primary reason we'd to keep it as the default. @dio do you think mimicking the retry-on configuration is the right path forward?
I looked into the code and was sort of surprised to see that the retry-on values (gateway-error, 5xx, etc...) is largely implemented in code (retry_state_impl.cc) as opposed to a proto API that we could just reuse.
do you think mimicking the retry-on configuration is the right path forward?
I think so. Yes, it actually controlled by an array of pre-defined strings. Probably we can have something like: failed_on: ['gateway-error', '5xx']
?. Thoughts?
do you think mimicking the retry-on configuration is the right path forward?
I think so. Yes, it actually controlled by an array of pre-defined strings. Probably we can have something like:
failed_on: ['gateway-error', '5xx']
?. Thoughts?
Sounds right. I'll try this soon.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
This is very much still an issue!
Let's move this to "no stale bot"
No new activity on this yet? it would be super useful to differentiate between network drops vs 5xx errors
Is this one still on road map? +1 on "it would be super useful"
there are people that still need this - for example we have a flow where we return 5xx codes but they are intended to be handled specially by some applications.
hi, can I pick this up? Thanks!
@imrenagi I think you can take it. Please let me know if you need help. Thanks!
cc. @ggreenway.
@imrenagi, after re-reading https://github.com/envoyproxy/envoy/issues/14001#issuecomment-742852892. I think I agree with @yanniszark. So, we need to introduce runtime guarding, to deprecate the current behavior (with runtime guarding we allow users to fallback to the previous (current) behavior).
Basically, you need the following patch:
diff --git a/source/common/runtime/runtime_features.cc b/source/common/runtime/runtime_features.cc
index 36832d73f5..47e4297c58 100644
--- a/source/common/runtime/runtime_features.cc
+++ b/source/common/runtime/runtime_features.cc
@@ -43,6 +43,7 @@ RUNTIME_GUARD(envoy_reloadable_features_delta_xds_subscription_state_tracking_fi
RUNTIME_GUARD(envoy_reloadable_features_do_not_await_headers_on_upstream_timeout_to_emit_stats);
RUNTIME_GUARD(envoy_reloadable_features_do_not_count_mapped_pages_as_free);
RUNTIME_GUARD(envoy_reloadable_features_enable_compression_bomb_protection);
+RUNTIME_GUARD(envoy_reloadable_features_ext_authz_http_service_5xx_is_denied);
RUNTIME_GUARD(envoy_reloadable_features_fix_hash_key);
RUNTIME_GUARD(envoy_reloadable_features_get_route_config_factory_by_type);
RUNTIME_GUARD(envoy_reloadable_features_http2_delay_keepalive_timeout);
diff --git a/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc b/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
index 3157136b43..a5d02c28dc 100644
--- a/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
+++ b/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
@@ -317,11 +317,14 @@ void RawHttpClientImpl::onBeforeFinalizeUpstreamSpan(
ResponsePtr RawHttpClientImpl::toResponse(Http::ResponseMessagePtr message) {
const uint64_t status_code = Http::Utility::getResponseStatus(message->headers());
- // Set an error status if the call to the authorization server returns any of the 5xx HTTP error
- // codes. A Forbidden response is sent to the client if the filter has not been configured with
- // failure_mode_allow.
- if (Http::CodeUtility::is5xx(status_code)) {
- return std::make_unique<Response>(errorResponse());
+ if (!Runtime::runtimeFeatureEnabled(
+ "envoy.reloadable_features.ext_authz_http_service_5xx_is_denied")) {
+ // Set an error status if the call to the authorization server returns any of the 5xx HTTP error
+ // codes. A Forbidden response is sent to the client if the filter has not been configured with
+ // failure_mode_allow.
+ if (Http::CodeUtility::is5xx(status_code)) {
+ return std::make_unique<Response>(errorResponse());
+ }
}
// Extract headers-to-remove from the storage header coming from the
Hence with this, when this RawHttpClientImpl
receives 5xx
, it returns a response object with 5xx
as the status code, with a non-empty body and all.
Note I put
ext_authz_http_service_5xx_is_denied
(meaning we are now classifying5xx
as a denied response, not an error one) as the name for the flag, feel free to choose a better name when you have it! (probably the longer version:ext_authz_http_service_5xx_is_denied_instead_of_error
).
Please add unit and integration tests.
Good luck!
Sure @dio . Will take a look and try to update my PR. thanks!