envoy icon indicating copy to clipboard operation
envoy copied to clipboard

ext_authz(http): Envoy turns AuthorizationResponse status 500 into 403 and drops body

Open yanniszark opened this issue 3 years ago • 21 comments

Title: ext_authz(http): Envoy turns AuthorizationResponse status 500 into 403 and drops body

Description:

  1. User makes an HTTP request against Envoy.
  2. Envoy makes AuthorizationRequest to ext_authz filter's http server.
  3. ext_authz server replies with 503 and a body.
  4. Envoy returns 403 and no body.

Repro steps: This came up as part of a project so it's easier to include instructions for that project:

# Clone and checkout branch made for reproducing this issue (see last commit)
git clone [email protected]:arrikto/oidc-authservice.git
cd oidc-authservice
git checkout feature-envoy-ext-authz-repo

# Run the E2E test to get a K3d cluster up with everything running.
# The test will fail, but it's expected.
make bin/deps
make e2e

# Port-forward the service locally
export KUBECONFIG=$HOME/.k3d/kubeconfig-e2e-test-cluster.yaml
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80

# Make a request and see that it returns a 403
curl -v http://localhost:8080

Config: Here is the config dump taken from port 15000 of the istio ingressgateway proxy: config_dump.log

Logs: access_log.log

yanniszark avatar Nov 12 '20 19:11 yanniszark

I don't think this a bug, handling errors from the auth service service is either fail-open or fail-closed (default). https://github.com/envoyproxy/envoy/blob/2a252964ecd5e790aaac3a4d08d43bd3cad4b6c8/source/extensions/filters/http/ext_authz/ext_authz.cc#L292-L307

Proxying an auth service error to the client may just be a feature request. You can set the error status code with status_on_error if you want to return a 500 instead

derekargueta avatar Nov 13 '20 09:11 derekargueta

@derekargueta the documentation for status_on_error says:

Sets the HTTP status that is returned to the client when there is a network error between the filter and the authorization server. The default status is HTTP 403 Forbidden.

However, this is not the case. If I understand correctly, filter == Envoy and authorization server == my http authservice. In this case, there is no network error between them, but something went wrong in the authservice.

In addition, when I tried returning error code 400 Bad Request, the error code was not transformed into a 403 and the message was kept.

Which means this is a problem only with 500 (and maybe 5xx) http codes. In the ext_authz code, is the case of status 500 and network failure between envoy/authservice handled as the same case? If so, why? It seems weird to handle other codes (3xx, 4xx) correctly, but not 5xx codes.

yanniszark avatar Nov 14 '20 10:11 yanniszark

kind ping @derekargueta @mattklein123

yanniszark avatar Nov 25 '20 11:11 yanniszark

This issue https://github.com/envoyproxy/envoy/issues/4124 and the associated PR https://github.com/envoyproxy/envoy/pull/4199 introduced this behavior, I think.

esmet avatar Dec 10 '20 21:12 esmet

Thanks @esmet! I see this is triaged as a question and I don't think it's correct. @mattklein123 @derekargueta would you agree this is a bug, as described here: https://github.com/envoyproxy/envoy/issues/14001#issuecomment-727187082

The ext_authz filter needs to distinguish between a network error and getting back HTTP 500, in which case it could reach the AuthService successfully.

yanniszark avatar Dec 10 '20 21:12 yanniszark

@yanniszark I agree. It sounds like we want to add configuration to set the definition of an error for the external authorization server. Similar to https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter#x-envoy-retry-on but of course we can tweak it so that it makes the most sense for the ext_authz use case. What do you think?

@alyssawilk feel free to assign this one to me

esmet avatar Dec 10 '20 22:12 esmet

It sounds like we want to add configuration to set the definition of an error for the external authorization server

@esmet sorry, I'm not sure I understood that. IMO, all error returned by the AuthService should be surfaced to the downstream client. So if the AuthService returns 503 with a body, this response will be proxied back to the client, same as other non-2xx codes (3xx, 4xx).

yanniszark avatar Dec 10 '20 22:12 yanniszark

By definition of error, I mean the case where a call to the authorization service is considered "failed" as opposed to "completed, but with an error".

Today, the ext_authz filter considers "completed, but with a 5xx error" to be "failed" though one could argue that the call succeeded, it just came back with a 5xx denial. This would fit the use case that you are describing.

We could add configuration to allow you to specify what you consider a failure. It could be just network errors, or network errors + 5xx errors, or something else, sort of like the 'retry-on' logic I linked above.

The default value for this configuration would be the current behavior, which is network errors and 5xx codes. For your use case, you'd configure it to only consider network errors as "failures" (for which the response code to the client is the preconfigured error response code) and everything else would be considered successful communication to the auth service, such that 2XX means pass through and everything else means deny with the code, headers, and body provided by the auth service.

esmet avatar Dec 10 '20 22:12 esmet

@esmet got it! IMO the current behavior of

Today, the ext_authz filter considers "completed, but with a 5xx error" to be "failed"

is a bug, but if you feel it makes sense to keep as a feature then all the power to you!

yanniszark avatar Dec 10 '20 22:12 yanniszark

I agree that probably making the "definition of errors" to be configurable seems useful.

dio avatar Dec 11 '20 00:12 dio

@esmet got it! IMO the current behavior of

Today, the ext_authz filter considers "completed, but with a 5xx error" to be "failed"

is a bug, but if you feel it makes sense to keep as a feature then all the power to you!

I could see why someone would want 5XX errors to be considered failures and not legitimate responses to send downstream. For example, one might not want someone to know that their auth service is actually down (or dying) or if some proxy between Envoy and the auth service is unhealthy.

At any rate, the existing behavior is now canon so that's the primary reason we'd to keep it as the default. @dio do you think mimicking the retry-on configuration is the right path forward?

I looked into the code and was sort of surprised to see that the retry-on values (gateway-error, 5xx, etc...) is largely implemented in code (retry_state_impl.cc) as opposed to a proto API that we could just reuse.

esmet avatar Dec 11 '20 01:12 esmet

do you think mimicking the retry-on configuration is the right path forward?

I think so. Yes, it actually controlled by an array of pre-defined strings. Probably we can have something like: failed_on: ['gateway-error', '5xx']?. Thoughts?

dio avatar Dec 11 '20 06:12 dio

do you think mimicking the retry-on configuration is the right path forward?

I think so. Yes, it actually controlled by an array of pre-defined strings. Probably we can have something like: failed_on: ['gateway-error', '5xx']?. Thoughts?

Sounds right. I'll try this soon.

esmet avatar Dec 12 '20 04:12 esmet

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jan 11 '21 04:01 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar Jan 18 '21 04:01 github-actions[bot]

This is very much still an issue!

yanniszark avatar Jan 18 '21 10:01 yanniszark

Let's move this to "no stale bot"

esmet avatar Jan 18 '21 15:01 esmet

No new activity on this yet? it would be super useful to differentiate between network drops vs 5xx errors

ninadm avatar Jun 30 '21 21:06 ninadm

Is this one still on road map? +1 on "it would be super useful"

mszlgr avatar Nov 15 '21 16:11 mszlgr

there are people that still need this - for example we have a flow where we return 5xx codes but they are intended to be handled specially by some applications.

erplsf avatar May 24 '22 13:05 erplsf

hi, can I pick this up? Thanks!

imrenagi avatar Sep 19 '22 23:09 imrenagi

@imrenagi I think you can take it. Please let me know if you need help. Thanks!

cc. @ggreenway.

dio avatar Sep 23 '22 11:09 dio

@imrenagi, after re-reading https://github.com/envoyproxy/envoy/issues/14001#issuecomment-742852892. I think I agree with @yanniszark. So, we need to introduce runtime guarding, to deprecate the current behavior (with runtime guarding we allow users to fallback to the previous (current) behavior).

Basically, you need the following patch:

diff --git a/source/common/runtime/runtime_features.cc b/source/common/runtime/runtime_features.cc
index 36832d73f5..47e4297c58 100644
--- a/source/common/runtime/runtime_features.cc
+++ b/source/common/runtime/runtime_features.cc
@@ -43,6 +43,7 @@ RUNTIME_GUARD(envoy_reloadable_features_delta_xds_subscription_state_tracking_fi
 RUNTIME_GUARD(envoy_reloadable_features_do_not_await_headers_on_upstream_timeout_to_emit_stats);
 RUNTIME_GUARD(envoy_reloadable_features_do_not_count_mapped_pages_as_free);
 RUNTIME_GUARD(envoy_reloadable_features_enable_compression_bomb_protection);
+RUNTIME_GUARD(envoy_reloadable_features_ext_authz_http_service_5xx_is_denied);
 RUNTIME_GUARD(envoy_reloadable_features_fix_hash_key);
 RUNTIME_GUARD(envoy_reloadable_features_get_route_config_factory_by_type);
 RUNTIME_GUARD(envoy_reloadable_features_http2_delay_keepalive_timeout);
diff --git a/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc b/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
index 3157136b43..a5d02c28dc 100644
--- a/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
+++ b/source/extensions/filters/common/ext_authz/ext_authz_http_impl.cc
@@ -317,11 +317,14 @@ void RawHttpClientImpl::onBeforeFinalizeUpstreamSpan(
 ResponsePtr RawHttpClientImpl::toResponse(Http::ResponseMessagePtr message) {
   const uint64_t status_code = Http::Utility::getResponseStatus(message->headers());
 
-  // Set an error status if the call to the authorization server returns any of the 5xx HTTP error
-  // codes. A Forbidden response is sent to the client if the filter has not been configured with
-  // failure_mode_allow.
-  if (Http::CodeUtility::is5xx(status_code)) {
-    return std::make_unique<Response>(errorResponse());
+  if (!Runtime::runtimeFeatureEnabled(
+          "envoy.reloadable_features.ext_authz_http_service_5xx_is_denied")) {
+    // Set an error status if the call to the authorization server returns any of the 5xx HTTP error
+    // codes. A Forbidden response is sent to the client if the filter has not been configured with
+    // failure_mode_allow.
+    if (Http::CodeUtility::is5xx(status_code)) {
+      return std::make_unique<Response>(errorResponse());
+    }
   }
 
   // Extract headers-to-remove from the storage header coming from the

Hence with this, when this RawHttpClientImpl receives 5xx, it returns a response object with 5xx as the status code, with a non-empty body and all.

Note I put ext_authz_http_service_5xx_is_denied (meaning we are now classifying 5xx as a denied response, not an error one) as the name for the flag, feel free to choose a better name when you have it! (probably the longer version: ext_authz_http_service_5xx_is_denied_instead_of_error).

Please add unit and integration tests.

Good luck!

dio avatar Oct 28 '22 06:10 dio

Sure @dio . Will take a look and try to update my PR. thanks!

imrenagi avatar Oct 31 '22 15:10 imrenagi