Gloo is not handling the 302 properly from Lambda upstream
Gloo Edge Product
Enterprise
Gloo Edge Version
v1.15.1
Kubernetes Version
v1.20
Describe the bug
- A REST API client calls the Gloo API Gateway, which then invokes a Lambda upstream
- The Lambda function sends a 302 redirect response, pointing to a new URL, back to Gloo
- The Gloo API gateway returns HTTP 500 to the API client.
Gateway proxy debug logs show:
[2023-11-06 17:44:46.012][87][debug][filter] [external/envoy_gloo/source/extensions/transformers/aws_lambda/api_gateway_transformer.cc:56] [C1417481][S11811468931835387921] Transforming response
[2023-11-06 17:44:46.012][87][debug][filter] [external/envoy_gloo/source/extensions/transformers/aws_lambda/api_gateway_transformer.cc:68] [C1417481][S11811468931835387921] Error transforming response: [json.exception.type_error.302] type must be string, but is null
[2023-11-06 17:44:46.012][87][debug][filter] [external/envoy_gloo/source/extensions/transformers/aws_lambda/api_gateway_transformer.cc:36] [C1417481][S11811468931835387921] Returning error with message: Failed to transform response
Expected Behavior
The client should receive the HTTP 302 response redirecting it to the new location.
Steps to reproduce the bug
- A REST API client calls the Gloo API Gateway, which then invokes a Lambda upstream
- The Lambda function sends a 302 redirect response, pointing to a new URL, back to Gloo (a sketch of such a function is below)
- The Gloo API gateway returns HTTP 500 to the API client.
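For reference, a Lambda returning a redirect in the API-Gateway-style response format might look like the following (a hypothetical sketch -- the customer's actual function is not available, and the Location target is made up):

import json

def lambda_handler(event, context):
    # API-Gateway-style response payload. With unwrapAsApiGateway enabled, Gloo is
    # expected to use statusCode as the HTTP response code and to copy the headers
    # (including Location) onto the response returned to the client.
    return {
        'statusCode': 302,
        'headers': {
            'Location': 'https://example.com/new-location'  # hypothetical redirect target
        },
        'body': json.dumps('Redirecting')
    }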
Additional Environment Detail
No response
Additional Context
More context located in ticket, will link once created.
Zendesk ticket #2913 has been linked to this issue.
We should consider special-casing certain status codes in the transformation logic.
As discussed, the 302 issue needs to be fixed in the Lambda transformer itself, as this is a use case we should support out of the box.
To be able to handle other cases as well, we should consider implementing a FallbackTransformationProcessor that could use Inja templates to process any unexpected responses coming from Lambdas: #9143
Some clarification: the error is not related to an HTTP 302 redirect response. "[json.exception.type_error.302] type must be string, but is null" is the error that nlohmann/json returns when a user tries to access a field that doesn't exist in a JSON object. See the discussion here for an example: https://github.com/nlohmann/json/discussions/3510
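As a rough Python analogue of that failure mode (illustrative only -- the filter itself uses nlohmann::json in C++):

import json

doc = json.loads('{"statusCode": null}')
# Treating the null (None) value as a string fails, loosely analogous to
# nlohmann::json throwing [json.exception.type_error.302] "type must be
# string, but is null" when a field is absent or not the expected type.
code_line = "HTTP " + doc["statusCode"]  # raises TypeError: can only concatenate str (not "NoneType") to str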
Kickoff Agenda
Issue details to clarify
- What client(s) have been affected by this issue?
- What AWS auth scheme do they use?
- Are they able to work around the issue?
- Do some requests succeed? Or are all requests affected by the issue?
- Can we get the source of the lambda functions that were used when the client experienced this issue?
- Under what circumstances does the bug occur?
- I may be wrong, but this doesn't seem to be related to an HTTP 302.
- nlohmann::json (the JSON package used in envoy-gloo/envoy-gloo-ee to parse JSON from lambda function responses) returns [json.exception.type_error.302] when a user tries to access a missing key, or when they assume that the value at a particular key is of an incorrect type -- see the discussion here for more details: https://github.com/nlohmann/json/discussions/3510#discussioncomment-2840598
- What versions does this need to be fixed in?
- it appears that we need this fix in Gloo EE v1.17.x through v1.15.x
- Do we need to and/or should we consider backporting this to v1.14.x?
timeline:
- implementing fixes and writing tests should go quickly once I am able to replicate the bug
- that being said, we will need to merge 4 PRs for each LTS version that this fix goes into.
- there are a limited number of edge-team reviewers, each of whom is wrapped up in IC work, meetings, and other reviews.
- It will be time-consuming to get reviews on each of these PRs, even backport PRs that just port already-reviewed functionality to older LTS branches
- these PRs need to be approved, merged, updated, and released in a given sequence. This process is extremely serial -- each step for each PR depends on other steps having been completed in the proper order for other PRs. Each step is time-consuming and requires negotiation
- Let's plan for a timeboxed investigation into the bug. We can discuss how long this should be, but we can't make much progress until we can consistently replicate the bug
- ~3 days: investigate and replicate the bug
- I am wrapped up in many reviews for complex additions to edge. I could replicate this bug faster if I had more free time, but I do not.
- At the time of writing, this is the most uncertain piece of this timeline
- 2 days: write fix and data-plane tests against the fix
- based on the errors we're seeing, the fix should only require a couple of lines of changed code
- It should be straightforward to write data-plane tests to confirm that we're no longer susceptible to this sort of issue
- I originally had this listed as one day, but the complexities related to the split between enterprise and open-source behavior in the lambda filter complicate this
- 2 days: write control plane e2e tests against the fix
- I would say 1 day, but I expect that I will need to:
- write e2e tests in both gloo and solo-projects, as a consequence of the strategies that we took to open source a certain amount of previously enterprise lambda filter functionality
- potentially add new lambda functions to the developers account to be used in testing
- If I end up writing kube2e tests, this will take longer
- 2 days: solicit reviews, respond to feedback, and merge v1.17.x data-plane PRs
- the timing of this heavily depends on the availability of reviewers and the scope of the suggested changes. I will keep the PRs as focused as possible in order to reduce review time, but there is uncertainty here
- 3 days: solicit reviews, respond to feedback, and merge v1.17.x control-plane PRs
- again, the timing of this is uncertain
- we are also depending on being able to merge and release the right combination of data-plane and control-plane PRs at the right times in order to manage the dependency bumps involved with each PR
- ??????? days: merge backports
- there will be many backport PRs that each need to be:
- created
- this is not time consuming
- approved (in the proper order)
- it is time-consuming to get reviewers, especially when I need 12-16 PRs reviewed in a certain sequence
- released
- it is time consuming to negotiate a release
- updated to pull in other releases
- this means we have to re-run CI on each updated PR (and then re-run CI again to get the flakes passing), which delays the merge and release cadence significantly
- so on and so on
Evaluating lambda filter's interactions with 302 response codes
I created a lambda to return a "302"
import json
def lambda_handler(event, context):
    return {
        'statusCode': 302,
        'body': json.dumps('Hello from Lambda!')
    }
- Note: the lambda actually returns the dict {'statusCode': 302, 'body': '"Hello from Lambda!"'} as the response body -- the HTTP response code is 200
- If users enable unwrapAsApiGateway, Gloo will attempt to convert the statusCode field to an HTTP response code, and unwrap the body field to use as the response body (see the sketch after this list)
- I created a trivial Gloo Edge deployment (a single AWS Lambda upstream and a single virtual service pointing to that upstream) to route to this lambda
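The unwrapping behavior described above, modeled in Python for illustration (this is not the real C++ transformer; the field handling follows the observations in the tests below, and the function name is made up):

import json

def unwrap_as_api_gateway(lambda_payload: str):
    # Rough model of what unwrapAsApiGateway is expected to do with the Lambda
    # invoke payload: statusCode becomes the HTTP response code, body becomes the
    # response body, and multiValueHeaders are copied onto the response.
    doc = json.loads(lambda_payload)
    status = int(doc.get('statusCode', 200))
    headers = doc.get('multiValueHeaders', {})
    body = doc.get('body', '')
    return status, headers, body

# Using the payload returned by the test lambda above:
print(unwrap_as_api_gateway('{"statusCode": 302, "body": "\\"Hello from Lambda!\\""}'))
# -> (302, {}, '"Hello from Lambda!"')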
Test 1: standard behavior
- by default, the lambda response is passed through to the client as is
❯ curl localhost:8080 -v
* Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 200 OK
< date: Thu, 28 Mar 2024 19:32:48 GMT
< content-type: application/json
< content-length: 53
< x-amzn-requestid: 5e987bb2-bb63-4e57-9050-85586c2d7b86
< x-amzn-remapped-content-length: 0
< x-amz-executed-version: $LATEST
< x-amzn-trace-id: root=1-6605c5e0-41fc5b497c03dc4f705812ad;parent=6bfb0d15e2a92de7;sampled=0;lineage=803ac899:0
< x-envoy-upstream-service-time: 62
< server: envoy
<
* Connection #0 to host localhost left intact
{"statusCode": 302, "body": "\"Hello from Lambda!\""}%
- as you can see, by default, Gloo just passes the JSON response from the lambda to the client
- note that the statusCode field in the JSON response is 302, but the actual HTTP response code is 200
Test 2: unwrapAsApiGateway enabled
- when we enable unwrapAsApiGateway in the virtual service, the response code sent to the end user will be set to the value of the statusCode field in the lambda's response payload
❯ curl localhost:8080 -v
* Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
>
< HTTP/1.1 302 Found
< content-length: 20
< date: Thu, 28 Mar 2024 19:33:49 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
"Hello from Lambda!"%
- As you can see, the response code is now 302.
- Additionally, the body field is now unwrapped from the JSON response, and its value is returned as the response body
I was able to reproduce with the following lambda:
import json
def lambda_handler(event, context):
    return {
        'multiValueHeaders': {
            'foo': [
                'bar',
                None  # a null header value is what trips the transformer
            ]
        }
    }
The error occurs in our lambda filter here:
https://github.com/solo-io/envoy-gloo/blob/b0da7f382e9a311a288499e8138177209171cf21/source/extensions/transformers/aws_lambda/api_gateway_transformer.cc#L144-L146
We fail to validate the type of header_value -- it is implicitly assumed to be a string. We also fail to validate that header_values is an object / is iterable -- potentially another bug here.
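A defensive version of that header-copying step, sketched in Python rather than the filter's C++ (the function and callback names here are illustrative, not the actual implementation):

def copy_multi_value_headers(payload: dict, add_header) -> None:
    # Validate the shapes the current filter implicitly assumes: multiValueHeaders
    # must be an object, each entry must be a list (or single value), and each
    # value must be a string before it is copied onto the response.
    multi = payload.get('multiValueHeaders')
    if not isinstance(multi, dict):
        return
    for name, values in multi.items():
        if not isinstance(values, list):
            values = [values]
        for value in values:
            if not isinstance(value, str):
                # The current filter assumes a string here, which is why a null
                # value produces [json.exception.type_error.302].
                continue
            add_header(name, value)

# Example with the reproduction payload: the None value is skipped instead of
# failing the whole response transformation with a 500.
copy_multi_value_headers({'multiValueHeaders': {'foo': ['bar', None]}},
                         lambda k, v: print(k, v))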
Logs from my replication are below:
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.628][44][debug][conn_handler] [external/envoy/source/extensions/listener_managers/listener_manager/active_tcp_listener.cc:157] [Tags: "ConnectionId":"90"] new connection from 127.0.0.1:46068
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.628][44][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:379] [Tags: "ConnectionId":"90"] new stream
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.628][44][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1166] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] request headers complete (end_stream=true):
gateway-proxy-64bcf84585-f9v2d ':authority', 'localhost:8080'
gateway-proxy-64bcf84585-f9v2d ':path', '/'
gateway-proxy-64bcf84585-f9v2d ':method', 'GET'
gateway-proxy-64bcf84585-f9v2d 'user-agent', 'curl/8.4.0'
gateway-proxy-64bcf84585-f9v2d 'accept', '*/*'
gateway-proxy-64bcf84585-f9v2d
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.629][44][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1149] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] request end stream
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.629][44][debug][connection] [external/envoy/source/common/network/connection_impl.h:98] [C90] current connecting state: false
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.630][44][debug][router] [external/envoy/source/common/router/router.cc:478] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] cluster 'aws-upstream_gloo-system' match for URL '/2015-03-31/functions/ben-302-test/invocations?Qualifier=%24LATEST'
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.630][44][debug][router] [external/envoy/source/common/router/router.cc:690] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] router decoding headers:
gateway-proxy-64bcf84585-f9v2d ':authority', 'lambda.us-east-1.amazonaws.com'
gateway-proxy-64bcf84585-f9v2d ':path', '/2015-03-31/functions/ben-302-test/invocations?Qualifier=%24LATEST'
gateway-proxy-64bcf84585-f9v2d ':method', 'POST'
gateway-proxy-64bcf84585-f9v2d ':scheme', 'http'
gateway-proxy-64bcf84585-f9v2d 'user-agent', 'curl/8.4.0'
gateway-proxy-64bcf84585-f9v2d 'accept', '*/*'
gateway-proxy-64bcf84585-f9v2d 'x-forwarded-proto', 'http'
gateway-proxy-64bcf84585-f9v2d 'x-request-id', '4c0434b4-92ec-4e85-87c9-ff057c5dcba0'
gateway-proxy-64bcf84585-f9v2d 'x-amz-invocation-type', 'RequestResponse'
gateway-proxy-64bcf84585-f9v2d 'x-amz-log-type', 'None'
gateway-proxy-64bcf84585-f9v2d 'x-amz-date', '20240328T205434Z'
gateway-proxy-64bcf84585-f9v2d 'authorization', 'AWS4-HMAC-SHA256 Credential=AKIA3VU3PCIYDGH2BEG2/20240328/us-east-1/lambda/aws4_request, SignedHeaders=host;x-amz-date;x-amz-invocation-type;x-amz-log-type, Signature=7a0c83ba07462457ed1ea240d16be9a665001612fce82cc45cee34a700bcb4c1'
gateway-proxy-64bcf84585-f9v2d 'x-envoy-expected-rq-timeout-ms', '15000'
gateway-proxy-64bcf84585-f9v2d
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.631][44][debug][pool] [external/envoy/source/common/http/conn_pool_base.cc:78] queueing stream due to no available connections (ready=0 busy=0 connecting=0)
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.631][44][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:291] trying to create new connection
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.631][44][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:145] creating a new connection (connecting=0)
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.631][44][debug][multi_connection] [external/envoy/source/common/network/multi_connection_base_impl.cc:14] [C91] connections=8
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.631][44][debug][happy_eyeballs] [external/envoy/source/common/network/happy_eyeballs_connection_impl.cc:33] C[91] address=44.192.249.154:443
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.632][44][debug][connection] [external/envoy/source/common/network/connection_impl.h:98] [C92] current connecting state: true
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.632][44][debug][client] [external/envoy/source/common/http/codec_client.cc:57] [Tags: "ConnectionId":"91"] connecting
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.632][44][debug][connection] [external/envoy/source/common/network/connection_impl.cc:948] [C92] connecting to 44.192.249.154:443
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.633][44][debug][connection] [external/envoy/source/common/network/connection_impl.cc:967] [C92] connection in progress
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.666][44][debug][connection] [external/envoy/source/common/network/connection_impl.cc:695] [C92] connected
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.738][44][debug][multi_connection] [external/envoy/source/common/network/multi_connection_base_impl.cc:456] [C91] connection=1
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.738][44][debug][client] [external/envoy/source/common/http/codec_client.cc:88] [Tags: "ConnectionId":"91"] connected
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.739][44][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:328] [Tags: "ConnectionId":"91"] attaching to next stream
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.739][44][debug][pool] [external/envoy/source/common/conn_pool/conn_pool_base.cc:182] [Tags: "ConnectionId":"91"] creating stream
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.740][44][debug][router] [external/envoy/source/common/router/upstream_request.cc:571] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] pool ready
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.740][44][debug][client] [external/envoy/source/common/http/codec_client.cc:141] [Tags: "ConnectionId":"91"] encode complete
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.792][44][debug][router] [external/envoy/source/common/router/router.cc:1438] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] upstream headers complete: end_stream=false
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.793][44][debug][client] [external/envoy/source/common/http/codec_client.cc:128] [Tags: "ConnectionId":"91"] response complete
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.793][44][debug][filter] [source/extensions/transformers/aws_lambda/api_gateway_transformer.cc:56] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] Transforming response
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.794][44][debug][filter] [source/extensions/transformers/aws_lambda/api_gateway_transformer.cc:68] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] Error transforming response: [json.exception.type_error.302] type must be string, but is null
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.795][44][debug][filter] [source/extensions/transformers/aws_lambda/api_gateway_transformer.cc:36] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] Returning error with message: Failed to transform response
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.795][44][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1820] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] encoding headers via codec (end_stream=false):
gateway-proxy-64bcf84585-f9v2d ':status', '500'
gateway-proxy-64bcf84585-f9v2d 'content-type', 'text/plain'
gateway-proxy-64bcf84585-f9v2d 'x-amzn-errortype', '500'
gateway-proxy-64bcf84585-f9v2d 'content-length', '33'
gateway-proxy-64bcf84585-f9v2d 'date', 'Thu, 28 Mar 2024 20:54:34 GMT'
gateway-proxy-64bcf84585-f9v2d 'server', 'envoy'
gateway-proxy-64bcf84585-f9v2d
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.795][44][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1925] [Tags: "ConnectionId":"90","StreamId":"1154760242941528851"] Codec completed encoding stream.
gateway-proxy-64bcf84585-f9v2d [2024-03-28 20:54:34.796][44][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:53] [Tags: "ConnectionId":"91"] response complete
➤ Hanh Vu commented:
ETA depends on the amount of test coverage.
➤ Hanh Vu commented:
Moving back to backlog to ensure backports and control-plane reassertion.
While this has merged in our data plane and should work once it's pulled in, we need larger integration tests to be persisted so that we are confident in this landing and that it is resilient against regressions.
➤ Nathan F Solo commented:
1.17.0 and 1.16.10