
bug: requests with Istio mTLS enabled fail with connection termination

Open svilenvul opened this issue 2 years ago • 38 comments

Current Behavior

We are using APISIX in a Kubernetes setup deployed with Helm (https://github.com/apache/apisix-helm-chart). APISIX runs as a service in the Istio service mesh with an Envoy sidecar injected.

We found that after enabling mTLS with Istio, requests targeting APISIX failed. While debugging, we saw that the authority header of outgoing requests from APISIX was always set to apisix_backend. We think this confuses Istio during mTLS and causes the requests to fail.
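One plausible reading of the Envoy log further down (`route_name: allow_any`, `upstream_cluster: PassthroughCluster`) is that the sidecar could not match `apisix_backend` to any mesh service, so it forwarded the request as plain passthrough traffic, which a STRICT mTLS peer would refuse. A deliberately simplified sketch of that lookup, assuming a single example service; the domain list and the cluster string are modelled on Istio's naming conventions, not taken from a real config dump, and this is not Envoy's actual matching code.

```python
# Simplified, hypothetical model of an Istio sidecar's outbound HTTP routing:
# the :authority header selects a virtual host, and anything that matches no
# known service lands on the catch-all "allow_any" route. Real Envoy matching
# also handles wildcards, ports and many more domain variants.

OUTBOUND_CLUSTER = "outbound|80||grpc-service.namespace.svc.cluster.local"

# Assumed domain list Istio would generate for the example service.
VIRTUAL_HOST_DOMAINS = {
    "grpc-service.namespace.svc.cluster.local": OUTBOUND_CLUSTER,
    "grpc-service.namespace": OUTBOUND_CLUSTER,
    "grpc-service": OUTBOUND_CLUSTER,
}


def pick_cluster(authority: str) -> str:
    """Return the cluster a request with this :authority would be sent to."""
    host = authority.split(":", 1)[0]  # drop an explicit port, if present
    # No match: the "allow_any" route forwards to PassthroughCluster, i.e. to
    # the original destination IP without Istio's mTLS.
    return VIRTUAL_HOST_DOMAINS.get(host, "PassthroughCluster")


for authority in ("apisix_backend", "grpc-service.namespace.svc.cluster.local"):
    print(f"{authority} -> {pick_cluster(authority)}")
```

In this model, only an authority that names a mesh service gets the mTLS-wrapped outbound cluster.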

Expected Behavior

Requests should be successful both with Istio mTLS enabled and disabled.

Error Logs

Request Headers Info (from client grpcurl)

authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.eyJleHAiOjE2NTY2ODg4NzUsImlhdCI6MTY1NjYwMjQ3NSwiYXV0aF90aW1lIjoxNjU2NDIwNzAzLCJqdGkiOiI5ZTk1ZDdhZS0yNzlmLTRlNTktODE0Yi1mYzNkMzNmYmM4MDEiLCJpc3MiOiJodHRwczovL3RvZ2dpZC50b2dnLmNsb3VkL2F1dGgvcmVhbG1zL3RvZ2dpZCIsInN1YiI6IjgwZTkxMzYwLWVmOTAtNDI5OC05ODRkLTcxNDBiNDY5NTFlMyIsInR5cCI6IkJlYXJlciIsImF6cCI6InN1cGVyLWFwcCIsInNlc3Npb25fc3RhdGUiOiI5ZDUwNmFjMS1mNGZlLTRiOTUtYjJkNy1iYTdjOTg0ODc1MjUiLCJhY3IiOiIwIiwic2NvcGUiOiJvcGVuaWQgZW1haWwgcHJvZmlsZSIsInNpZCI6IjlkNTA2YWMxLWY0ZmUtNGI5NS1iMmQ3LWJhN2M5ODQ4NzUyNSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJuYW1lIjoiVGVzdCBUZXN0IiwicHJlZmVycmVkX3VzZXJuYW1lIjoidGVzdEB0b2dnLmNsb3VkIiwiZ2l2ZW5fbmFtZSI6IlRlc3QiLCJmYW1pbHlfbmFtZSI6IlRlc3QiLCJlbWFpbCI6InRlc3RAdG9nZy5jbG91ZCJ9.BK-EGfdyi7DoIQTRYxUFBK54f4g2IyAK6DlQDinDldjf2OXFRyWIK9OwN7Q5-hW5BO0hn0huJ4aQZ59WGUdZ5RjZqVCV3-w2ybr7BXHkwKJYnjrB0lcFy4in1WB_eiD4TMBdqb7vG6dxC8bGdm8YmBfFvJ7Ufghle33pjj67k8SJj3zUFRBK-f4umKesakfTlhlMMdALbCTxV9jIoXPtDpvDEF6V89N7LKnnoV8Q3lPBF56PGeBokdqEJLfsb5ZQcaMeW8Fi38adqZTa8A4WefoRRsOrgEhXMYoU8DrY1EWvatgms4vJKag6bygkp_2nsNKT__hoYIDBvvJMke60VQ

Response headers (from client grpcurl)

content-length: 0
content-type: application/grpc
date: Thu, 30 Jun 2022 15:39:53 GMT
server: istio-envoy
x-envoy-upstream-service-time: 84

Logs from Envoy Proxy sidecar container for APISIX

{
	"duration":1,
	"downstream_remote_address":"172.20.50.105:0",
	"upstream_service_time":null,
	"upstream_local_address":"10.234.106.162:50972",
	"response_code_details":"upstream_reset_before_response_started{connection_termination}",
	"upstream_transport_failure_reason":null,
	"route_name":"allow_any",
	"response_code":200,
	"upstream_host":"10.234.29.1:80",
	"user_agent":"grpcurl/v1.8.1 grpc-go/1.37.0",
	"downstream_local_address":"10.234.29.1:80",
	"x_forwarded_for":"172.20.50.105",
	"connection_termination_details":null,
	"protocol":"HTTP/2",
	"upstream_cluster":"PassthroughCluster",
	"authority":"apisix_backend",
	"method":"POST",
	"start_time":"2022-06-30T15:39:53.730Z",
	"path":"/xxxx.yyyy.ms.profile.ProfileService/GetUserProfile",
	"bytes_received":51,
	"response_flags":"UC",
	"request_id":"7e4abc76-27f4-4f40-b663-056c648608b7",
	"bytes_sent":0,
	"requested_server_name":null
}
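The fields in the entry above that point at the problem are `authority`, `route_name`, `upstream_cluster` and `response_flags` (`UC` is Envoy's flag for an upstream connection termination). A minimal sketch for scanning sidecar access logs for this pattern, assuming JSON-encoded logs with one object per line; the function name and the abbreviated sample line are hypothetical.

```python
import json


def suspicious_entries(lines):
    """Yield (authority, path) for requests that fell through to the
    PassthroughCluster and ended with an upstream connection termination."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON log lines
        entry = json.loads(line)
        # Envoy separates multiple response flags with commas; null -> "".
        flags = (entry.get("response_flags") or "").split(",")
        if entry.get("upstream_cluster") == "PassthroughCluster" and "UC" in flags:
            yield entry.get("authority"), entry.get("path")


# Abbreviated single-line copy of the sidecar log entry above.
sample = ('{"authority":"apisix_backend","route_name":"allow_any",'
          '"upstream_cluster":"PassthroughCluster","response_flags":"UC",'
          '"path":"/xxxx.yyyy.ms.profile.ProfileService/GetUserProfile"}')

for authority, path in suspicious_entries([sample]):
    print(authority, path)
```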

Logs from APISIX container

127.0.0.6 - - [30/Jun/2022:15:39:53 +0000] xxxxx-api-gateway.xxxx.cloud:9443 "POST /xxxx.yyyy.ms.profile.ProfileService/GetUserProfile HTTP/2.0" 200 0 0.005 "-" "grpcurl/v1.8.1 grpc-go/1.37.0" 10.234.29.1:80 200 0.004 "grpc://xxxxxx-api-gateway.xxxx.cloud:9443"

Steps to Reproduce

  1. Install Istio in k8s cluster
  2. Enable Istio Strict mTLS
$ kubectl apply -n istio-system -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: "default"
spec:
  mtls:
    mode: STRICT
EOF
  3. Create a namespace for APISIX and enable Istio sidecar auto-injection
  4. Install APISIX with the Helm chart
  5. Install a gRPC service in the same namespace
  6. Create a Route and an Upstream in APISIX
  • Route
{
  "uris": [
    "/xxxx.yyyyy.ms.profile.ProfileService/*"
  ],
  "name": "Profile_Service",
  "methods": [
    "POST",
    "GET"
  ],
  "upstream_id": "369132756648592144",
  "status": 1
}
  • Upstream
{
  "nodes": [
    {
      "host": "grpc-service.namespace.svc.cluster.local",
      "port": 80,
      "weight": 1
    }
  ],
  "timeout": {
    "connect": 6,
    "read": 6,
    "send": 6
  },
  "type": "roundrobin",
  "scheme": "grpc",
  "pass_host": "pass",
  "name": "Profile Service"
}
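The upstream above uses `"pass_host": "pass"`. APISIX documents three `pass_host` modes for choosing the Host sent upstream: `pass` keeps the client's Host, `node` uses the host of the selected upstream node, and `rewrite` uses the configured `upstream_host`. A simplified model of those documented rules follows; the function is hypothetical and is not APISIX's Lua code. As reported in this issue, for `grpc` upstreams the observed `:authority` was `apisix_backend` regardless of this setting.

```python
def resolve_upstream_host(pass_host, client_host, node_host, upstream_host=None):
    """Model of APISIX's documented pass_host modes (illustrative only)."""
    if pass_host == "pass":
        return client_host  # forward the client's Host unchanged
    if pass_host == "node":
        return node_host  # use the host of the selected upstream node
    if pass_host == "rewrite":
        if not upstream_host:
            raise ValueError('pass_host "rewrite" requires upstream_host')
        return upstream_host  # use the configured upstream_host value
    raise ValueError(f"unknown pass_host mode: {pass_host!r}")


client = "xxxxx-api-gateway.xxxx.cloud"
node = "grpc-service.namespace.svc.cluster.local"
for mode in ("pass", "node"):
    print(mode, "->", resolve_upstream_host(mode, client, node))
```

Under these rules, `pass` would forward the external gateway hostname while `node` would forward the in-mesh service name; whether either value reaches the gRPC `:authority` depends on the template change discussed later in the thread.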

Environment

  • APISIX version: 2.12.1
  • k8s version: 1.20.7
  • Istio version: 1.10.3

svilenvul, Jul 04 '22

@tao12345666333 this issue is a continuation of https://the-asf.slack.com/archives/CUC5MN17A/p1656432528471899

svilenvul, Jul 04 '22

Strange. According to the upstream configuration you gave, APISIX will use grpc-service.namespace.svc.cluster.local as the host (authority) header.

tokers, Jul 04 '22

@tao12345666333 do you have an idea why this is happening? We expected the authority to remain unchanged.

marziman, Jul 04 '22

There is a description here.

https://github.com/apache/apisix/blob/master/docs/en/latest/stream-proxy.md

But in order to solve the actual problem here, I need to know your full request chain.

Below is my understanding; please correct me if I'm wrong.

Client -------> Istio IngressGateway ---> Envoy  --->   Envoy 
         TLS                                |             |
                                            V             V
                                          APISIX        Backend

tao12345666333, Jul 06 '22

@tao12345666333, yes the request chain is correct.

svilenvul, Jul 06 '22

I'm working on a scenario where this behavior can be bypassed. I'll update once I have results

tao12345666333, Jul 17 '22

Great. If I understand correctly, you were able to reproduce it, right?

svilenvul, Jul 20 '22

Yes! I'm trying to figure out how to get around this behavior.

tao12345666333, Jul 21 '22

Also, I'm trying to understand your current deployment architecture.

tao12345666333, Jul 21 '22

Hi, could you please provide me with the following information:

  • Request routing directly in APISIX container, check its request and response headers
  • Request routing from another container, check its request and response headers

thanks!

tao12345666333, Jul 25 '22

Hi @tao12345666333, I will be glad to help. Can you tell me how I can get this information for you?

svilenvul, Jul 25 '22

@svilenvul thanks!

Request routing directly in APISIX container, check its request and response headers

You can just run kubectl exec -n <YOUR NAMESPACE> deploy/<APISIX's deployment> -- curl -vv localhost:<APISIX listen port>/<ROUTE path> -H "HOST: <your route host>" <and something else>

Request routing from another container, check its request and response headers

kubectl exec -n <YOUR NAMESPACE> deploy/<another deployment> -- curl -vv <APISIX-gateway service name>:<APISIX-gateway service listen port>/<ROUTE path> -H "HOST: <your route host>" <and something else>

tao12345666333, Jul 26 '22

FYI, Istio mTLS is currently in permissive mode (the issue occurs only in strict mode).

Request routing directly in APISIX container, check its request and response headers

kubectl exec -n xxx-id-system deploy/tiam-ms-apigateway-apisix -- curl -vv http://localhost:9080/xxx.tdp.dp.ms.legalagreements.v2.ClientService/ListLegalDocuments -H "HOST: xxx-api-gateway.xxx.cloud" --http2


> GET /xxxx.tdp.dp.ms.legalagreements.v2.ClientService/ListLegalDocuments HTTP/1.1
> Host: xxx-api-gateway.xxx.cloud
> User-Agent: curl/7.79.1
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAAQCAAAAAAIAAAAA
> 
* Received HTTP/0.9 when not allowed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Closing connection 0
curl: (1) Received HTTP/0.9 when not allowed
command terminated with exit code 1

Request routing from another container, check its request and response headers

kubectl exec -n xxx-id-system deploy/tiam-core-authn -- curl -vv http://tiam-ms-apigateway-apisix-gateway:80/xxx.tdp.dp.ms.legalagreements.v2.ClientService/ListLegalDocuments -H "HOST: xxx-api-gateway.xxx.cloud" --http2



> GET /xxx.tdp.dp.ms.legalagreements.v2.ClientService/ListLegalDocuments HTTP/1.1
> Host: xxx-api-gateway.xxx.cloud
> User-Agent: curl/7.61.1
> Accept: */*
> Connection: Upgrade, HTTP2-Settings
> Upgrade: h2c
> HTTP2-Settings: AAMAAABkAARAAAAAAAIAAAAA
> 
< HTTP/1.1 200 OK
< date: Tue, 26 Jul 2022 07:35:52 GMT
< content-type: application/grpc
< content-length: 0
< grpc-status: 7
< grpc-message: RBAC: access denied
< x-envoy-upstream-service-time: 62
< server: envoy
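The response above is an HTTP 200 whose headers carry a gRPC error: in the gRPC status code table, `grpc-status: 7` is `PERMISSION_DENIED`, consistent with Envoy's `RBAC: access denied` message. A small sketch for pulling the gRPC status out of `curl -v` output; the helper name is hypothetical, and it only reads headers that `curl -v` prints with the `< ` prefix.

```python
# Canonical gRPC status codes (from the gRPC status code specification).
GRPC_STATUS = {
    0: "OK", 1: "CANCELLED", 2: "UNKNOWN", 3: "INVALID_ARGUMENT",
    4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND", 6: "ALREADY_EXISTS",
    7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED", 9: "FAILED_PRECONDITION",
    10: "ABORTED", 11: "OUT_OF_RANGE", 12: "UNIMPLEMENTED", 13: "INTERNAL",
    14: "UNAVAILABLE", 15: "DATA_LOSS", 16: "UNAUTHENTICATED",
}


def grpc_status_from_curl(verbose_output: str):
    """Return (code, name, message) from `curl -v` response headers, or None."""
    headers = {}
    for line in verbose_output.splitlines():
        # Response headers start with "< "; the status line has no ":" and is skipped.
        if line.startswith("< ") and ":" in line:
            name, _, value = line[2:].partition(":")
            headers[name.strip().lower()] = value.strip()
    if "grpc-status" not in headers:
        return None
    code = int(headers["grpc-status"])
    return code, GRPC_STATUS.get(code, "UNRECOGNIZED"), headers.get("grpc-message", "")


sample = """< HTTP/1.1 200 OK
< content-type: application/grpc
< grpc-status: 7
< grpc-message: RBAC: access denied
"""
print(grpc_status_from_curl(sample))
```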

svilenvul, Jul 26 '22

Thanks

tao12345666333, Jul 26 '22

@tao12345666333 do you have any updates on this issue?

svilenvul, Aug 04 '22

Our issue might be related to https://github.com/apache/apisix/issues/7573. I think that if we had control over this header, it would solve our mTLS issue as well.

svilenvul, Aug 04 '22

@svilenvul hi, I sent you an email yesterday.

I haven't tried proxy-rewrite, so I don't know if it can be a solution. Can you try it directly in your environment?

tao12345666333, Aug 04 '22

I tried to edit the route and add a plugins section with the proxy-rewrite plugin, but I couldn't manage to configure the plugin by adding:

"plugins": {
  "proxy-rewrite": {
    "host": "...."
  }
},

When I save the route settings, the change is not applied. I tried to apply from the dashboard as well, but I don't see the proxy-rewrite plugin in the UI when I select the route, click edit and navigate to the plugins section. I was not able to verify whether this plugin would solve the issue for us.
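For reference, `plugins` is a top-level field of the route object, next to `uris` and `upstream_id`, so the fragment above has to sit inside the route body at that level. A minimal sketch of the merge using the route from the reproduction steps; the helper name and the host value are hypothetical (the host actually tried is elided), sending the result to the Admin API is left out, and whether this changes the gRPC `:authority` in 2.12.1 is exactly what could not be verified here.

```python
import copy
import json


def with_proxy_rewrite_host(route: dict, host: str) -> dict:
    """Return a copy of an APISIX route object with proxy-rewrite's host set."""
    updated = copy.deepcopy(route)  # leave the caller's dict untouched
    # "plugins" sits at the top level of the route, next to "uris".
    plugins = updated.setdefault("plugins", {})
    plugins.setdefault("proxy-rewrite", {})["host"] = host
    return updated


# Route from the reproduction steps.
route = {
    "uris": ["/xxxx.yyyyy.ms.profile.ProfileService/*"],
    "name": "Profile_Service",
    "methods": ["POST", "GET"],
    "upstream_id": "369132756648592144",
    "status": 1,
}

# Hypothetical host value, for illustration only.
print(json.dumps(with_proxy_rewrite_host(route, "grpc-service.namespace.svc.cluster.local"), indent=2))
```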

svilenvul, Aug 18 '22

> I tried to apply from the dashboard as well, but I don't see the proxy-rewrite plugin in the UI when I select the route, click edit and navigate to the plugins section.

@juzhiyuan @bzp2010 Do you know what restrictions are on the dashboard?

tao12345666333, Aug 18 '22

On the Route -> Create page, this plugin is available as a UI form; see:

[screenshot]

juzhiyuan, Aug 19 '22

> > I tried to apply from the dashboard as well, but I don't see the proxy-rewrite plugin in the UI when I select the route, click edit and navigate to the plugins section.

> @juzhiyuan @bzp2010 Do you know what restrictions are on the dashboard?

APISIX Dashboard only works well with apache/apisix for now, AFAIK. Does this mean it's recommended to use Dashboard to control Ingress?

juzhiyuan, Aug 19 '22

I don't think @svilenvul is using APISIX Ingress controller.

tao12345666333, Aug 19 '22

@tao12345666333 we are not using the APISIX Ingress controller. We were not able to see or use the proxy-rewrite plugin to solve this.

Can you please help, @tao12345666333? This has been blocking us for an extremely long time. We would actually be fine if we could control the header as described at https://github.com/apache/apisix/issues/7377#issuecomment-1205451777

Please advise us on what to do. Thanks

marziman, Aug 24 '22

@tzssangglass @tokers @spacewander Can someone please pick this up? I guess this requires some APISIX-related modifications.

Or maybe there is something I didn't notice.

tao12345666333, Aug 24 '22

We would be really thankful! Everyone using APISIX in k8s with Istio will face this issue, and I (and all of you) hope many people will run it in a service mesh setup. The issue is exactly like this one: https://github.com/apache/apisix/issues/7573. We cannot modify the authority, and this causes Istio to reject the requests completely.

Many thanks for all your hard work and help. BR Mehmet

marziman, Aug 24 '22

Hi @marziman, I have emailed you and Mattiullah but haven't received a reply.

This issue has really taken a long time. To better help your business resolve it, could you please pick a slot at your convenience from https://meetings.hubspot.com/zhiyuan? I will invite APISIX's maintainers to attend. 😉

juzhiyuan, Aug 26 '22

Hello @juzhiyuan

We have provided all the input needed to test this. There is one issue, https://github.com/apache/apisix/issues/7573, which is exactly what we are facing.

Can one of the core maintainers comment? This authority header is breaking things in an Istio & APISIX setup.

Everything is described in the issue; a meeting would not bring anything more to the table.

@tao12345666333 @tokers @spacewander I think you forgot about this topic. Could you please engage? 🙏🏻

BR Mehmet

marziman, Sep 19 '22

@marziman Hi, sure, let me check with teammates.

juzhiyuan, Sep 19 '22

@marziman I just submitted https://github.com/apache/apisix/pull/7939/files for it. Does adding grpc_set_header "Host" $upstream_host; ahead of grpc_set_header Content-Type application/grpc; in apisix/cli/ngx_tpl.lua solve your problem?
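To try the suggested change against a local copy of `apisix/cli/ngx_tpl.lua`, the edit amounts to one inserted line. A sketch of it as a text transformation, assuming the `grpc_set_header Content-Type application/grpc;` line appears verbatim on its own line in that file; the helper is hypothetical, the sample is an abbreviated stand-in rather than the real template, and the linked PR remains the authoritative version of the change.

```python
TARGET = "grpc_set_header Content-Type application/grpc;"
NEW_LINE = 'grpc_set_header "Host" $upstream_host;'


def add_grpc_host_header(template: str) -> str:
    """Insert NEW_LINE above each TARGET line, reusing that line's indentation."""
    out = []
    for line in template.splitlines(keepends=True):
        if line.strip() == TARGET:
            indent = line[: len(line) - len(line.lstrip())]
            # Skip if the previous line already sets Host (patch is idempotent).
            if not (out and out[-1].strip() == NEW_LINE):
                out.append(indent + NEW_LINE + "\n")
        out.append(line)
    return "".join(out)


# Abbreviated, illustrative excerpt; the real ngx_tpl.lua is much longer.
sample = (
    "location @grpc_pass {\n"
    "    grpc_set_header Content-Type application/grpc;\n"
    "}\n"
)
print(add_grpc_host_header(sample), end="")
```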

spacewander, Sep 19 '22

@spacewander many thanks! Is there a way we could get an unofficial APISIX Docker image with these changes applied, so that we can deploy it with our APISIX Helm charts (we are using your official Helm charts)? That way we can quickly test this and report back to you.

marziman, Sep 19 '22