consul-k8s
API Gateway Controller in secondary datacenter has insufficient permissions
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
When deploying a federated secondary Consul datacenter via the Helm chart, the API Gateway Controller deployment is configured to retrieve a token at launch time via the Kubernetes auth method. This token has the local flag set, and the associated policy is further scoped only to the secondary datacenter.
Because this token is used to create config-entry resources, which are globally created in the primary datacenter and replicated back to the secondaries, attachment of new HTTPRoute and TCPRoute resources within the secondary cluster fails to complete, as the attached token is invalid in the primary DC. This prevents creation of the underlying Consul *-gateway, service-defaults, and service-intentions resources managed by the API Gateway Controller.
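For anyone wanting to confirm this behavior, a quick check (a sketch only, assuming CLI access to a Consul server in the secondary datacenter; <ACCESSOR_ID> is a placeholder for the accessor of the token handed to the api-gateway-controller pod) is to read the token and note the Local field:

# Read the controller's token and check whether it is datacenter-local.
# <ACCESSOR_ID> is a placeholder; substitute the real accessor ID.
consul acl token read -id <ACCESSOR_ID>
# For the affected token the output shows "Local: true" and a policy
# scoped only to the secondary datacenter.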
Reproduction Steps
- Apply Kubernetes Gateway SIG and Consul API Gateway CRDs to a cluster
- Use Helm chart to deploy a Consul secondary datacenter with ACLs enabled
- Deploy a Gateway resource in the Kubernetes secondary DC. e.g.:
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: ns-api-gateway
  namespace: consul-infra
spec:
  gatewayClassName: consul-api-gateway
  listeners:
    - protocol: HTTP
      port: 8080
      name: http
      allowedRoutes:
        namespaces:
          from: All
- Associate an HTTPRoute or TCPRoute with the deployed Gateway that references a running and connect-injected upstream service. e.g.:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboard
  namespace: webapp
---
apiVersion: v1
kind: Service
metadata:
  name: dashboard
  namespace: webapp
spec:
  selector:
    app: dashboard
  ports:
    - port: 9002
      targetPort: 9002
      name: dashboard
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dashboard
  name: dashboard
  namespace: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dashboard
  template:
    metadata:
      annotations:
        'consul.hashicorp.com/connect-inject': 'true'
        'consul.hashicorp.com/connect-service-upstreams': 'counting:9001'
      labels:
        app: dashboard
    spec:
      serviceAccountName: dashboard
      containers:
        - name: dashboard
          image: hashicorp/dashboard-service:0.0.4
          ports:
            - containerPort: 9002
          env:
            - name: COUNTING_SERVICE_URL
              value: 'http://localhost:9001'
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: dashboard
  namespace: webapp
spec:
  protocol: http
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute
metadata:
  name: dashboard
  namespace: webapp
spec:
  parentRefs:
    - name: ns-api-gateway
      namespace: consul-infra
  rules:
    - matches:
        - path:
            value: /
    - backendRefs:
        - kind: Service
          name: dashboard
          namespace: webapp
          port: 9002
- Observe in the Consul UI/API that no underlying ingress-gateway or service-intentions config entries are created by the gateway controller.
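One way to confirm the missing entries from the CLI (a sketch, assuming a token with read access in the secondary datacenter; the kinds listed are the ones named above) is:

# List the config entries the gateway controller should have created.
# In the failing secondary DC these all come back empty.
consul config list -kind ingress-gateway
consul config list -kind service-defaults
consul config list -kind service-intentions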
Expected behavior
HTTPRoute and TCPRoute resources created within a Kubernetes cluster configured as a Consul secondary DC should lead to successful creation of the associated config entries within Consul.
Environment details
consul-k8s version: 0.45.0, also tested with 0.41.1 (prior to addition of component auth method)
API Gateway version: 0.3.0, also tested with 0.1.0
Kubernetes version: v1.22.8-gke.20
Cloud Provider: GCP, also tested with Azure Red Hat OpenShift
Values.yaml:
global:
  name: consul
  image: "hashicorp/consul-enterprise:1.12.2-ent"
  imageK8S: "hashicorp/consul-k8s-control-plane:0.45.0"
  datacenter: secondary
  gossipEncryption:
    secretName: consul-federation
    secretKey: gossipEncryptionKey
  tls:
    enabled: true
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  enableConsulNamespaces: true
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation
      secretKey: replicationToken
  enterpriseLicense:
    secretName: consul-license
    secretKey: key
  federation:
    enabled: true
    primaryDatacenter: primary
    primaryGateways: ["[...]"]
    k8sAuthMethodHost: "[...]"
  imageEnvoy: "envoyproxy/envoy:v1.22.2"
server:
  replicas: 3
  storage: 10Gi
  storageClass: premium-rwo
  updatePartition: 0
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: false
  consulNamespaces:
    mirroringK8S: true
controller:
  enabled: true
meshGateway:
  enabled: true
apiGateway:
  enabled: true
  image: hashicorp/consul-api-gateway:0.3.0
  logLevel: debug
Thanks for reporting @krarey! This is an issue with the Helm chart that the API Gateway team will work on addressing.
I'm also seeing issues that may be related in a federated secondary datacenter. I can set up routes manually and when I check the routes status, it appears to be correct.
When I try to connect to the API gateway using curl, I get an immediate closing of the connection:
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to X.X.X.X:844
In the primary datacenter the connection works correctly as expected.
It seems possible that this is a TLS issue related to the shared consul-federation caCert and caKey, but I'm not at all sure.
I know it's not exactly the same issue as above, but it seems to be closely related.
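One way to look at the handshake directly rather than through curl (a sketch; <GATEWAY_IP> and <PORT> are placeholders for the gateway listener address) is:

# Inspect the TLS handshake against the gateway listener and dump any
# certificates it presents.
openssl s_client -connect <GATEWAY_IP>:<PORT> -showcerts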
I have just fallen into this issue while testing the new URLRewrite filter on the latest consul-k8s Helm chart (v0.47.1).
I managed to register routes in the primary datacenter, but similar configs do not work in the secondary datacenter.
@nathancoleman do you think that if I switched one of the secondary clusters to cluster peering instead of WAN federation, that would work?
@manobi I believe it's just a matter of configuring the acl-auth-method and primary-datacenter appropriately when deploying the API Gateway controller into a secondary datacenter. Please see the PR I just put up over at #1462.
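For illustration, in a secondary datacenter those flags need to point at the datacenter-suffixed component auth method and at the primary DC, roughly like this (a sketch only; <release>, <secondary-dc>, and <primary-dc> are placeholders, and the flags match the acl-init invocation quoted later in this thread):

# acl-init flags the api-gateway-controller init container needs in a
# federated secondary datacenter (placeholder values).
consul-k8s-control-plane acl-init \
  -component-name=api-gateway-controller \
  -acl-auth-method=<release>-k8s-component-auth-method-<secondary-dc> \
  -primary-datacenter=<primary-dc>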
@nathancoleman, do you have any idea when your code is likely to be released so that we can test it?
@codex70 #1462 will be included in the next release of consul-k8s, slated for this Thursday, September 1
@codex70 @manobi version 0.48.0 is now available for the consul Helm chart and contains the code changes related to this issue (changelog)
Thank you guys, will test it right now.
@nathancoleman Consul API gateway controller never becomes ready:
2022-09-02T15:51:21.019Z [ERROR] unable to login: error="Unexpected response code: 403 (rpc error making call: rpc error making call: rpc error making call: Permission denied)"
It looks to be related to serviceaccount/rolebinding stuff, since I've managed to run the following command in controller-acl-init but not in api-gateway-controller-acl-init:
consul-k8s-control-plane acl-init \
  -component-name=api-gateway-controller \
  -acl-auth-method=consul-consul-k8s-component-auth-method-REDACTED \
  -primary-datacenter=REDACTED \
  -consul-api-timeout=1m \
  -log-level=info \
  -log-json=false
I have also been able to complete the initContainer using the "consul-controller" service account instead of "consul-api-gateway-controller".
Any suggestions? Maybe track this in a separate issue?
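A check that might narrow it down (a sketch; <release> and <datacenter> are placeholders for the redacted names above) is to list the binding rules on the component auth method and see which Kubernetes service accounts it actually permits:

# Show the binding rules attached to the component auth method; the
# selector should admit the api-gateway-controller service account.
consul acl binding-rule list -method=<release>-k8s-component-auth-method-<datacenter>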
Could you please keep us updated on the progress with this? It looks like it has become more complicated.
Unfortunately, running a single datacenter isn't an option for us due to the flat networking requirements.
Hi @codex70 the insufficient permissions issue described here is resolved by the combination of #1462 (merged + released) and #1481 (in code review).
With the API Gateway controller running in both the primary and the secondary datacenter, there is one other issue preventing you from successfully spinning up a Gateway in the second datacenter. That issue is over in the consul-api-gateway repo, described in https://github.com/hashicorp/consul-api-gateway/issues/361. I'm testing a fix for that this week.
Edit: We've also updated our docs describing current limitations with regard to federation here. I expect I'll have the datacenter federation feature described there working (controller per datacenter, gateways routing within the datacenter they're deployed to) with https://github.com/hashicorp/consul-k8s/pull/1481 and a fix for https://github.com/hashicorp/consul-api-gateway/issues/361; however, routing from a gateway in one datacenter to a service in a different datacenter is unlikely in the short term.
Hi all! @nathancoleman @krarey Thanks for the fix, when will it be released? It's just that the latest version of the consul chart (0.49.0) already contains the changes, but they are still not in the released binary:
- method name:
  {{- if and .Values.global.federation.enabled .Values.global.federation.primaryDatacenter }}
  -acl-auth-method={{ template "consul.fullname" . }}-k8s-component-auth-method-{{ .Values.global.datacenter }} \
  -primary-datacenter={{ .Values.global.federation.primaryDatacenter }} \
  {{- else }}
- argument -primary-datacenter:
  {{- if and .Values.global.federation.enabled .Values.global.federation.primaryDatacenter }}
  -primary-datacenter={{ .Values.global.federation.primaryDatacenter }} \
  {{- end }}