Consul Connect and ECK
Consul Connect seems to operate differently from other service meshes in that it requires applications to opt in to the mesh. As far as I can tell, there are no built-in provisions for transparent proxying of all traffic as in Istio or Linkerd. The implementation patterns appear to be the following:
- Use Consul as the cluster DNS resolver: Existing Kubernetes services can be added to Consul automatically via catalog sync but, as far as I can tell, they still need to be accessed with the special service.consul suffix instead of the cluster.local suffix. [needs more research to confirm]
- Use the Connect sidecar: This requires the applications to declare the services they need access to by using the consul.hashicorp.com/connect-service-upstreams annotation. Each upstream service must have a unique port that the application can then access by sending requests to 127.0.0.1:<port>. Services not declared in the annotation cannot be accessed through the Connect proxy. Dynamic upstreams require native integration with Connect. (See the sketch after this list.)
- Native integration: The Connect Go library can be used instead of the proxy sidecar to access services. This obviously requires applications to be built specifically with the knowledge that they are being deployed to a Consul Connect environment.
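As a rough illustration of the first two patterns, a client pod would reach a catalog-synced service via its Consul DNS name, or a declared upstream via the local Connect proxy port. The service name and port below are hypothetical and only meant as a sketch:

# Catalog-synced service, resolved through Consul DNS
# (assumes Consul is configured as a stub domain for the cluster DNS)
$ curl http://my-service.service.consul:8080/

# Upstream declared with consul.hashicorp.com/connect-service-upstreams: "my-service:8080",
# reached through the local Connect sidecar proxy
$ curl http://127.0.0.1:8080/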
The above integration methods make it difficult for the ECK operator to be fully integrated into the mesh as it can be with Istio and Linkerd. The operator needs to access the Elastic Stack applications that it creates -- which are not known beforehand. Declaring the upstream services using the consul.hashicorp.com/connect-service-upstreams annotation is therefore not possible. Furthermore, the operator accesses Stack applications using the Kubernetes DNS names of the respective services. So far, it seems that in order to go through the mesh, the operator needs to use the Consul service DNS name (or, in the case of the proxy, 127.0.0.1:<port> -- which becomes even more complicated if there is more than one deployment of an application, as each one will need to be manually configured to use a unique port number instead of the application default).
Users who wish to have full enmeshment (all application communications going through the mesh) will have a hard time configuring ECK to do so. Even if they know the exact set of Stack applications they are going to deploy and can list them using the above-mentioned annotation, the operator itself needs to be changed to use the special DNS name or the port number to access the Stack applications through the mesh.
If full enmeshment is not required, it is fairly easy to have the Elastic Stack applications be accessible through the mesh. One caveat is with associations. If users want Kibana to Elasticsearch or APM Server to Elasticsearch associations to go through the mesh, they cannot use the elasticsearchRef field in the manifest and should configure the connection manually instead.
Connecting Stack applications to the mesh with ACL enforcement
---
# Service account for the Elasticsearch service (for ACL enforcement)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-consul-es
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-consul-es
spec:
  version: 7.6.2
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    podTemplate:
      metadata:
        annotations:
          consul.hashicorp.com/connect-service: "elastic-consul-es"
          consul.hashicorp.com/connect-inject: "true"
          consul.hashicorp.com/connect-service-port: "http"
      spec:
        automountServiceAccountToken: true
        serviceAccount: elastic-consul-es
---
# Service account for the Kibana service (for ACL enforcement)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-consul-kb
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: elastic-consul-kb
spec:
  version: 7.6.2
  count: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  elasticsearchRef:
    # This connection does not go through the mesh
    name: elastic-consul-es
  podTemplate:
    metadata:
      annotations:
        consul.hashicorp.com/connect-service: "elastic-consul-kb"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service-port: "http"
    spec:
      automountServiceAccountToken: true
      serviceAccount: elastic-consul-kb
With the above configuration deployed, Elasticsearch will be available in the Consul catalog as elastic-consul-es and Kibana as elastic-consul-kb.
$ consul catalog services -tags
consul-consul-connect-injector-svc-consul k8s
...
elastic-consul-es
elastic-consul-es-es-default-default k8s
elastic-consul-es-es-http-default k8s
elastic-consul-es-es-transport-default k8s
elastic-consul-es-sidecar-proxy
elastic-consul-kb
elastic-consul-kb-kb-http-default k8s
elastic-consul-kb-sidecar-proxy
elastic-webhook-server-elastic-system k8s
...
To test access, deploy a client pod:
---
# Service account for ACL enforcement
apiVersion: v1
kind: ServiceAccount
metadata:
  name: static-client
---
apiVersion: v1
kind: Pod
metadata:
  name: static-client
  annotations:
    "consul.hashicorp.com/connect-inject": "true"
    "consul.hashicorp.com/connect-service-upstreams": "elastic-consul-es:9200,elastic-consul-kb:5601"
spec:
  containers:
  - name: static-client
    image: tutum/curl:latest
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "while true; do sleep 30; done;" ]
  serviceAccountName: static-client
If strict enforcement is in place, the client will be denied access by default:
$ kubectl exec -t -i static-client -- curl -v -k -XGET -u "elastic:$(kubectl get secret elastic-consul-es-es-elastic-user -o=go-template='{{ .data.elastic | base64decode}}')" 'http://127.0.0.1:9200/_cat/health?v'
curl: (52) Empty reply from server
Add an intention to allow access:
$ consul intention check static-client elastic-consul-es
Denied
$ consul intention create static-client elastic-consul-es
Created: static-client => elastic-consul-es (allow)
$ kubectl exec -t -i static-client -- curl -v -k -XGET -u "elastic:$(kubectl get secret elastic-consul-es-es-elastic-user -o=go-template='{{ .data.elastic | base64decode}}')" 'http://127.0.0.1:9200/_cat/health?v'
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1588066323 09:32:03 elastic-consul-es green 3 3 10 5 0 0 0 0 - 100.0%
Associating through the mesh
Automatic association will not go through the mesh as the operator uses the Kubernetes service name to create the association. Instead, the association has to be created manually.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-consul-es
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-consul-kb
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-consul-es
spec:
  version: 7.6.2
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    podTemplate:
      metadata:
        annotations:
          consul.hashicorp.com/connect-service: "elastic-consul-es"
          consul.hashicorp.com/connect-inject: "true"
          consul.hashicorp.com/connect-service-port: "http"
      spec:
        automountServiceAccountToken: true
        serviceAccount: elastic-consul-es
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: elastic-consul-kb
spec:
  version: 7.6.2
  count: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    elasticsearch.hosts:
    - http://127.0.0.1:9200
    elasticsearch.username: elastic
    elasticsearch.ssl.verificationMode: none
  podTemplate:
    metadata:
      annotations:
        consul.hashicorp.com/connect-service: "elastic-consul-kb"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service-port: "http"
        consul.hashicorp.com/connect-service-upstreams: "elastic-consul-es:9200"
    spec:
      automountServiceAccountToken: true
      serviceAccount: elastic-consul-kb
      containers:
      - name: kibana
        env:
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: elastic-consul-es-es-elastic-user
              key: elastic
If strict enforcement is on, enable access from Kibana to Elasticsearch as well.
$ consul intention create elastic-consul-kb elastic-consul-es
So I've been playing with this for the last 2 weeks, using the association-through-the-mesh pattern, and what I've been noticing is that the Kibana pod will drop its Consul Connect containers and thus lose its connection to the Elastic cluster. Deleting the pod and letting it rebuild normally fixes this issue.
I'm not sure if anyone else has experienced this as well, but it is something that would need to be looked at.
Please note that the information in this issue is a record of exploratory work we have done and should not be considered to be the canonical way of integrating ECK with Consul Connect.
That being said, I am curious to know what you mean by "Kibana pod will drop its Consul Connect containers". Do the sidecars simply disappear? How often does that happen, and does it happen to other applications as well?
I don't think it is due to an interaction problem with ECK. As long as the consul.hashicorp.com/connect-inject: "true" annotation is present in the pod, it is the responsibility of the Consul Connect controller to inject the sidecar. Sounds like that's not happening for some reason. It might be worth looking at Connect logs to see if there's anything interesting in there to explain that.
Yeah totally, I'm just relaying the experiences we've seen running this at reasonable capacity for the last 2 weeks.
When I say drop, yeah that's exactly what happens. The containers are deleted from the pods, and as such there are no logs for me to check what happened (we're still working on this).
Applied manifest that we've had a good experience with (bar this 1 niggle):
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: strix
spec:
  version: 7.6.2
  updateStrategy:
    changeBudget:
      maxSurge: 2
      maxUnavailable: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
    service:
      spec:
        type: NodePort
  nodeSets:
  - name: elasticsearch
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
    volumeClaimTemplates:
    - metadata:
        name: strix-es-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: standard
    podTemplate:
      metadata:
        annotations:
          consul.hashicorp.com/connect-inject: "true"
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms4g -Xmx4g"
          resources:
            requests:
              memory: 4Gi
              cpu: 0.5
            limits:
              memory: 4Gi
              cpu: 2
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: strix-kibana
spec:
  version: 7.6.2
  count: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    elasticsearch.hosts:
    - http://127.0.0.1:9200
    elasticsearch.username: elastic
    elasticsearch.ssl.verificationMode: none
  podTemplate:
    metadata:
      annotations:
        consul.hashicorp.com/connect-service: "strix-kibana"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service-upstreams: "elasticsearch:9200"
    spec:
      containers:
      - name: kibana
        env:
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: strix-es-elastic-user
              key: elastic
You should look at the logs of consul-connect-injector-webhook-deployment in the namespace where Consul is installed (the name could be slightly different depending on how you installed Consul). That might provide some clues about why the sidecar is missing. One possible cause could be that your manifest is missing automountServiceAccountToken: true from the pod template. You may even need to create a service account to match the Connect service name and use that instead of the default (similar to the example in the top comment).
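For reference, something along these lines should surface the injector logs; the namespace and deployment name here are assumptions based on a default Helm install and may differ in your cluster:

$ kubectl logs -n consul deployment/consul-connect-injector-webhook-deployment --tail=100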
On a side note, you should consider providing your own Connect service name using the consul.hashicorp.com/connect-service annotation. This is because Consul Connect uses the name of the container as the default service name. If you deploy a second Elasticsearch cluster using ECK, it would be called elasticsearch as well if that annotation is missing.
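For example, something like this in the pod template (the service name below is just a placeholder; pick whatever is unique in your catalog):

podTemplate:
  metadata:
    annotations:
      consul.hashicorp.com/connect-inject: "true"
      consul.hashicorp.com/connect-service: "strix-es"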
The injector webhook deployment logs don't have anything revealing in them yet, but the log level is currently just INFO; I'm running it for a couple of weeks with it set to DEBUG.
We run GKE with automountServiceAccountToken set to True by default, so I know the issue isn't there (we've debugged that before)
And yeah, totally agree I should be naming them properly; this is just the PoC I deployed to test things out. Will get around to changing it at some point.
Pods created by ECK resources have automountServiceAccountToken set to false unless the manifest explicitly overrides it in the podTemplate section.
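A minimal override in the nodeSet would look like this:

podTemplate:
  spec:
    automountServiceAccountToken: true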
And I'm assuming that takes precedence then? Cool, I'll add that in.
Okay, so I've attempted to make a couple of modifications to the cluster for the first time since I deployed it (which has been working fine up until now).
One of the Elastic nodes was reporting that the disk was full, so I attempted to increase the storage by following the advice here
It's been a couple of hours now and it is still in the ApplyingChanges stage. I'm unsure if this is because of the mesh or some other underlying mechanism.
@charith-elastic @Ares3266 Consul 1.10 (currently in beta; GA later this month) adds support for transparent proxying on Kubernetes. This should simplify the process of using Consul service mesh with ECK.
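A rough sketch of what enabling it looks like, based on my reading of the Consul 1.10 documentation (the value and annotation names below are assumptions I have not verified against ECK):

# Helm values
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: true

# or per pod, via annotation
consul.hashicorp.com/transparent-proxy: "true"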
Has anything changed since this was originally written? I'm trying to use Consul with Elastic deployed using Operator in Kubernetes, and I'm having a lot of difficulty getting it to work.
If I start without Consul, everything works as expected; however, with Consul I get an empty response from the endpoint however I try to access the service. If I go directly into the Elasticsearch pod "elastic-search-es-default-0" and curl the endpoint using localhost or the pod's IP address, it works. If I try to curl the service's endpoint or IP address, it doesn't work.
If I go directly into the operator pod "elastic-operator-0", nothing works, including the search pod's IP.
This is the current setup:
NAME READY STATUS RESTARTS AGE
pod/elastic-operator-0 2/2 Running 2 106m
pod/elastic-search-es-default-0 2/2 Running 0 84m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/elastic-operator-webhook ClusterIP 10.14.48.161 <none> 443/TCP 106m
service/elastic-search-es-default ClusterIP None <none> 9200/TCP 84m
service/elastic-search-es-http ClusterIP 10.14.54.15 <none> 9200:30004/TCP 84m
service/elastic-search-es-internal-http ClusterIP 10.14.89.29 <none> 9200/TCP 84m
service/elastic-search-es-transport ClusterIP None <none> 9300/TCP 84m
service/kibana-kb-http ClusterIP 10.14.120.175 <none> 5601/TCP 84m
So from inside the search pod:
curl -k -u elastic:PASSWORD http://10.13.1.27:9200/
{
  "name" : "elastic-search-es-default-0",
  "cluster_name" : "elastic-search",
  "cluster_uuid" : "VnxncScAQ76Dn-yNCqqkVA",
  "version" : {
    "number" : "8.1.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3700f7679f7d95e36da0b43762189bab189bc53a",
    "build_date" : "2022-03-03T14:20:00.690422633Z",
    "build_snapshot" : false,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
curl -k -u elastic:PASSWORD http://10.14.54.15:9200/
curl: (52) Empty reply from server
All other requests from anywhere else respond with curl: (52) Empty reply from server
I'm using as close to a vanilla setup as I can both for consul and elastic whilst I get everything to work together nicely, and everything is installed using helm templates.
Consul config:
consul:
  global:
    name: consul
    datacenter: dc1
    image: hashicorp/consul:1.11.2
    imageEnvoy: envoyproxy/envoy:v1.20.1
    imageK8S: hashicorp/consul-k8s-control-plane:0.39.0
    metrics:
      enabled: true
      enableAgentMetrics: true
  server:
    replicas: 1
  ui:
    enabled: true
  connectInject:
    enabled: true
    default: true
  controller:
    enabled: true
  prometheus:
    enabled: false
Elastic:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-search
spec:
  version: {{ .Values.elastic.version }}
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: default
    count: 1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: oci-bv
    config:
      node.store.allow_mmap: false
    podTemplate:
      metadata:
        annotations:
          consul.hashicorp.com/connect-service: "elastic-search-es"
I'm sure it's something really obvious that I'm missing, but I've spent two days on this so far and I'm getting nowhere.
@codex70 I'll take a proper look at this if I get 5 minutes, but last time I ran into something similar it was because Envoy was attempting to hit an upstream URL that's broken (check ports, FQDN, etc.)
Thanks, it would be great if you could take a look. As I said, Elasticsearch works without Consul. From what you said I'm not sure if there might be something like firewall rules that are causing an issue (this is running in a cloud environment). Also, I wasn't sure whether it was missing something like certificates or any other security features that are required for Consul as this is a bare minimum setup before I implement any real security.
Also, just to confirm, I have set up "Connect Service Mesh on Kubernetes" as described in https://www.consul.io/docs/k8s/connect#accepting-inbound-connections and this appears to work correctly.
In the Consul service UI, all services including kibana and elastic search show in green.
Out of interest, do you have any examples of setting up something like Beats to send information to Elastic? I can connect with Kibana but not with Beats, and I also get a message saying Elastic was unable to get the license information (all of these seem to be related to the above).