
Consul Connect and ECK

Open charith-elastic opened this issue 5 years ago • 14 comments

Consul Connect seems to operate differently from other service meshes in that it requires applications to opt in to the mesh. As far as I can tell, there are no built-in provisions for transparently proxying all traffic the way Istio or Linkerd do. The implementation patterns appear to be the following:

  • Use Consul as the cluster DNS resolver: Existing Kubernetes services can be added to Consul automatically via catalog sync, but as far as I can tell, they still need to be accessed with the special suffix service.consul instead of the cluster.local suffix. [needs more research to confirm]
  • Use the Connect sidecar: This requires the applications to declare the services they need access to using the consul.hashicorp.com/connect-service-upstreams annotation. Each upstream service must have a unique port, and the application accesses it by sending requests to 127.0.0.1:<port> (see the sketch after this list). Services not declared in the annotation cannot be accessed through the Connect proxy. Dynamic upstreams require native integration with Connect.
  • Native integration: The Connect Go library can be used instead of the proxy sidecar to access services. This obviously requires applications to be built specifically with the knowledge that they are being deployed to a Consul Connect environment.
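
A minimal sketch of the sidecar pattern, assuming a hypothetical application my-app that needs to reach a Consul service named backend-svc (a complete, working example appears further down in this issue):

---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    # Ask the Connect injector to add the sidecar proxy to this pod
    consul.hashicorp.com/connect-inject: "true"
    # Declare upstreams as <consul-service>:<local-port>; the application
    # then reaches backend-svc by sending requests to 127.0.0.1:8080
    consul.hashicorp.com/connect-service-upstreams: "backend-svc:8080"
spec:
  containers:
    - name: my-app
      image: my-app:latest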

The above integration methods make it difficult for the ECK operator to be fully integrated into the mesh in the way it is with Istio and Linkerd. The operator needs to access the Elastic Stack applications that it creates -- which are not known beforehand. Declaring the upstream services using the consul.hashicorp.com/connect-service-upstreams annotation is therefore not possible. Furthermore, the operator accesses Stack applications using the Kubernetes DNS names of the respective services. So far, it seems that in order to go through the mesh, the operator needs to use the Consul service DNS name (or, in the case of the proxy, 127.0.0.1:<port> -- which becomes even more complicated if there is more than one deployment of an application, as each one will need to be manually configured to use a unique port number instead of the application default).

Users who wish to have full enmeshment (all application communications going through the mesh) will have a hard time configuring ECK to do so. Even if they know the exact set of Stack applications they are going to deploy and can list them using the above-mentioned annotation, the operator itself needs to be changed to use the special DNS name or the port number to access the Stack applications through the mesh.

If full enmeshment is not required, it is fairly easy to make the Elastic Stack applications accessible through the mesh. One caveat concerns associations: if users want the Kibana to Elasticsearch or APM Server to Elasticsearch associations to go through the mesh, they cannot use the elasticsearchRef field in the manifest and should configure the connection manually instead (see "Associating through the mesh" below).

Connecting Stack applications to the mesh with ACL enforcement

---
# Service account for the Elasticsearch service (for ACL enforcement)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-consul-es

---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-consul-es
spec:
  version: 7.6.2
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    podTemplate:
      metadata:
        annotations:
          consul.hashicorp.com/connect-service: "elastic-consul-es"
          consul.hashicorp.com/connect-inject: "true"
          consul.hashicorp.com/connect-service-port: "http"
      spec:
        automountServiceAccountToken: true
        serviceAccountName: elastic-consul-es
---
# Service account for the Kibana service (for ACL enforcement)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-consul-kb
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: elastic-consul-kb
spec:
  version: 7.6.2
  count: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  elasticsearchRef:
    # This connection does not go through the mesh
    name: elastic-consul-es
  podTemplate:
    metadata:
      annotations:
        consul.hashicorp.com/connect-service: "elastic-consul-kb"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service-port: "http"
    spec:
      automountServiceAccountToken: true
      serviceAccountName: elastic-consul-kb

With the above configuration deployed, Elasticsearch will be available in the Consul catalog as elastic-consul-es and Kibana as elastic-consul-kb.

$ consul catalog services -tags
consul-consul-connect-injector-svc-consul      k8s
...
elastic-consul-es                              
elastic-consul-es-es-default-default           k8s
elastic-consul-es-es-http-default              k8s
elastic-consul-es-es-transport-default         k8s
elastic-consul-es-sidecar-proxy                
elastic-consul-kb                              
elastic-consul-kb-kb-http-default              k8s
elastic-consul-kb-sidecar-proxy                
elastic-webhook-server-elastic-system          k8s
...

To test access, deploy a client pod:

---
# Service account for ACL enforcement
apiVersion: v1
kind: ServiceAccount
metadata:
  name: static-client
---
apiVersion: v1
kind: Pod
metadata:
  name: static-client
  annotations:
    "consul.hashicorp.com/connect-inject": "true"
    "consul.hashicorp.com/connect-service-upstreams": "elastic-consul-es:9200,elastic-consul-kb:5601"
spec:
  containers:
    - name: static-client
      image: tutum/curl:latest
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
  serviceAccountName: static-client

If strict enforcement is in place, the client will be denied access by default:

$ kubectl exec -t -i static-client -- curl -v -k -XGET -u "elastic:$(kubectl get secret elastic-consul-es-es-elastic-user -o=go-template='{{ .data.elastic | base64decode}}')" 'http://127.0.0.1:9200/_cat/health?v'
curl: (52) Empty reply from server

Add an intention to allow access:

$ consul intention check static-client elastic-consul-es 
Denied

$ consul intention create static-client elastic-consul-es 
Created: static-client => elastic-consul-es (allow)

$ kubectl exec -t -i static-client -- curl -v -k -XGET -u "elastic:$(kubectl get secret elastic-consul-es-es-elastic-user -o=go-template='{{ .data.elastic | base64decode}}')" 'http://127.0.0.1:9200/_cat/health?v'
epoch      timestamp cluster           status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1588066323 09:32:03  elastic-consul-es green           3         3     10   5    0    0        0             0                  -                100.0%

Associating through the mesh

Automatic association will not go through the mesh, as the operator uses the Kubernetes service name to create the association. Instead, the association has to be created manually.

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-consul-es
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-consul-kb
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-consul-es
spec:
  version: 7.6.2
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    podTemplate:
      metadata:
        annotations:
          consul.hashicorp.com/connect-service: "elastic-consul-es"
          consul.hashicorp.com/connect-inject: "true"
          consul.hashicorp.com/connect-service-port: "http"
      spec:
        automountServiceAccountToken: true
        serviceAccountName: elastic-consul-es
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: elastic-consul-kb
spec:
  version: 7.6.2
  count: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    elasticsearch.hosts:
      - http://127.0.0.1:9200
    elasticsearch.username: elastic
    elasticsearch.ssl.verificationMode: none
  podTemplate:
    metadata:
      annotations:
        consul.hashicorp.com/connect-service: "elastic-consul-kb"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service-port: "http"
        consul.hashicorp.com/connect-service-upstreams: "elastic-consul-es:9200"
    spec:
      automountServiceAccountToken: true
      serviceAccountName: elastic-consul-kb
      containers:
        - name: kibana
          env:
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: elastic-consul-es-es-elastic-user
                  key: elastic

If strict enforcement is on, create an intention to allow access from Kibana to Elasticsearch as well:

$ consul intention create elastic-consul-kb elastic-consul-es

charith-elastic avatar Apr 28 '20 09:04 charith-elastic

So I've been playing with this for the last 2 weeks, using the association-through-the-mesh pattern, and what I've been noticing is that the Kibana pod will drop its Consul Connect containers and thus lose its connection to the Elastic cluster. Deleting the pod and letting it rebuild normally fixes the issue.

I'm not sure if anyone else has experienced this as well, but it is something that needs to be looked at.

ghost avatar May 11 '20 12:05 ghost

Please note that the information in this issue is a record of exploratory work we have done and should not be considered to be the canonical way of integrating ECK with Consul Connect.

That being said, I am curious to know what you mean by "the Kibana pod will drop its Consul Connect containers". Do the sidecars simply disappear? How often does that happen, and does it happen to other applications as well?

I don't think it is due to an interaction problem with ECK. As long as the consul.hashicorp.com/connect-inject: "true" annotation is present on the pod, it is the responsibility of the Consul Connect controller to inject the sidecar. It sounds like that's not happening for some reason. It might be worth looking at the Connect logs to see if there's anything interesting in there to explain it.

charith-elastic avatar May 11 '20 13:05 charith-elastic

Yeah totally, I'm just relaying the experiences we've seen running this at reasonable capacity for the last 2 weeks.

When I say drop, yeah, that's exactly what happens. The containers are deleted from the pods, and as such there are no logs for me to check what happened (we're still working on this).

Applied manifest that we've had a good experience with (bar this one niggle):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: strix
spec:
  version: 7.6.2
  updateStrategy:
    changeBudget:
      maxSurge: 2
      maxUnavailable: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
      service:
        spec:
          type: NodePort
  nodeSets:
  - name: elasticsearch
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
    volumeClaimTemplates:
      - metadata:
          # ECK expects the data volume claim to be named elasticsearch-data;
          # with any other name, the default 1Gi claim is used for data instead
          name: elasticsearch-data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
          storageClassName: standard
    podTemplate:
      metadata:
        annotations:
          consul.hashicorp.com/connect-inject: "true"
      spec:
        initContainers:
          - name: sysctl
            securityContext:
              privileged: true
            command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
          - name: elasticsearch
            # env and resources belong on the container in the pod template,
            # not directly under the nodeSet
            env:
              - name: ES_JAVA_OPTS
                value: "-Xms4g -Xmx4g"
            resources:
              requests:
                memory: 4Gi
                cpu: 0.5
              limits:
                memory: 4Gi
                cpu: 2
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: strix-kibana
spec:
  version: 7.6.2
  count: 1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    elasticsearch.hosts:
      - http://127.0.0.1:9200
    elasticsearch.username: elastic
    elasticsearch.ssl.verificationMode: none
  podTemplate:
    metadata:
      annotations:
        consul.hashicorp.com/connect-service: "strix-kibana"
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service-upstreams: "elasticsearch:9200"
    spec:
      containers:
        - name: kibana
          env:
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: strix-es-elastic-user
                  key: elastic

ghost avatar May 11 '20 13:05 ghost

You should look at the logs of consul-connect-injector-webhook-deployment in the namespace where Consul is installed (the name could be slightly different depending on how you installed Consul). That might provide some clues about why the sidecar is missing. One possible cause could be that your manifest is missing automountServiceAccountToken: true from the pod template. You may even need to create a service account to match the Connect service name and use that instead of the default (similar to the example in the top comment).
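
For example, something along these lines should surface the injector logs (the namespace and deployment name here are assumptions based on a default Helm install; adjust them to match your cluster):

$ # Tail the Connect injector's recent logs for injection failures
$ kubectl logs -n consul deploy/consul-connect-injector-webhook-deployment --tail=100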

On a side note, you should consider providing your own Connect service name using the consul.hashicorp.com/connect-service annotation. This is because Consul Connect uses the name of the container as the default service name. If you deploy a second Elasticsearch cluster using ECK, it would be called elasticsearch as well if that annotation is missing.
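
As a sketch, explicitly naming the service only takes one extra annotation on the pod template (the name elasticsearch-two is illustrative):

podTemplate:
  metadata:
    annotations:
      # Without an explicit name, the service registers under the container
      # name ("elasticsearch"), which collides across multiple clusters
      consul.hashicorp.com/connect-service: "elasticsearch-two"
      consul.hashicorp.com/connect-inject: "true"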

charith-elastic avatar May 11 '20 14:05 charith-elastic

The injector webhook deployment logs don't have anything revealing in them yet, but the log level is currently just INFO; we're now running it for a couple of weeks with it set to DEBUG.

We run GKE with automountServiceAccountToken set to true by default, so I know the issue isn't there (we've debugged that before).

And yeah, I totally agree I should be naming them properly; this is just the PoC I deployed to test things out. I'll get around to changing it at some point.

ghost avatar May 11 '20 14:05 ghost

Pods created by ECK resources have automountServiceAccountToken set to false unless the manifest explicitly overrides it in the podTemplate section.
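
A minimal sketch of the override in any ECK podTemplate:

podTemplate:
  spec:
    # ECK-managed pods default this to false; Connect injection needs
    # the service account token mounted
    automountServiceAccountToken: true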

charith-elastic avatar May 11 '20 14:05 charith-elastic

And I'm assuming that takes precedence then? Cool, I'll add that in.

ghost avatar May 11 '20 14:05 ghost

Okay, so I've attempted to make a couple of modifications to the cluster for the first time since I deployed it (it had been working fine up until now).

One of the Elasticsearch nodes was reporting that its disk was full, so I attempted to increase the storage by following the advice here.

It's been a couple of hours now and the cluster is still in the ApplyingChanges phase. I'm unsure if this is because of the mesh or some other underlying mechanism.
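
Two general Kubernetes checks that can help narrow this down (a sketch; the StorageClass and PVC names are placeholders following ECK's usual naming conventions):

$ # Resizing only works if the storage class allows volume expansion
$ kubectl get storageclass standard -o jsonpath='{.allowVolumeExpansion}'
$ # The claim's events usually explain why a resize or rollout is stuck
$ kubectl describe pvc elasticsearch-data-strix-es-elasticsearch-0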

ghost avatar Jun 23 '20 22:06 ghost

@charith-elastic @Ares3266 Consul 1.10 (currently in beta; GA later this month) adds support for transparent proxying on Kubernetes. This should simplify the process of using Consul service mesh with ECK.
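
With transparent proxy, enmeshment becomes a single annotation rather than an explicit upstream list. A sketch based on the Consul 1.10 documentation (Helm also exposes connectInject.transparentProxy.defaultEnabled to turn it on cluster-wide):

podTemplate:
  metadata:
    annotations:
      consul.hashicorp.com/connect-inject: "true"
      # Redirect all pod traffic through the sidecar; Kubernetes DNS names
      # keep working, so no connect-service-upstreams mapping is required
      consul.hashicorp.com/transparent-proxy: "true"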

blake avatar Jun 09 '21 22:06 blake

Has anything changed since this was originally written? I'm trying to use Consul with the Elastic Stack deployed via the operator in Kubernetes, and I'm having a lot of difficulty getting it to work.

If I start without Consul, everything works as expected; with Consul, however, I get an empty response from the endpoint no matter how I try to access the service. If I go directly into the Elasticsearch pod "elastic-search-es-default-0" and curl the endpoint using localhost or the pod's IP address, it works. If I try to curl the service's endpoint or IP address, it doesn't work.

If I go directly into the operator pod "elastic-operator-0", nothing works, including the search pod's IP.

This is the current setup:

NAME                               READY   STATUS    RESTARTS   AGE
pod/elastic-operator-0             2/2     Running   2          106m
pod/elastic-search-es-default-0    2/2     Running   0          84m

NAME                                      TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)          AGE
service/elastic-operator-webhook          ClusterIP      10.14.48.161    <none>            443/TCP          106m
service/elastic-search-es-default         ClusterIP      None            <none>            9200/TCP         84m
service/elastic-search-es-http            ClusterIP      10.14.54.15     <none>            9200:30004/TCP   84m
service/elastic-search-es-internal-http   ClusterIP      10.14.89.29     <none>            9200/TCP         84m
service/elastic-search-es-transport       ClusterIP      None            <none>            9300/TCP         84m
service/kibana-kb-http                    ClusterIP      10.14.120.175   <none>            5601/TCP         84m

So from inside the search pod:

curl -k -u elastic:PASSWORD http://10.13.1.27:9200/
{
  "name" : "elastic-search-es-default-0",
  "cluster_name" : "elastic-search",
  "cluster_uuid" : "VnxncScAQ76Dn-yNCqqkVA",
  "version" : {
    "number" : "8.1.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "3700f7679f7d95e36da0b43762189bab189bc53a",
    "build_date" : "2022-03-03T14:20:00.690422633Z",
    "build_snapshot" : false,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

curl -k -u elastic:PASSWORD http://10.14.54.15:9200/
curl: (52) Empty reply from server

All other requests from anywhere else respond with curl: (52) Empty reply from server.

I'm using as close to a vanilla setup as I can both for consul and elastic whilst I get everything to work together nicely, and everything is installed using helm templates.

Consul config:

consul:
  global:
    name: consul
    datacenter: dc1
    image: hashicorp/consul:1.11.2
    imageEnvoy: envoyproxy/envoy:v1.20.1
    imageK8S: hashicorp/consul-k8s-control-plane:0.39.0
    metrics:
      enabled: true
      enableAgentMetrics: true
  server:
    replicas: 1
  ui:
    enabled: true
  connectInject:
    enabled: true
    default: true
  controller:
    enabled: true
  prometheus:
    enabled: false

Elastic:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elastic-search
spec:
  version: {{ .Values.elastic.version }}
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
    - name: default
      count: 1
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: oci-bv
      config:
        node.store.allow_mmap: false
      podTemplate:
        metadata:
          annotations:
            consul.hashicorp.com/connect-service: "elastic-search-es"

I'm sure it's something really obvious that I'm missing, but I've spent two days on this so far and I'm getting nowhere.

codex70 avatar Mar 21 '22 14:03 codex70

@codex70 I'll take a proper look at this if I get 5 minutes, but last time I ran into something similar it was because Envoy was attempting to hit an upstream URL that was broken (check ports, FQDN, etc.)
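
One way to check what Envoy thinks it is talking to is its admin interface (a sketch: 19000 is the default admin port for Connect sidecars, and since containers in a pod share localhost, you can query it from the workload container, which ships curl):

$ # Dump Envoy's view of its clusters and their health from inside the pod
$ kubectl exec elastic-search-es-default-0 -c elasticsearch -- curl -s http://127.0.0.1:19000/clusters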

ghost avatar Mar 21 '22 18:03 ghost

Thanks, it would be great if you could take a look. As I said, Elasticsearch works without Consul. From what you said, I wonder if there might be something like firewall rules causing an issue (this is running in a cloud environment). I also wasn't sure whether it was missing something like certificates or other security features required by Consul, as this is a bare-minimum setup before I implement any real security.

codex70 avatar Mar 21 '22 19:03 codex70

Also, just to confirm, I have set up the "Connect Service Mesh on Kubernetes" as described in https://www.consul.io/docs/k8s/connect#accepting-inbound-connections and this appears to work correctly.

In the Consul services UI, all services, including Kibana and Elasticsearch, show as green.

codex70 avatar Mar 22 '22 10:03 codex70

Out of interest, do you have any examples of setting up something like Beats to send information to Elasticsearch? I can connect with Kibana but not with Beats, and I also get a message saying Elastic was unable to get the license information (all of which seem related to the above).

codex70 avatar Mar 25 '22 14:03 codex70