Data Transport Cert Secret Size Overrun With Big Scale Out
Bug Report
What did you do?
- Attempted to scale the data replicas to 250.
What did you expect to see?
- A successful scale-up.
What did you see instead? Under which circumstances?
- It appears that the ECK operator overflows the maximum Kubernetes Secret size (1 MiB) for the transport certs once the data node set grows to roughly 250 nodes or more.
- The operator gets stuck in a scale-up loop while it tries to reconcile the cert Secret, and even after scaling back down it does not seem to recover:
"Secret "elasticsearch-XXX-es-data-es-transport-certs" is invalid: data: Too long: must have at most 1048576 bytes" error
Failed remediations
Environment
- ECK version: 2.8.0
- Kubernetes information:
  - Cloud: GKE v1.26.3-gke.1000
- kubectl version: v1.27.2
- Resource definition:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-myapp
spec:
  version: 8.6.1
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - config:
      action:
        auto_create_index: false
      node.roles:
      - master
    count: 3
    name: election
    podTemplate:
      metadata:
        annotations:
          linkerd.io/inject: enabled
        labels:
          ec.ai/component: elasticsearch
          ec.ai/component_group: myapp-service
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: cloud.google.com/gke-spot
                  operator: DoesNotExist
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: elasticsearch.k8s.elastic.co/cluster-name
                    operator: In
                    values:
                    - elasticsearch-myapp
                topologyKey: topology.kubernetes.io/zone
              weight: 100
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: elasticsearch.k8s.elastic.co/cluster-name
                  operator: In
                  values:
                  - elasticsearch-myapp
              topologyKey: kubernetes.io/hostname
        automountServiceAccountToken: true
        containers:
        - name: elasticsearch
          resources:
            limits:
              cpu: "2"
              memory: 5Gi
            requests:
              cpu: "1"
              memory: 5Gi
        initContainers:
        - command:
          - sh
          - -c
          - sysctl -w vm.max_map_count=262144
          image: busybox:1.28
          name: sysctl
          securityContext:
            privileged: true
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch analysis-icu
          name: analysis-icu
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch repository-gcs
          name: repository-gcs
        priorityClassName: app-critical-preempting
        serviceAccount: myapp-elasticsearch
        serviceAccountName: myapp-elasticsearch
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
        storageClassName: standard-rwo
  - config:
      action:
        auto_create_index: false
      node.roles:
      - data
    count: 200
    name: data
    podTemplate:
      metadata:
        annotations:
          linkerd.io/inject: enabled
        labels:
          ec.ai/component: elasticsearch
          ec.ai/component_group: myapp-service
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: node_pool
                  operator: In
                  values:
                  - n2d-custom-8-65536
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: elasticsearch.k8s.elastic.co/cluster-name
                    operator: In
                    values:
                    - elasticsearch-myapp
                topologyKey: topology.kubernetes.io/zone
              weight: 100
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: elasticsearch.k8s.elastic.co/cluster-name
                  operator: In
                  values:
                  - elasticsearch-myapp
              topologyKey: kubernetes.io/hostname
        automountServiceAccountToken: true
        containers:
        - name: elasticsearch
          resources:
            limits:
              cpu: "7"
              memory: 56Gi
            requests:
              cpu: "7"
              memory: 56Gi
        initContainers:
        - command:
          - sh
          - -c
          - sysctl -w vm.max_map_count=262144
          image: busybox:1.28
          name: sysctl
          securityContext:
            privileged: true
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch analysis-icu
          name: analysis-icu
        - command:
          - sh
          - -c
          - bin/elasticsearch-plugin install --batch repository-gcs
          name: repository-gcs
        priorityClassName: app-high-preempting
        serviceAccount: myapp-elasticsearch
        serviceAccountName: myapp-elasticsearch
        tolerations:
        - effect: NoSchedule
          key: n2d-custom-8-65536
          operator: Equal
          value: "true"
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: standard-rwo
- Logs:
Continuous loop of reconciliation failures and timeouts, accompanied by the following:
Secret "elasticsearch-myapp-es-data-es-transport-certs.v1" is invalid: data: Too long: must have at most 1048576 character
One thing you can do to work around this limitation is to create multiple node sets with the data role and scale each of them up until you start running into the size limitation of Kubernetes Secrets, which seems to kick in at around 150-200 nodes per node set. You can then keep adding node sets until you reach the desired scale (see the sketch below). See this issue for more context on the current model of one transport certificate Secret per node set.
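For illustration, a minimal sketch of that workaround applied to the resource definition above (node set names and counts are examples only; each node set gets its own `<cluster>-es-<nodeset>-es-transport-certs` Secret, so keeping every node set below the threshold keeps every Secret under 1 MiB):

```yaml
# Sketch only: split the single 200-node "data" node set into several smaller ones.
# Each node set gets its own transport-certs Secret, so each one stays under 1 MiB.
spec:
  nodeSets:
  - name: election
    count: 3
    # ... master node set unchanged ...
  - name: data-0
    count: 100          # keep each node set well below the ~150-200 node threshold
    config:
      node.roles:
      - data
    # ... same podTemplate / volumeClaimTemplates as the original "data" node set ...
  - name: data-1
    count: 100
    config:
      node.roles:
      - data
    # ... same podTemplate / volumeClaimTemplates ...
```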
@pebrc are there any plans to address this? It's been several years since the workaround was implemented. We run a very large deployment of many ES clusters (for which this operator has been fantastically helpful), so when adding some of our larger clusters I bumped into this error. Quite a surprise, as you can imagine.
I'm wondering if we could stop reconciling that Secret if we use a CSI driver to manage the certificates, for example? (Or give the user an option to skip the reconciliation of that Secret?)
@barkbay I think that's a good idea.
@nullren we don't have concrete plans to address this right now. Did the workaround, using multiple node sets instead of one big one, have drawbacks for you that made you want to stick with a single node set?
The workaround did "work", but it is a whole lot of unnecessary complexity for something we don't even use (we disable security and don't use the certs at all, since we use our own network framework on k8s). There's just a lot of extra tooling we have to update to ensure that node sets "data-0", "data-1", ..., "data-N" are all found and reconciled correctly. We're still finding bugs due to this.
We have implemented an option to turn off the ECK-managed self-signed certificates in https://github.com/elastic/cloud-on-k8s/pull/7925, which is going to ship with the next release of ECK. This should cover the case you mentioned, @nullren. This means we now have two workarounds for large clusters:
Either:
- split a node set into multiple node sets, or
- disable the transport certs and provision them externally (e.g. with cert-manager); see the sketch below
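For the second option, a hedged sketch of what disabling the ECK-managed transport certificates could look like. I have not verified the exact field name that ships with that PR; I am only mirroring the existing spec.http.tls.selfSignedCertificate.disabled option from the resource above, so check the release docs for the final API:

```yaml
# Assumption: field name inferred from the existing HTTP-layer option
# (spec.http.tls.selfSignedCertificate.disabled) and from PR 7925; verify
# against the ECK release docs before relying on it.
spec:
  transport:
    tls:
      selfSignedCertificates:
        disabled: true
  # Per-node transport certificates then have to be provisioned externally
  # (e.g. with cert-manager) and mounted into the Elasticsearch Pods.
```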
My vote would be to close this issue unless there are additional concerns we did not address with these changes.
@pebrc that works for me. Thank you!