vcluster-eks: vcluster-api: "watch chan error: etcdserver: mvcc: required revision has been compacted"
What happened?
Installed vcluster-eks 0.16.4 on EKS 1.27. Storage for etcd is on EFS. The messages below start almost immediately after the vcluster-api pod starts:
I1101 10:33:31.743782 1 aggregator.go:164] waiting for initial CRD sync...
I1101 10:33:31.748914 1 gc_controller.go:78] Starting apiserver lease garbage collector
I1101 10:33:31.748967 1 handler_discovery.go:412] Starting ResourceDiscoveryManager
I1101 10:33:31.742914 1 controller.go:78] Starting OpenAPI AggregationController
I1101 10:33:31.750829 1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/run/config/pki/ca.crt"
I1101 10:33:31.751017 1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/run/config/pki/front-proxy-ca.crt"
E1101 10:33:31.843347 1 controller.go:95] Found stale data, removed previous endpoints on kubernetes service, apiserver didn't exit successfully previously
I1101 10:33:31.846125 1 shared_informer.go:318] Caches are synced for cluster_authentication_trust_controller
I1101 10:33:31.849180 1 apf_controller.go:377] Running API Priority and Fairness config worker
I1101 10:33:31.849368 1 apf_controller.go:380] Running API Priority and Fairness periodic rebalancing process
I1101 10:33:31.927892 1 shared_informer.go:318] Caches are synced for node_authorizer
I1101 10:33:31.932853 1 controller.go:624] quota admission added evaluator for: leases.coordination.k8s.io
I1101 10:33:31.939522 1 cache.go:39] Caches are synced for AvailableConditionController controller
I1101 10:33:31.940661 1 shared_informer.go:318] Caches are synced for crd-autoregister
I1101 10:33:31.940723 1 aggregator.go:166] initial CRD sync complete...
I1101 10:33:31.940737 1 autoregister_controller.go:141] Starting autoregister controller
I1101 10:33:31.940745 1 cache.go:32] Waiting for caches to sync for autoregister controller
I1101 10:33:31.940757 1 cache.go:39] Caches are synced for autoregister controller
I1101 10:33:31.943259 1 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I1101 10:33:31.944056 1 shared_informer.go:318] Caches are synced for configmaps
I1101 10:33:32.748362 1 storage_scheduling.go:111] all system priority classes are created successfully or already exist.
W1101 10:33:35.109679 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:35.712071 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027911 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027960 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027983 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028004 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028029 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028051 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028070 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029210 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029246 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029823 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029858 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029865 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029880 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.030282 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.128476 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.128503 1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
What did you expect to happen?
No warning messages in the vcluster-api logs.
How can we reproduce it (as minimally and precisely as possible)?
Not sure how to reproduce this in a minimal environment.
Anything else we need to know?
The install was done via flux2 (HelmRelease). Potentially relevant links:
- https://github.com/kubernetes/kubernetes/issues/116289 seems to point out that this may be a usage issue, from trying to access data that has already been compacted
- https://github.com/etcd-io/etcd/issues/10450#issuecomment-461247292 seems to indicate this is not really an error
Host cluster Kubernetes version
$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.6-eks-f8587cb
Host cluster Kubernetes distribution
EKS 1.27
vcluster version
$ vcluster --version
vcluster version 0.15.7
Vcluster Kubernetes distribution (k3s (default), k8s, k0s)
eks
OS and Arch
OS: macOS
Arch: arm64
@joaocc sorry for the delay. vcluster has problems with EFS, as it causes issues with databases in general. Is there any chance you could use EBS or something similar?
Hi. Not really. We are using EFS as a way to simplify HA storage. Unlike Azure, where ZRS allows mountable volumes that span availability zones, AWS EBS appears to be restricted to a single AZ, so a vcluster that ends up being rescheduled onto a node in another AZ would not be able to mount the EBS volume. On the other hand, we haven't noticed any practical issues so far. Are you saying that EFS is not a supported storage backend for etcd? Thanks
@FabianKramm following up on this thread...
- we use eks-d because the remaining distros use sqlite, which indeed cannot be hosted on NFS-type file systems; do you think the new k3s-with-etcd in v0.19.x will have the same issues?
- regarding EBS, is there any guidance on making non-HA deployments work well with EBS on multi-AZ clusters (in case a single AZ becomes unavailable)? This scenario is supported quite well with EFS.
- is there any official statement on EFS being a supported store for etcd?
For reference, we continue to see no practical issues, except for an elevated EFS bill (writes of ~440 MB/s), where we are still trying to determine whether the cost comes from these writes or from something else.