
vcluster-eks: vcluster-api: "watch chan error: etcdserver: mvcc: required revision has been compacted"

Open joaocc opened this issue 1 year ago • 3 comments

What happened?

Installed vcluster-eks 0.16.4 on EKS 1.27, with etcd storage on EFS. The warnings start almost immediately after the vcluster-api pod starts:

I1101 10:33:31.743782       1 aggregator.go:164] waiting for initial CRD sync...
I1101 10:33:31.748914       1 gc_controller.go:78] Starting apiserver lease garbage collector
I1101 10:33:31.748967       1 handler_discovery.go:412] Starting ResourceDiscoveryManager
I1101 10:33:31.742914       1 controller.go:78] Starting OpenAPI AggregationController
I1101 10:33:31.750829       1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/run/config/pki/ca.crt"
I1101 10:33:31.751017       1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/run/config/pki/front-proxy-ca.crt"
E1101 10:33:31.843347       1 controller.go:95] Found stale data, removed previous endpoints on kubernetes service, apiserver didn't exit successfully previously
I1101 10:33:31.846125       1 shared_informer.go:318] Caches are synced for cluster_authentication_trust_controller
I1101 10:33:31.849180       1 apf_controller.go:377] Running API Priority and Fairness config worker
I1101 10:33:31.849368       1 apf_controller.go:380] Running API Priority and Fairness periodic rebalancing process
I1101 10:33:31.927892       1 shared_informer.go:318] Caches are synced for node_authorizer
I1101 10:33:31.932853       1 controller.go:624] quota admission added evaluator for: leases.coordination.k8s.io
I1101 10:33:31.939522       1 cache.go:39] Caches are synced for AvailableConditionController controller
I1101 10:33:31.940661       1 shared_informer.go:318] Caches are synced for crd-autoregister
I1101 10:33:31.940723       1 aggregator.go:166] initial CRD sync complete...
I1101 10:33:31.940737       1 autoregister_controller.go:141] Starting autoregister controller
I1101 10:33:31.940745       1 cache.go:32] Waiting for caches to sync for autoregister controller
I1101 10:33:31.940757       1 cache.go:39] Caches are synced for autoregister controller
I1101 10:33:31.943259       1 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I1101 10:33:31.944056       1 shared_informer.go:318] Caches are synced for configmaps
I1101 10:33:32.748362       1 storage_scheduling.go:111] all system priority classes are created successfully or already exist.
W1101 10:33:35.109679       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:35.712071       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027911       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027960       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.027983       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028004       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028029       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028051       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.028070       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029210       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029246       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029823       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029858       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029865       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.029880       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.030282       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.128476       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted
W1101 10:33:38.128503       1 watcher.go:245] watch chan error: etcdserver: mvcc: required revision has been compacted

What did you expect to happen?

No warning messages

How can we reproduce it (as minimally and precisely as possible)?

Not sure how to reproduce this in a minimal environment.

Anything else we need to know?

Installed via flux2 (HelmRelease). Potentially relevant links:

  • https://github.com/kubernetes/kubernetes/issues/116289 seems to suggest that this may be a usage issue, caused by trying to access data that has already been compacted
  • https://github.com/etcd-io/etcd/issues/10450#issuecomment-461247292 seems to indicate that this is not really an error
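The compaction behaviour both links describe can be illustrated with a toy model. The Go sketch below is hypothetical: `toyStore`, `WatchFrom`, and `catchUp` are invented names, not etcd's or client-go's API. Every write bumps a store-wide revision; once history below some revision is compacted, a watch that tries to resume from an older revision fails with a "required revision has been compacted" error, and the client recovers by re-listing the current state and continuing from that snapshot's revision. This is roughly what Kubernetes informers do when a watch fails as "too old", which is consistent with the etcd comment linked above that the warning is not really an error.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCompacted mimics the meaning of etcd's compaction error in this toy
// model; it is not etcd's real error value.
var ErrCompacted = errors.New("mvcc: required revision has been compacted")

// event is one change recorded at a store revision.
type event struct {
	Rev   int64
	Key   string
	Value string
}

// toyStore is a hypothetical in-memory MVCC store.
type toyStore struct {
	rev        int64             // latest revision
	compactRev int64             // history below this revision is gone
	history    []event           // retained change log
	current    map[string]string // latest value per key
}

func newToyStore() *toyStore { return &toyStore{current: map[string]string{}} }

// Put records a change and advances the revision.
func (s *toyStore) Put(key, value string) {
	s.rev++
	s.history = append(s.history, event{Rev: s.rev, Key: key, Value: value})
	s.current[key] = value
}

// Compact discards history entries with a revision below rev.
func (s *toyStore) Compact(rev int64) {
	kept := s.history[:0]
	for _, e := range s.history {
		if e.Rev >= rev {
			kept = append(kept, e)
		}
	}
	s.history = kept
	s.compactRev = rev
}

// WatchFrom returns events at or after startRev, or ErrCompacted if part of
// that range has already been discarded.
func (s *toyStore) WatchFrom(startRev int64) ([]event, error) {
	if startRev < s.compactRev {
		return nil, ErrCompacted
	}
	var out []event
	for _, e := range s.history {
		if e.Rev >= startRev {
			out = append(out, e)
		}
	}
	return out, nil
}

// List returns a copy of the current state and the revision it reflects.
func (s *toyStore) List() (map[string]string, int64) {
	snap := make(map[string]string, len(s.current))
	for k, v := range s.current {
		snap[k] = v
	}
	return snap, s.rev
}

// catchUp replays events after lastSeen into cache; if that history was
// compacted, it falls back to a full re-list. The bool reports a re-list.
func catchUp(s *toyStore, cache map[string]string, lastSeen int64) (map[string]string, int64, bool, error) {
	events, err := s.WatchFrom(lastSeen + 1)
	if errors.Is(err, ErrCompacted) {
		snap, rev := s.List()
		return snap, rev, true, nil
	}
	if err != nil {
		return nil, 0, false, err
	}
	for _, e := range events {
		cache[e.Key] = e.Value
		lastSeen = e.Rev
	}
	return cache, lastSeen, false, nil
}

func main() {
	s := newToyStore()
	s.Put("a", "1") // rev 1
	s.Put("b", "2") // rev 2
	cache, lastSeen := s.List() // client snapshot at rev 2

	s.Put("a", "3") // rev 3
	s.Put("c", "4") // rev 4
	s.Compact(4)    // rev 3 is no longer in history

	cache, lastSeen, relisted, err := catchUp(s, cache, lastSeen)
	fmt.Println(relisted, lastSeen, cache["a"], cache["c"], err)
}
```

Running it prints `true 4 3 4 <nil>`: the client's resume point (revision 3) had been compacted away, so it re-listed and ended up at revision 4 with the latest values.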

Host cluster Kubernetes version

$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.6-eks-f8587cb

Host cluster Kubernetes distribution

EKS 1.27

vcluster version

$ vcluster --version
vcluster version 0.15.7

vcluster Kubernetes distribution (k3s (default), k8s, k0s)

eks

OS and Arch

OS: macOS
Arch: arm64

joaocc, Nov 01 '23 10:11

@joaocc Sorry for the delay. vcluster has problems with EFS, as it causes issues with databases in general. Is there any chance you could use EBS or something similar?

FabianKramm, Nov 13 '23 15:11

Hi. Not really. We are using EFS as a way to simplify HA storage. Unlike Azure, where ZRS allows mountable volumes that span multiple AZs, AWS EBS volumes seem to be restricted to a single AZ, so a vcluster that ends up being scheduled on a node in another AZ would not be able to mount the EBS volume. That said, we haven't noticed any practical issues. Are you saying that EFS is not supported storage for etcd? Thanks

joaocc, Nov 14 '23 09:11

@FabianKramm following up on this thread...

  • We use eks-d because the other distros use SQLite, which indeed cannot be hosted on NFS-type file systems. Do you think the new k3s-with-etcd option in v0.19.x will have the same issues?
  • Regarding EBS, is there any guidance on making non-HA deployments work well with EBS on multi-AZ clusters (in case a single AZ becomes unavailable)? This scenario is supported quite well with EFS.
  • Is there any official statement on whether EFS is a supported storage backend for etcd?

For reference, we still haven't had any practical issues, except for an elevated EFS bill (writes of ~440MB/sec); we are still trying to understand whether it comes from these writes or from something else.
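For scale, assuming the ~440MB/sec figure is a sustained average in decimal units (1 TB = 10^6 MB, 1 PB = 10^9 MB), the write volume works out to:

```latex
440\ \tfrac{\text{MB}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}}
  = 38\,016\,000\ \tfrac{\text{MB}}{\text{day}} \approx 38\ \tfrac{\text{TB}}{\text{day}}

38\,016\,000\ \tfrac{\text{MB}}{\text{day}} \times 30\ \text{days}
  \approx 1.14 \times 10^{9}\ \text{MB} \approx 1.14\ \text{PB per 30 days}
```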

joaocc, Mar 19 '24 16:03