vCluster creating more trouble than helping (due to various causes)
What happened?
I honestly don't know if it is supposed to be this hard, but vcluster is creating more trouble than it solves. I've been working on getting a production-ready cluster for over a week and it's still not working. Right now the cluster is up and running and I connect to it using a NodePort Service. The issues:
- When I deploy the cluster, it takes over 60 minutes for the cluster pods to reach a healthy Running state so that I can communicate with the vcluster.
- The CoreDNS pod, though in state Running, is full of errors:
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: failed to list *v1.Service: Unauthorized
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
[INFO] plugin/kubernetes: Trace[944124959]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (24-May-2024 11:55:04.692) (total time: 16726ms): Trace[944124959]: ---"Objects listed" error:<nil> 16726ms (11:55:21.419) Trace[944124959]: [16.726980037s] [16.726980037s] END
[INFO] plugin/kubernetes: Trace[1216864093]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (24-May-2024 11:54:50.867) (total time: 30559ms): Trace[1216864093]: ---"Objects listed" error:<nil> 30559ms (11:55:21.426) Trace[1216864093]: [30.55934034s] [30.55934034s] END
[INFO] plugin/kubernetes: Trace[1087029931]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (24-May-2024 11:54:49.127) (total time: 32304ms): Trace[1087029931]: ---"Objects listed" error:<nil> 32304ms (11:55:21.432) Trace[1087029931]: [32.304771091s] [32.304771091s] END
- When connected to the vcluster, requests deliver different responses each time. E.g., running kubectl get namespaces might show 4 namespaces, then 4, and then 6, etc.
- Running helm against the vcluster is nearly impossible; it times out almost every single time.
This is all the more bothersome because I was expecting vCluster to be much easier to use.
What did you expect to happen?
I deployed a cluster and expected the CoreDNS pods to be deployed and healthy.
How can we reproduce it (as minimally and precisely as possible)?
# vcluster.yaml
exportKubeConfig:
  context: "sharedpool-context"
controlPlane:
  coredns:
    enabled: true
    embedded: false
    deployment:
      replicas: 2
      nodeSelector:
        workload: wk1
  statefulSet:
    highAvailability:
      replicas: 2
    persistence:
      volumeClaim:
        enabled: true
    scheduling:
      nodeSelector:
        workload: wk1
    resources:
      limits:
        ephemeral-storage: 20Gi
        memory: 10Gi
      requests:
        ephemeral-storage: 200Mi
        cpu: 200m
        memory: 256Mi
  proxy:
    bindAddress: "0.0.0.0"
    port: 8443
    extraSANs:
      - XX.XX.XX.XXX
      - YY.YY.YY.YYY
helm upgrade -i my-vcluster vcluster \
--repo https://charts.loft.sh \
--namespace vcluster-ns --create-namespace \
--repository-config='' \
-f vcluster.yaml \
--version 0.20.0-beta.5
Anything else we need to know?
I used a NodePort service to connect to the cluster:
# nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: vcluster-nodeport
  namespace: vcluster-ns
spec:
  selector:
    app: vcluster
    release: shared-pool-vcluster
  ports:
    - name: https
      port: 443
      targetPort: 8443
      protocol: TCP
      nodePort: 31222
  type: NodePort
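A kubeconfig pointing at that NodePort would look roughly like the following sketch. The cluster/user/context names and the NODE_IP placeholder are hypothetical; NODE_IP must be one of the addresses listed under extraSANs in vcluster.yaml, or TLS verification will fail:

```yaml
# Hypothetical kubeconfig sketch for reaching the vcluster via the NodePort.
# NODE_IP must appear in extraSANs above, otherwise the serving certificate
# will not cover it and TLS verification fails.
apiVersion: v1
kind: Config
clusters:
  - name: my-vcluster
    cluster:
      server: https://NODE_IP:31222
      certificate-authority-data: <base64 CA from the vcluster kubeconfig secret>
contexts:
  - name: sharedpool-context
    context:
      cluster: my-vcluster
      user: my-vcluster
users:
  - name: my-vcluster
    user:
      client-certificate-data: <base64 client cert>
      client-key-data: <base64 client key>
current-context: sharedpool-context
```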
$ helm upgrade -i solr-operator apache-solr/solr-operator --version 0.8.1 -n solr-cloud
Release "solr-operator" does not exist. Installing it now.
Error: failed post-install: 1 error occurred:
* timed out waiting for the condition
$ helm upgrade -i hz-operator hazelcast/hazelcast-platform-operator -n hz-vc-ns --create-namespace -f operator.yaml
Release "hz-operator" does not exist. Installing it now.
Error: 9 errors occurred:
* Timeout: request did not complete within requested timeout - context deadline exceeded
* Timeout: request did not complete within requested timeout - context deadline exceeded
* Timeout: request did not complete within requested timeout - context deadline exceeded
* Timeout: request did not complete within requested timeout - context deadline exceeded
* Timeout: request did not complete within requested timeout - context deadline exceeded
* Timeout: request did not complete within requested timeout - context deadline exceeded
* Timeout: request did not complete within requested timeout - context deadline exceeded
* Timeout: request did not complete within requested timeout - context deadline exceeded
* Internal error occurred: resource quota evaluation timed out
Host cluster Kubernetes version
$ kubectl version
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.11
Host cluster Kubernetes distribution
Talos
vcluster version
$ vcluster --version
vcluster version 0.19.5
vcluster Kubernetes distribution (k3s (default), k8s, k0s)
default (k3s)
# i did not specify a specific distribution
OS and Arch
OS: talos
Arch: metal-amd64
hey @MichaelKora, it's unfortunate that you have to experience these troubles.
One thing that I'd recommend is to use the latest vcluster CLI together with 0.20.0-beta.5. From the description it appears that you are using the 0.19.5 one instead.
Regarding the other issues: it's a bit hard to say from the outset what might be causing them. You seem to be leveraging Talos. What Kubernetes distro is running on top of it?
hey @heiko-braun, 0.19.5 is the latest according to the vcluster CLI:
$ sudo vcluster upgrade
15:55:48 info Current binary is the latest version: 0.19.5
I have Talos running there (the default image); it's based on k3s.
Hi @MichaelKora, you can get the latest CLI (the one to be used with 0.20 vcluster.yaml) here: https://github.com/loft-sh/vcluster/releases/tag/v0.20.0-beta.6
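For anyone following along: `vcluster upgrade` apparently does not pick up pre-releases (it reported 0.19.5 as the latest above), so the beta CLI has to be fetched from the releases page directly. A sketch, assuming the asset naming convention `vcluster-<os>-<arch>` used on the GitHub releases page:

```shell
# Download the pre-release CLI binary (asset name assumed to follow
# the vcluster-<os>-<arch> convention of the GitHub releases page).
VERSION="v0.20.0-beta.6"
OS="linux"    # or darwin
ARCH="amd64"  # or arm64
URL="https://github.com/loft-sh/vcluster/releases/download/${VERSION}/vcluster-${OS}-${ARCH}"
echo "Fetching ${URL}"
curl -fsSL -o vcluster "${URL}" \
  && chmod +x vcluster \
  && sudo mv vcluster /usr/local/bin/vcluster \
  || echo "download failed; grab the binary manually from the releases page"
```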
@MichaelKora the hazelcast and solr examples in the description, did you run the commands against the host cluster or the virtual one?
@heiko-braun thanks for your response. I ran the commands against the vcluster; when run against the host cluster, I have no issues.
@heiko-braun when the cluster is being created, the logs show:
2024-06-05 14:38:37 INFO setup/controller_context.go:196 couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds {"component": "vcluster"}
2024-06-05 14:38:38 INFO setup/controller_context.go:196 couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds {"component": "vcluster"}
2024-06-05 14:38:39 INFO setup/controller_context.go:196 couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": dial tcp 127.0.0.1:6443: connect: connection refused), will retry in 1 seconds {"component": "vcluster"}
2024-06-05 14:38:40 INFO commandwriter/commandwriter.go:126 error retrieving resource lock kube-system/kube-controller-manager: Get "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": dial tcp 127.0.0.1:6443: connect: connection refused {"component": "vcluster", "component": "controller-manager", "location": "leaderelection.go:332"}
and it takes more than 60 minutes before the cluster reaches a healthy state. It seems very odd to me that it takes that long to create a virtual cluster.
@MichaelKora how many nodes does your host cluster have, and what capacity? do you use network policies?
hey @everflux, I dedicated 2 nodes of the host cluster to the vcluster (8 CPU / 32 GB). I am not using any restrictive network policies.
This sounds like a setup problem to me, either with the host cluster or vcluster. Did you try to set up one or multiple vclusters? (Check kubectl get all -n vcluster-ns and kubectl get ns.) I am afraid a GitHub issue might not be the right place to discuss this; perhaps the Slack channel would be better suited.
@MichaelKora Are you still having issues or were you able to resolve them?
hey @deniseschannon, yes i am still having the issue!
> This sounds like a setup problem to me, either with the host cluster or vcluster. Did you try to set up one or multiple vclusters?
@everflux i have just one setup
@deniseschannon @heiko-braun @everflux Any update on the origin of that issue and how to fix it?
I think the Slack channel or direct consulting is a better place for support than a GitHub issue in this case.
@MichaelKora - Using Slack would be better for troubleshooting, and if you could try the latest version of vcluster, that would also be great. When troubleshooting with v0.20+, it would help if you could also provide your vcluster.yaml.
I'm going to close this issue; if you are still having problems, could you open a new one? Thanks!