Difficult Day 0

Open bkk-bcd opened this issue 2 years ago • 17 comments

What happened?

1. Follow the installation instructions (https://kuma.io/docs/1.2.x/installation/eks/) by running `./kumactl install control-plane | kubectl apply -f -`:

   ```
   kumactl install control-plane | kubectl apply -f -
   Error: Failed to render helm template files: Failed to render templates: template: kuma/templates/cp-webhooks-and-secrets.yaml:20:16: executing "kuma/templates/cp-webhooks-and-secrets.yaml" at <lookup "v1" "Secret" .Release.Namespace $secretName>: error calling lookup: unable to get apiresource from unstructured: /v1, Kind=Secret: exec plugin: invalid apiVersion "client.authentication.k8s.io/v1alpha1"
   error: error parsing STDIN: error converting YAML to JSON: yaml: line 2: mapping values are not allowed in this context
   ```

2. Install Kuma with helm

   ```
   helm install --create-namespace --namespace kuma-system kuma kuma/kuma
   ```

3. Launch UI to verify installation all looks well

   ```
   kubectl port-forward svc/kuma-control-plane -n kuma-system 5681:5681
   ```

4. Install demo app

   ```
   kubectl apply -f demo.yaml
   ```

5. Note that no demo pods are coming up; look at the demo app replicaset and see the following error

   ```
   Warning  FailedCreate  8m30s (x20 over 49m)  replicaset-controller  Error creating: Internal error occurred: failed calling webhook "namespace-kuma-injector.kuma.io": failed to call webhook: Post "https://kuma-control-plane.kuma-system.svc:443/inject-sidecar?timeout=10s": context deadline exceeded
   ```

6. Take a look at the control plane logs; no errors, only this interesting message repeated over and over

   ```
   2022-06-25T11:38:10.363Z    INFO    defaults    trying to create default Mesh
   ```

7. Look in helm chart docs for debug flag and redeploy helm

   ```
   helm upgrade --install --set controlPlane.logLevel=debug --namespace kuma-system kuma kuma/kuma
   ```

8. Take a look at the control plane logs

   ```
   2022-06-25T11:40:55.488Z    INFO    defaults    trying to create default Mesh
   2022-06-25T11:41:05.496Z    DEBUG    defaults    could not create default mesh    {"err": "failed to create k8s resource: Internal error occurred: failed calling webhook \"mesh.defaulter.kuma-admission.kuma.io\": failed to call webhook: Post \"https://kuma-control-plane.kuma-system.svc:443/default-kuma-io-v1alpha1-mesh?timeout=10s\": context deadline exceeded", "errVerbose": "Internal error occurred: failed calling webhook \"mesh.defaulter.kuma-admission.kuma.io\": failed to call webhook: Post \"https://kuma-control-plane.kuma-system.svc:443/default-kuma-io-v1alpha1-mesh?timeout=10s\": context deadline exceeded\nfailed to create k8s resource\ngithub.com/kumahq/kuma/pkg/plugins/resources/k8s.(*KubernetesStore).Create\n\t/home/circleci/project/pkg/plugins/resources/k8s/store.go:75\ngithub.com/kumahq/kuma/pkg/core/resources/store.(*paginationStore).Create\n\t/home/circleci/project/pkg/core/resources/store/pagination_store.go:30\ngithub.com/kumahq/kuma/pkg/metrics/store.(*MeteredStore).Create\n\t/home/circleci/project/pkg/metrics/store/store.go:38\ngithub.com/kumahq/kuma/pkg/core/resources/store.(*customizableResourceStore).Create\n\t/home/circleci/project/pkg/core/resources/store/customizable_store.go:30\ngithub.com/kumahq/kuma/pkg/core/managers/apis/mesh.(*meshManager).Create\n\t/home/circleci/project/pkg/core/managers/apis/mesh/mesh_manager.go:82\ngithub.com/kumahq/kuma/pkg/core/resources/manager.(*customizableResourceManager).Create\n\t/home/circleci/project/pkg/core/resources/manager/customizable_manager.go:48\ngithub.com/kumahq/kuma/pkg/defaults.(*defaultsComponent).createMeshIfNotExist\n\t/home/circleci/project/pkg/defaults/mesh.go:26\ngithub.com/kumahq/kuma/pkg/defaults.(*defaultsComponent).Start.func1.1\n\t/home/circleci/project/pkg/defaults/components.go:90\ngithub.com/sethvargo/go-retry.Do\n\t/home/circleci/.go-kuma-go/pkg/mod/github.com/sethvargo/[email protected]/retry.go:60\ngithub.com/kumahq/kuma/pkg/defaults.(*defaultsComponent).Start.func1\n\t/home/circleci/project/pkg/defaults/components.go:89\nruntime.goexit\n\t/home/circleci/go/src/runtime/asm_amd64.s:1571"}
   ```

9. Scratch head 😕

bkk-bcd avatar Jun 25 '22 11:06 bkk-bcd

Hey, thanks for the report.

For the first point, `kumactl install` checks for the secret in the `kuma-system` namespace so that we don't override the secret when upgrading with it. It might be that your KUBECONFIG was not authorized to check it.

For the rest, can you share a bit more about your environment so we can try to reproduce it? Did you create a cluster with eksctl? If so, can you share the settings? Do you have any specific firewall settings in the cluster? What networking plugin do you use in your EKS Kubernetes cluster?

potential xref https://github.com/elastic/cloud-on-k8s/issues/1437

jakubdyszkiewicz avatar Jun 27 '22 14:06 jakubdyszkiewicz

Cheers!

> For the first point, `kumactl install` checks for the secret in the `kuma-system` namespace so that we don't override the secret when upgrading with it. It might be that your KUBECONFIG was not authorized to check it.

This was on my first run of the command...I'm not aware of a kubeconfig setting that would prevent that, other than having the wrong role. But K8S is deeply complex so very easy to miss something. I'm using an admin role in my case, and can create all the things on that cluster by running helm/kubectl locally to create secrets, deployments, etc on EKS (though in most cases we use argocd.)
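The `kumactl` error in the report names `exec plugin: invalid apiVersion "client.authentication.k8s.io/v1alpha1"`, which is the kind of message a Kubernetes client prints when it does not accept the exec credential API version written in the kubeconfig. This is a possible cause to rule out, not a confirmed diagnosis for this cluster. A minimal sketch for checking it, assuming a single kubeconfig file (not a colon-separated `KUBECONFIG` list); `list_exec_api_versions` is a hypothetical helper name:

```shell
# Hypothetical helper: print each distinct exec credential apiVersion
# declared in a kubeconfig file (defaults to $KUBECONFIG or ~/.kube/config).
# Assumes $KUBECONFIG, if set, points at one file rather than a list.
list_exec_api_versions() {
  kubeconfig="${1:-${KUBECONFIG:-$HOME/.kube/config}}"
  grep -o 'client\.authentication\.k8s\.io/v1[a-z0-9]*' "$kubeconfig" | sort -u
}

# Usage: list_exec_api_versions ~/.kube/config
```

If this prints a `v1alpha1` entry and the kubeconfig was generated by the AWS CLI, regenerating it with a recent AWS CLI (`aws eks update-kubeconfig --name <cluster>`, where `<cluster>` is a placeholder) is one way to get a newer exec API version written.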

Running EKS 1.22 with the AWS VPC CNI v1.10.3-eksbuild.1

We provision our clusters with a thin wrapper module that calls the Terraform EKS module: https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest . There's a lot to unpack there, so it might be better for me to provide any specific information you are looking for than to try to unwind it all here.

The network setup is fairly vanilla with two private subnets only with a mix of spot and on-demand node groups. However we are running AWS firewall and that could be relevant here as we do whitelist egress traffic.
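The webhook errors in the report are timeouts (`context deadline exceeded`) on calls from the API server to `kuma-control-plane.kuma-system.svc:443`. One commonly reported cause on managed clusters is that the API server reaches the webhook on the pod's target port, and a security group or firewall rule does not allow that port from the cluster control plane to the nodes. A minimal sketch for finding the port to check, assuming the Service name and namespace from the helm install above; `webhook_target_port` is a hypothetical helper name:

```shell
# Hypothetical helper: print the targetPort behind port 443 of the
# control plane Service. Namespace and Service name default to the
# values used in the report; override them if your install differs.
webhook_target_port() {
  ns="${1:-kuma-system}"
  svc="${2:-kuma-control-plane}"
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{.spec.ports[?(@.port==443)].targetPort}'
}

# Usage: webhook_target_port            # uses the defaults
#        webhook_target_port my-ns my-svc
```

The printed port is the one to look for in the node security group ingress rules and in any firewall policy sitting between the control plane and the nodes.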

Let me know what else could be helpful, happy to go into more details.

bkk-bcd avatar Jun 27 '22 23:06 bkk-bcd

Can you check if you have a secret named something like kuma-tls-cert? I think it might be very similar to what's described here: https://github.com/kumahq/kuma/issues/2023

lahabana avatar Jun 30 '22 07:06 lahabana

Thanks, I'll take a look and get back to you.

bkk-bcd avatar Jun 30 '22 13:06 bkk-bcd

Hi @bkk-bcd any updates?

lahabana avatar Jul 06 '22 09:07 lahabana

Not yet; I got behind on some other things and will circle back to this. Thanks for the ping!

bkk-bcd avatar Jul 06 '22 11:07 bkk-bcd

Hi @bkk-bcd - curious to hear any updates on this front.

subnetmarco avatar Jul 14 '22 23:07 subnetmarco

> Hey, thanks for the report.
>
> For the first point, `kumactl install` checks for the secret in the `kuma-system` namespace so that we don't override the secret when upgrading with it. It might be that your KUBECONFIG was not authorized to check it.
>
> For the rest, can you share a bit more about your environment so we can try to reproduce it? Did you create a cluster with eksctl? If so, can you share the settings? Do you have any specific firewall settings in the cluster? What networking plugin do you use in your EKS Kubernetes cluster?
>
> potential xref elastic/cloud-on-k8s#1437

Does this secret override check apply to `helm upgrade` too? I ask because the same thing happens on my cluster when I upgrade from the basic values to global zone values.

ayamGelo avatar Jul 20 '22 00:07 ayamGelo

Are you transforming a standalone default install into a global install? I might be wrong but I think the secrets might just be different. Have you looked at the secrets creation and update times? @parkanzky might be able to help here

lahabana avatar Jul 21 '22 08:07 lahabana

> Are you transforming a standalone default install into a global install? I might be wrong but I think the secrets might just be different. Have you looked at the secrets creation and update times? @parkanzky might be able to help here

I tried it with self-provided certificates: three certificate secrets, following the best practice in the documentation (data plane to control plane, user API access, and control plane to control plane).

Installing and upgrading with the same certificates gives the same result. I thought this was how global-cp is meant to work, since the documentation states that a global-cp will not connect to local data planes. But with this issue we can't inject deployments on the global-cp cluster, so data planes can't be created there.

In the end I had to set it up with three clusters: one global-cp controlling two local-cp. And yes, communication between data planes in the local-cp clusters works perfectly.

ayamGelo avatar Jul 21 '22 09:07 ayamGelo

Oh, this is expected. As you mentioned, dataplanes can't connect to the global control plane. What are you seeing as unexpected behaviour?

lahabana avatar Jul 21 '22 09:07 lahabana

Maybe `failed calling webhook "namespace-kuma-injector.kuma.io"` is the expected behaviour, with the data plane still created in global-cp but not connected. I don't know how this blocking interacts with changing the certificates or deleting the webhook layer; I'm just sharing an observation.

ayamGelo avatar Jul 21 '22 09:07 ayamGelo

Maybe the upgrade didn't delete the webhooks. That's something to look into

lahabana avatar Jul 21 '22 10:07 lahabana

> Maybe the upgrade didn't delete the webhooks. That's something to look into

I tried it with a fresh install too, with values that make it a global-cp in multi-zone mode. Because of that error, I can't install the observability tools.

```
$ kubectl get event
62s         Warning   FailedCreate           replicaset/grafana-5fbfcc78c6                         Error creating: Internal error occurred: failed calling webhook "pods-kuma-injector.kuma.io": the server could not find the requested resource
...
```

```
$ kubectl describe replicaset grafana-5fbfcc78c6
...
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age                 From                   Message
  ----     ------        ----                ----                   -------
  Warning  FailedCreate  14s (x18 over 11m)  replicaset-controller  Error creating: Internal error occurred: failed calling webhook "pods-kuma-injector.kuma.io": the server could not find the requested resource
```

```
$ kubectl describe ns mesh-observability
Name:         mesh-observability
Labels:       kuma.io/sidecar-injection=disabled
Annotations:  kuma.io/mesh: default
Status:       Active

No resource quota.

No LimitRange resource.
```

```
$ helm get values kuma -n kuma-system
USER-SUPPLIED VALUES:
controlPlane:
  mode: global
  tls:
    apiServer:
      secretName: api-server-tls
    general:
      caBundle: xxx-base64-xxx
      secretName: general-tls-certs
    kdsGlobalServer:
      secretName: kds-server-tls
```
ayamGelo avatar Jul 21 '22 23:07 ayamGelo

We fixed the charts so that no webhooks are created in global mode

@ayamGelo that should cover the issue with the failing webhook. Does that cover everything?

michaelbeaumont avatar Jul 26 '22 12:07 michaelbeaumont

> We fixed the charts so that no webhooks are created in global mode
>
> @ayamGelo that should cover the issue with the failing webhook. Does that cover everything?

Cool, it's fixed. When will the new release come out, or should I just delete the webhook as a mitigation for the existing install?

ayamGelo avatar Jul 31 '22 16:07 ayamGelo

Should be enough
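For anyone applying that mitigation by hand, a minimal sketch for listing the admission webhook configurations before deleting anything. It assumes the Kuma-created configurations have "kuma" in their names, which is not confirmed in this thread; `list_kuma_webhook_configs` is a hypothetical helper name:

```shell
# Hypothetical helper: list mutating and validating webhook
# configurations whose names contain "kuma" (an assumption about
# how the chart names them; adjust the pattern if needed).
list_kuma_webhook_configs() {
  kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o name \
    | grep -i kuma
}

# After reviewing the list, a single entry can be removed with
# (destructive, so left commented out):
# kubectl delete <name-from-the-list>
```

Listing first keeps the deletion scoped to the entries you have actually inspected.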

lahabana avatar Aug 01 '22 07:08 lahabana

@bkk-bcd any updates here?

lahabana avatar Aug 23 '22 08:08 lahabana

@bkk-bcd Any update on the issue?

lukidzi avatar Sep 20 '22 10:09 lukidzi

Automatically closing the issue due to having one of the "closed state label".

github-actions[bot] avatar Sep 20 '22 15:09 github-actions[bot]

Closing for now. Feel free to reopen if the issue persists

jakubdyszkiewicz avatar Sep 20 '22 15:09 jakubdyszkiewicz