etcd-cluster-operator

Not able to install the operator and provision an instance by following the documentation

Open HariNarayananMohan opened this issue 5 years ago • 3 comments

Versions of relevant software used: OpenShift 4.3, which uses Kubernetes 1.16

What happened: I'm trying to run the etcd-cluster-operator on my cluster and provision EtcdClusters. I followed the Installing instructions and the Contributing documentation separately, and was not able to create an EtcdCluster either way.

What you expected to happen: The etcd-cluster-operator installs in my cluster and I can provision EtcdClusters.

How to reproduce it (as minimally and precisely as possible):

Install cert manager

Option 1: Follow Installing Instructions

Step 1: Clone this GitHub repo locally.
Step 2: From the root directory of the repo, run cd config/default
Step 3: export ECO_VERSION=v0.2.0
Step 4: kustomize edit set image controller=$ECO_VERSION
Step 5: kustomize edit set image proxy=$ECO_VERSION
Step 6: kubectl apply --kustomize .

Output

error: rawResources failed to read Resources: Load from path ../crd failed: '../crd' must be a file (got d='/<path to repo>/etcd-cluster-operator/config/crd')

Option 2: Follow Contributing Instructions

Step 1: export DOCKER_REPO=<registryname>
Step 2: make docker-build
Step 3: make docker-push
Step 4: make deploy

Current status:

oc get crd | grep improbable 
etcdbackups.etcd.improbable.io                              2020-06-18T17:34:59Z
etcdbackupschedules.etcd.improbable.io                      2020-06-18T17:35:00Z
etcdclusters.etcd.improbable.io                             2020-06-18T17:35:00Z
etcdpeers.etcd.improbable.io                                2020-06-18T17:35:00Z
etcdrestores.etcd.improbable.io                             2020-06-18T17:35:01Z

oc get pods 
NAME                                      READY   STATUS             RESTARTS   AGE
eco-controller-manager-796f74db94-jpfp7   1/1     Running            0          119m
eco-proxy-cfdb688bb-pb5rh                 0/1     CrashLoopBackOff   48         119m

oc logs eco-proxy-cfdb688bb-pb5rh -n eco-system 
2020-06-18T20:18:21.298Z	INFO	setup	Starting proxy	{"version": "v0.2.0-23-gf84abc6"}
2020-06-18T20:18:21.299Z	INFO	setup	Listening	{"grpc-address": ":8080"}

Step 5: kubectl apply -f config/samples/etcd_v1alpha1_etcdcluster.yaml

I tried creating this CR in both the hari namespace and the eco-system namespace.

 oc get etcdclusters.etcd.improbable.io 
NAME         AGE
my-cluster   149m

 oc logs etcdclusters.etcd.improbable.io/my-cluster
error: no kind "EtcdCluster" is registered for version "etcd.improbable.io/v1alpha1" in scheme "k8s.io/kubernetes/pkg/kubectl/scheme/scheme.go:28"

Output

Full logs of relevant components: eco-controller-manager.log

Anything else we need to know

HariNarayananMohan, Jun 18 '20 20:06

Thanks for the report.

Will investigate whether the installation instructions are still up to date. Note to self: CI uses standalone kustomize v3, while kubectl ships with kustomize v2. make deploy uses kustomize build <config> | kubectl apply -f -
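As a possible workaround for the Option 1 error, the same pipeline that make deploy is described as using can be run by hand. This is only a sketch: it assumes a standalone kustomize v3 binary is on PATH, that config/default (the path from the report) is the directory to build, and that the image edits from Steps 3 to 5 have already been made. The function name render_and_apply is made up for illustration.

```shell
#!/bin/sh
# Sketch: render manifests with standalone kustomize and apply them via stdin,
# instead of relying on the kustomize version embedded in kubectl.
# CONFIG_DIR is an assumption taken from the report (config/default).
CONFIG_DIR="${CONFIG_DIR:-config/default}"

# render_and_apply is a hypothetical helper name, not part of the repo.
render_and_apply() {
    # "kubectl apply -f -" reads the rendered manifests from stdin.
    kustomize build "$1" | kubectl apply -f -
}

# Only run when the directory and both tools are actually present.
if [ -d "$CONFIG_DIR" ] \
    && command -v kustomize >/dev/null 2>&1 \
    && command -v kubectl >/dev/null 2>&1; then
    render_and_apply "$CONFIG_DIR"
else
    echo "skipping: need $CONFIG_DIR, kustomize and kubectl" >&2
fi
```

Run it from the root of the cloned repo, or set CONFIG_DIR to point at another kustomization directory.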

As for the cluster deployment error, the manager is logging:

2020-06-18T17:54:30.724Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "etcdcluster", "request": "hari/my-cluster", "error": "Failed to reconcile: unable to create service: services \"my-cluster\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}

A quick search suggests that OpenShift requires an additional RBAC permission on <resource>/finalizers before an owner reference with blockOwnerDeletion can be set. Our tests and deployments currently run on Kind/GKE, so we would have missed this. Will investigate what is needed for a fix.

https://github.com/jaegertracing/jaeger-operator/issues/461 https://github.com/spotahome/redis-operator/issues/98

cheahjs, Jun 28 '20 20:06

Thank you! I followed it and was able to make it work a few days ago. I thought I'd share that back.

HariNarayananMohan, Jun 29 '20 03:06

Hi! Could you share how? :) It's unclear which resource needs this exactly.

In the meantime, I read the sources to see which ownerReferences are created, and found that adding these RBAC rules works:

- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdbackupschedules/finalizers
  verbs:
  - update
- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdclusters/finalizers
  verbs:
  - update
- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdpeers/finalizers
  verbs:
  - update
- apiGroups:
  - etcd.improbable.io
  resources:
  - etcdrestores/finalizers
  verbs:
  - update
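To confirm that rules like these took effect, something along these lines might help. It is a rough sketch, not a tested procedure: it asks the API server whether a service account may update the finalizers subresource of each of the four resources. The eco-system and hari namespaces come from the report above; the service account name is a placeholder assumption (set SERVICE_ACCOUNT to whatever the manager actually runs as), and check_finalizer_access is a made-up helper name.

```shell
#!/bin/sh
# Sketch: check "update" access on the finalizers subresource for each CRD.
# SA_NAMESPACE and CR_NAMESPACE are taken from the report; SERVICE_ACCOUNT
# is a placeholder assumption and should be replaced with the real account.
SA_NAMESPACE="${SA_NAMESPACE:-eco-system}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-default}"
CR_NAMESPACE="${CR_NAMESPACE:-hari}"

# check_finalizer_access is a hypothetical helper name.
check_finalizer_access() {
    for resource in etcdbackupschedules etcdclusters etcdpeers etcdrestores; do
        printf '%s/finalizers: ' "$resource"
        # Prints "yes" or "no" for the impersonated service account.
        kubectl auth can-i update "$resource.etcd.improbable.io" \
            --subresource=finalizers \
            --namespace="$CR_NAMESPACE" \
            --as="system:serviceaccount:$SA_NAMESPACE:$SERVICE_ACCOUNT"
    done
}

if command -v kubectl >/dev/null 2>&1; then
    check_finalizer_access
else
    echo "skipping: kubectl not found" >&2
fi
```

A "no" for any line would suggest the corresponding rule is missing from the role bound to that service account. Impersonation with --as requires that the user running the check is allowed to impersonate service accounts.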

jimmy-scott, Sep 08 '21 08:09