
Stork exits with a panic if there are no preinstalled CRDs on the cluster


Is this a BUG REPORT or FEATURE REQUEST?: Bug

What happened: If a CRD is not preinstalled on the cluster, the app exits with a panic when it tries to register a controller for it:

time="2019-11-29T18:12:53Z" level=info msg="Starting stork version 2.4.0-f14b194b"
time="2019-11-29T18:12:53Z" level=info msg="Using 10.105.47.102:9001 as endpoint for portworx REST endpoint"
time="2019-11-29T18:12:53Z" level=info msg="Using 10.105.47.102:9020 as endpoint for portworx gRPC endpoint"
time="2019-11-29T18:12:53Z" level=info msg="Using http://10.105.47.102:9001 as the endpoint"
I1129 18:12:53.468657       1 leaderelection.go:185] attempting to acquire leader lease  kube-system/stork...
I1129 18:13:10.270406       1 leaderelection.go:194] successfully acquired lease kube-system/stork

time="2019-11-29T18:13:15Z" level=debug msg="Monitoring storage nodes"
time="2019-11-29T18:13:20Z" level=info msg="Registering CRDs"
I1129 18:13:20.497456       1 snapshot-controller.go:167] Starting snapshot controller
I1129 18:13:20.497529       1 controller_utils.go:1025] Waiting for caches to sync for snapshotdata-cache controller
I1129 18:13:20.597762       1 controller_utils.go:1032] Caches are synced for snapshotdata-cache controller
I1129 18:13:20.597821       1 controller_utils.go:1025] Waiting for caches to sync for snapshot-controller controller
I1129 18:13:20.697972       1 controller_utils.go:1032] Caches are synced for snapshot-controller controller
I1129 18:13:20.706941       1 controller.go:631] Starting provisioner controller stork-snapshot_stork-6c5496c9f8-trcmd_ecb74154-12d3-11ea-ba39-f2d29a0d0b77!
I1129 18:13:20.807217       1 controller.go:680] Started provisioner controller stork-snapshot_stork-6c5496c9f8-trcmd_ecb74154-12d3-11ea-ba39-f2d29a0d0b77!
time="2019-11-29T18:13:25Z" level=debug msg="Registering controller for stork.libopenstorage.org/v1alpha1, Kind=VolumeSnapshotSchedule"
time="2019-11-29T18:13:25Z" level=debug msg="Registered controller for stork.libopenstorage.org/v1alpha1, Kind=VolumeSnapshotSchedule"
time="2019-11-29T18:13:30Z" level=debug msg="Registering controller for stork.libopenstorage.org/v1alpha1, Kind=VolumeSnapshotRestore"
time="2019-11-29T18:13:30Z" level=error msg="failed to get resource client for (apiVersion:stork.libopenstorage.org/v1alpha1, kind:VolumeSnapshotRestore, ns:): failed to get resource type: failed to get the resource REST mapping for GroupVersionKind(stork.libopenstorage.org/v1alpha1, Kind=VolumeSnapshotRestore): no matches for kind \"VolumeSnapshotRestore\" in version \"stork.libopenstorage.org/v1alpha1\""
panic: failed to get resource type: failed to get the resource REST mapping for GroupVersionKind(stork.libopenstorage.org/v1alpha1, Kind=VolumeSnapshotRestore): no matches for kind "VolumeSnapshotRestore" in version "stork.libopenstorage.org/v1alpha1"

goroutine 46 [running]:
github.com/libopenstorage/stork/vendor/github.com/operator-framework/operator-sdk/pkg/sdk.Watch(0xc0008081e0, 0x21, 0x1808527, 0x15, 0x0, 0x0, 0xdf8475800, 0xc0009958d0, 0x1, 0x1)
	/home/unknown/go/src/github.com/libopenstorage/stork/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/api.go:49 +0x46c
github.com/libopenstorage/stork/pkg/controller.Register(0xc000888840, 0x0, 0x0, 0xdf8475800, 0x1f6cf00, 0xc0008a10a0, 0x0, 0x0)
	/home/unknown/go/src/github.com/libopenstorage/stork/pkg/controller/controller.go:75 +0x554
github.com/libopenstorage/stork/pkg/snapshot/controllers.(*SnapshotRestoreController).Init(0xc0008a10a0, 0xc0008a10a0, 0x0)
	/home/unknown/go/src/github.com/libopenstorage/stork/pkg/snapshot/controllers/snapshotrestore.go:47 +0x184
github.com/libopenstorage/stork/pkg/snapshot.(*Snapshot).Start(0xc0005eaae0, 0x0, 0x0)
	/home/unknown/go/src/github.com/libopenstorage/stork/pkg/snapshot/snapshot.go:122 +0x7f9
main.runStork(0x1ffb5e0, 0xc0004aa150, 0x1fbe2c0, 0xc0003e93c0, 0xc000362b00)
	/home/unknown/go/src/github.com/libopenstorage/stork/cmd/stork/stork.go:295 +0xfae
main.run.func1(0xc00009a480)
	/home/unknown/go/src/github.com/libopenstorage/stork/cmd/stork/stork.go:200 +0x4e
created by github.com/libopenstorage/stork/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/home/unknown/go/src/github.com/libopenstorage/stork/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:155 +0x9c

After a number of restarts the pod finally reaches Running, since each restart registers one more CRD before panicking on the next missing one:

NAMESPACE     NAME                                             READY   STATUS    RESTARTS   AGE
kube-system   stork-6c5496c9f8-trcmd                           1/1     Running   11         19h

What you expected to happen: The app starts gracefully, registering any missing CRDs without panicking.

How to reproduce it (as minimally and precisely as possible): Remove all stork-specific CRDs from the cluster and redeploy the app, for example:
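
One possible way (a sketch, assuming the CRD group shown in the logs above and a deployment named stork in kube-system; adjust names to your install):

```sh
# Delete every CRD in the stork.libopenstorage.org group, then restart stork.
kubectl get crd -o name | grep stork.libopenstorage.org | xargs kubectl delete
kubectl -n kube-system rollout restart deployment/stork
```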

Anything else we need to know?: The issue could be here: https://github.com/libopenstorage/stork/blob/fb8f8cb4452d8235a511516e312bb1c692cb9835/vendor/github.com/operator-framework/operator-sdk/pkg/k8sclient/client.go#L71 The operator-sdk builds a cached discovery client after registering VolumeSnapshotSchedule and then reuses it for the next controller (VolumeSnapshotRestore), so the freshly created CRD never shows up in the stale cache. A sketch of a possible fix follows.
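
A minimal sketch of one way to handle this (not stork's or operator-sdk's actual code; it assumes current client-go APIs, and the kindResolver/restMapping names are illustrative only): keep the long-lived discovery-backed mapper, but when a lookup fails with a no-match error, invalidate the cache and retry instead of panicking:

```go
package controllerutil

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/discovery/cached/memory"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/restmapper"
)

// kindResolver holds a long-lived, discovery-backed REST mapper, the same
// shape of cache that operator-sdk reuses across controller registrations.
type kindResolver struct {
	mapper *restmapper.DeferredDiscoveryRESTMapper
}

func newKindResolver(cfg *rest.Config) (*kindResolver, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return nil, err
	}
	// The memory-cached client snapshots the API surface on first use, so a
	// CRD created afterwards (stork registers them one by one) stays
	// invisible to it until the cache is invalidated.
	return &kindResolver{
		mapper: restmapper.NewDeferredDiscoveryRESTMapper(memory.NewMemCacheClient(dc)),
	}, nil
}

// restMapping resolves a GVK; on a stale-cache miss it resets the cache and
// retries once instead of treating the miss as fatal.
func (r *kindResolver) restMapping(gvk schema.GroupVersionKind) (*meta.RESTMapping, error) {
	mapping, err := r.mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
	if meta.IsNoMatchError(err) {
		r.mapper.Reset()
		mapping, err = r.mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
	}
	return mapping, err
}
```

With a long-lived mapper like the one operator-sdk keeps, the Reset() on a no-match error is what lets a CRD created after the first discovery round become visible, so registering VolumeSnapshotRestore right after creating its CRD would succeed instead of panicking.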

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

saheienko · Nov 30 '19 14:11