operator-sdk
Don't use `opm:latest` for `bundle run` and `bundle run-upgrade`
Bug Report
What did you do?
Nothing. Things broke by themselves, because the SDK is not using immutable image tags.
What did you expect to see?
bundle run working as it did for the last CI job.
What did you see instead? Under which circumstances?
The registry pod not starting up. The following error was in the logs:
standard_init_linux.go:228: exec user process caused: exec format error
Environment
Operator type:
/language go
Kubernetes cluster type:
GKE
$ operator-sdk version
operator-sdk version: "v1.22.1", commit: "46ab175459a775d2fb9f0454d0b4a8850dd745ed", kubernetes version: "1.24.1", go version: "go1.18.3", GOOS: "linux", GOARCH: "amd64"
$ go version (if language is Go)
go version go1.18.3 linux/amd64
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-17T22:28:26Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"} Kustomize Version: v4.5.4 Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.12-gke.1500", GitCommit:"6c11aec6ce32cf0d66a2631eed2eb49dd65c89f8", GitTreeState:"clean", BuildDate:"2022-05-11T09:25:37Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}
Possible Solution
Instead of using quay.io/operator-framework/opm:latest to spawn the registry, please use a release tag, such as quay.io/operator-framework/opm:v1.23.2.
Additional context
This came up because the OPM image was pushed for arm64. :facepalm: But such actions in other projects should not break the SDK.
https://github.com/operator-framework/operator-registry/issues/992
I tried both the https://github.com/operator-framework/operator-registry/releases/download/v1.23.2/linux-amd64-opm and https://github.com/operator-framework/operator-registry/releases/download/v1.23.0/linux-amd64-opm versions, and got the following error:
Image ID: image-registry.openshift-image-registry.svc:5000/openshift-marketplace/ci-index@sha256:2623e8b1c275f4259c6f92eb4d9bc79ee2736445a854cd668358d543c9e34bed
Port: 50051/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: exec /bin/opm: exec format error
Exit Code: 1
Started: Wed, 13 Jul 2022 03:05:42 -0400
Finished: Wed, 13 Jul 2022 03:05:42 -0400
Ready: False
Restart Count: 4
I can only confirm I'm seeing the same issue. A potential workaround until this is resolved would be to set an extra flag for `bundle run` in your scripts: `--index-image quay.io/operator-framework/opm:${OPERATOR_SDK_VERSION}`.
The same applies when running `opm index add`: adding the same flag worked.
The version should also be pinned in catalog.Dockerfile.
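The workaround above can be sketched as a small CI script fragment. This is only an illustration under stated assumptions: `OPM_VERSION`, `BUNDLE_IMG`, `pinned_opm_image`, and `run_bundle_cmd` are hypothetical names, the bundle image reference is a placeholder, and the tag `v1.23.2` is the one suggested in the issue description. The helper only prints the command (spelled `operator-sdk run bundle` in the SDK CLI) so it can be reviewed before it is actually run.

```shell
#!/bin/sh
# Pin the opm image used as the index image instead of relying on opm:latest.
# OPM_VERSION and BUNDLE_IMG are hypothetical variable names for this sketch.
OPM_VERSION="${OPM_VERSION:-v1.23.2}"
BUNDLE_IMG="${BUNDLE_IMG:-example.com/my-operator-bundle:v0.0.1}"

# pinned_opm_image prints the opm image reference for a given release tag.
pinned_opm_image() {
  printf 'quay.io/operator-framework/opm:%s\n' "$1"
}

# run_bundle_cmd prints (does not execute) the operator-sdk invocation
# that uses the pinned opm image as the index image.
run_bundle_cmd() {
  printf 'operator-sdk run bundle %s --index-image %s\n' \
    "$1" "$(pinned_opm_image "$2")"
}

run_bundle_cmd "$BUNDLE_IMG" "$OPM_VERSION"
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; in a real pipeline the printed line would be executed against the cluster.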
Hi @fedepaol,
I can only confirm I'm seeing the same issue. A potential workaround until this is resolved would be to set an extra flag for bundle run -> --index-image quay.io/operator-framework/opm:${OPERATOR_SDK_VERSION} in your scripts. Same for when running opm index add, adding the same flag worked.
The error you hit when using `opm index add` does not appear to be related to this issue specifically. Note that the latest version, `quay.io/operator-framework/opm:latest`, is using the new format, so you will probably not be able to use `opm index add`. You can learn more about the new format and why we have been moving in this direction at: https://github.com/redhat-openshift-ecosystem/community-operators-prod/discussions/512
It should be possible to use `opm:latest` with the imperative/sqlite `opm index|registry add` flows. The `opm:latest` image is the full `opm` binary at the highest released semver version of the operator-registry repo, so it still contains all of the deprecated sqlite functionality.
My perspective on this is that SDK is intentionally using the latest and greatest OPM releases:
- so that SDK users pick up CVE/bug fixes as soon as new versions of OPM are released
- so that SDK maintainers don't have to remember to bump the OPM version for every single release.
- because there is an expectation that OPM will not break, and if it does the major version will be bumped.
Two things that perhaps should be considered:
- (if it doesn't already exist) a nightly CI test that involves `run bundle` and `run bundle-upgrade` to catch breakages that happen in OPM.
- Use `opm:v1` rather than `opm:latest`, since there is an expectation that a new major version of `opm` could cause problems.
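The difference between the tags discussed in this thread (`opm:latest`, `opm:v1`, and a full release tag such as `opm:v1.23.2`) could be made explicit in a CI guard. The following is a rough sketch only: `tag_kind` is a hypothetical helper, not part of operator-sdk or opm, and it uses loose glob matching rather than strict semver parsing.

```shell
#!/bin/sh
# tag_kind classifies an image reference by how tightly it is pinned.
# Hypothetical helper for illustration; loose glob match, not strict semver.
tag_kind() {
  ref="$1"
  case "$ref" in
    *@sha256:*) echo digest; return ;;
  esac
  # Drop everything up to the last "/" so a registry port
  # (e.g. localhost:5000/opm) is not mistaken for a tag.
  name="${ref##*/}"
  case "$name" in
    *:*) tag="${name##*:}" ;;
    *)   tag="latest" ;;  # no tag: container runtimes default to latest
  esac
  case "$tag" in
    latest)                echo floating ;;
    v[0-9]*.[0-9]*.[0-9]*) echo release ;;   # e.g. v1.23.2
    v[0-9]*)               echo partial ;;   # e.g. v1 or v1.23
    *)                     echo other ;;
  esac
}

tag_kind "quay.io/operator-framework/opm:latest"
tag_kind "quay.io/operator-framework/opm:v1"
tag_kind "quay.io/operator-framework/opm:v1.23.2"
```

A pipeline could fail, or merely warn, when the result is `floating`, depending on how it weighs stable builds against picking up fixes automatically.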
I'll check about the new format, thanks for the heads up @camilamacedo86 !
Re: the error I was getting with `opm index add`: I did not get any errors when creating the index, but I was getting the same `standard_init_linux.go:228: exec user process caused: exec format error` error when deploying, which was fixed by specifying the index image. Anyhow, things seem to be settled now, and I'll move the metallb operator to the new format. Thanks again!
I don't think we should pin to a specific image. This was a fluke error that happened and it was fixed. We can stick to using opm:latest. Closing this issue for now.
@jmrodri it's a fluke that broke user workflows without changes. That caused quite a bit of work and trouble. It can always happen again.
It's also bad practice and makes it difficult to protect against supply-chain attacks... :shrug:
On the flip side, pinning means higher probability of running code with vulnerabilities.
I'd also change the SDK default back to latest. Users can choose to continue pinning with the flag to ensure stable pipelines.