
Don't use `opm:latest` for `bundle run` and `bundle run-upgrade`

Open jeloba opened this issue 3 years ago • 8 comments

Bug Report

What did you do?

Nothing. Stuff broke by itself because the SDK is not using immutable image tags.

What did you expect to see?

bundle run working as it did for the last CI job.

What did you see instead? Under which circumstances?

The registry pod not starting up. The following error was in the logs:

standard_init_linux.go:228: exec user process caused: exec format error

Environment

Operator type:

/language go

Kubernetes cluster type:

GKE

$ operator-sdk version

operator-sdk version: "v1.22.1", commit: "46ab175459a775d2fb9f0454d0b4a8850dd745ed", kubernetes version: "1.24.1", go version: "go1.18.3", GOOS: "linux", GOARCH: "amd64"

$ go version (if language is Go)

go version go1.18.3 linux/amd64

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.2", GitCommit:"f66044f4361b9f1f96f0053dd46cb7dce5e990a8", GitTreeState:"clean", BuildDate:"2022-06-17T22:28:26Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.12-gke.1500", GitCommit:"6c11aec6ce32cf0d66a2631eed2eb49dd65c89f8", GitTreeState:"clean", BuildDate:"2022-05-11T09:25:37Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

Instead of using quay.io/operator-framework/opm:latest to spawn the registry, please use a release tag, such as quay.io/operator-framework/opm:v1.23.2.

Additional context

This came up because the OPM image was pushed for arm64. :facepalm: But such actions in other projects should not break the SDK.

https://github.com/operator-framework/operator-registry/issues/992
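
An `exec format error` usually means the binary in the image was built for a different CPU architecture than the node. As a rough sketch of how a CI job could detect that before deploying, the Python below checks whether a multi-arch manifest list contains an entry for a given platform. It assumes the tag resolves to an OCI image index or Docker manifest list whose raw JSON you have already fetched with a registry tool of your choice. `supports_platform` and `SAMPLE_INDEX` are hypothetical names, and the sample data is made up for illustration. A single-arch image has no `manifests` array, so this sketch returns `False` for it.

```python
import json


def supports_platform(index_json: str, os_name: str, arch: str) -> bool:
    """Return True if the manifest list has an entry for os_name/arch."""
    index = json.loads(index_json)
    for manifest in index.get("manifests", []):
        platform = manifest.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return True
    return False


# Hypothetical manifest list that only carries an arm64 image (digest elided).
SAMPLE_INDEX = json.dumps(
    {
        "schemaVersion": 2,
        "manifests": [
            {
                "digest": "sha256:<elided>",
                "platform": {"os": "linux", "architecture": "arm64"},
            }
        ],
    }
)

if __name__ == "__main__":
    # An amd64 cluster node would not find a matching image in this index.
    print(supports_platform(SAMPLE_INDEX, "linux", "amd64"))
```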

jeloba avatar Jul 13 '22 07:07 jeloba

I tried both https://github.com/operator-framework/operator-registry/releases/download/v1.23.2/linux-amd64-opm and https://github.com/operator-framework/operator-registry/releases/download/v1.23.0/linux-amd64-opm versions, got the following error:

    Image ID:      image-registry.openshift-image-registry.svc:5000/openshift-marketplace/ci-index@sha256:2623e8b1c275f4259c6f92eb4d9bc79ee2736445a854cd668358d543c9e34bed
    Port:          50051/TCP
    Host Port:     0/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error
      Message:     exec /bin/opm: exec format error

      Exit Code:    1
      Started:      Wed, 13 Jul 2022 03:05:42 -0400
      Finished:     Wed, 13 Jul 2022 03:05:42 -0400
    Ready:          False
    Restart Count:  4

sabinaaledort avatar Jul 13 '22 08:07 sabinaaledort

I can only confirm I'm seeing the same issue. A potential workaround until this is resolved would be to set an extra flag for bundle run -> --index-image quay.io/operator-framework/opm:${OPERATOR_SDK_VERSION} in your scripts.
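
For scripts that wrap the SDK, the workaround might look like the minimal sketch below: it builds the `operator-sdk run bundle` argument list with an explicit `--index-image` and refuses a floating `latest` tag. The helper name `run_bundle_command` and the bundle image `quay.io/example/my-operator-bundle:v0.1.0` are hypothetical; the opm tag is the one suggested in the issue description.

```python
import shlex
from typing import List


def run_bundle_command(bundle_image: str, index_image: str) -> List[str]:
    """Build the argv for `operator-sdk run bundle` with a pinned index image."""
    last_segment = index_image.rsplit("/", 1)[-1]
    # Reject an untagged reference (implicit latest) or an explicit :latest tag.
    if ":" not in last_segment or index_image.endswith(":latest"):
        raise ValueError(f"index image is not pinned: {index_image}")
    return [
        "operator-sdk", "run", "bundle", bundle_image,
        "--index-image", index_image,
    ]


if __name__ == "__main__":
    cmd = run_bundle_command(
        "quay.io/example/my-operator-bundle:v0.1.0",  # hypothetical bundle image
        "quay.io/operator-framework/opm:v1.23.2",
    )
    # Print the command for a script to run; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```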

bartoszmajsak avatar Jul 13 '22 08:07 bartoszmajsak

> I can only confirm I'm seeing the same issue. A potential workaround until this is resolved would be to set an extra flag for bundle run -> --index-image quay.io/operator-framework/opm:${OPERATOR_SDK_VERSION} in your scripts.

Same for when running opm index add, adding the same flag worked.

fedepaol avatar Jul 13 '22 09:07 fedepaol

The version should also be pinned in catalog.Dockerfile.
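
A CI guard for this could be sketched roughly as follows: scan a Dockerfile's `FROM` lines and report base images that use `latest`, or no tag at all. The function names and the sample Dockerfile text are hypothetical, and the parsing is deliberately simplistic (it skips `scratch` and earlier stage names but does not expand `ARG` variables). Note that a release tag such as `v1.23.2` can still be re-pushed; only a digest reference is truly immutable.

```python
import re
from typing import List

# Matches "FROM [--platform=...] <image> [AS <stage>]".
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE,
)


def is_pinned(image: str) -> bool:
    """True if the reference has a digest or an explicit tag other than latest."""
    if "@sha256:" in image:
        return True
    last_segment = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if ":" not in last_segment:
        return False  # no tag means an implicit "latest"
    return last_segment.split(":", 1)[1] != "latest"


def floating_base_images(dockerfile_text: str) -> List[str]:
    """Return base images in FROM lines that are not pinned."""
    stages = set()
    floating = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image, stage = match.group(1), match.group(2)
        if image != "scratch" and image not in stages and not is_pinned(image):
            floating.append(image)
        if stage:
            stages.add(stage)
    return floating


if __name__ == "__main__":
    sample = "FROM quay.io/operator-framework/opm:latest\n"  # hypothetical content
    print(floating_base_images(sample))
```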

jeloba avatar Jul 13 '22 12:07 jeloba

Hi @fedepaol,

> I can only confirm I'm seeing the same issue. A potential workaround until this is resolved would be to set an extra flag for bundle run -> --index-image quay.io/operator-framework/opm:${OPERATOR_SDK_VERSION} in your scripts. Same for when running opm index add, adding the same flag worked.

The error you hit when using opm index add does not appear to be related to this issue specifically. Note that the latest version, quay.io/operator-framework/opm:latest, uses the new format, so you will probably not be able to use opm index add. You can learn more about the new format and why we have been moving in this direction here: https://github.com/redhat-openshift-ecosystem/community-operators-prod/discussions/512

camilamacedo86 avatar Jul 18 '22 18:07 camilamacedo86

It should be possible to use opm:latest with the imperative/sqlite opm index|registry add flows. The opm:latest image is the full opm binary at the highest released semver version of the operator-registry repo, so it still contains all of the deprecated sqlite functionality.

joelanford avatar Jul 18 '22 20:07 joelanford

My perspective on this is that SDK is intentionally using the latest and greatest OPM releases:

  • so that SDK users pick up CVE/bug fixes as soon as new versions of OPM are released
  • so that SDK maintainers don't have to remember to bump the OPM version for every single release.
  • because there is an expectation that OPM will not break, and if it does the major version will be bumped.

Two things that perhaps should be considered:

  • (if it doesn't already exist), a nightly CI test that involves run bundle and run bundle-upgrade to catch breakages that happen in OPM.
  • Use opm:v1 rather than opm:latest since there is an expectation that a new major version of opm could cause problems.

joelanford avatar Jul 18 '22 20:07 joelanford

> Hi @fedepaol,
>
> > I can only confirm I'm seeing the same issue. A potential workaround until this is resolved would be to set an extra flag for bundle run -> --index-image quay.io/operator-framework/opm:${OPERATOR_SDK_VERSION} in your scripts. Same for when running opm index add, adding the same flag worked.
>
> The error you hit when using opm index add does not appear to be related to this issue specifically. Note that the latest version, quay.io/operator-framework/opm:latest, uses the new format, so you will probably not be able to use opm index add. You can learn more about the new format and why we have been moving in this direction here: redhat-openshift-ecosystem/community-operators-prod#512

I'll look into the new format, thanks for the heads up @camilamacedo86! Re: the error I was getting with opm index add: I did not get any errors when creating the index, but I was getting the same `standard_init_linux.go:228: exec user process caused: exec format error` error when deploying, which was fixed by specifying the index image. Anyhow, things seem to be settled now, and I'll move the metallb operator to the new format. Thanks again!

fedepaol avatar Jul 19 '22 07:07 fedepaol

I don't think we should pin to a specific image. This was a fluke error that happened and it was fixed. We can stick to using opm:latest. Closing this issue for now.

jmrodri avatar Sep 12 '22 18:09 jmrodri

@jmrodri it's a fluke that broke user workflows without changes. That caused quite a bit of work and trouble. It can always happen again.

It's also bad practice and makes it difficult to protect against supply-chain attacks... :shrug:

jeloba avatar Sep 13 '22 07:09 jeloba

On the flip side, pinning means higher probability of running code with vulnerabilities.

I'd also change the SDK default back to latest. Users can choose to continue pinning with the flag to ensure stable pipelines.

joelanford avatar Sep 13 '22 10:09 joelanford