operator-sdk icon indicating copy to clipboard operation
operator-sdk copied to clipboard

Operator-sdk scorecard basic spec check occasionally produces timeout with no logs

Open tkrishtop opened this issue 3 years ago • 35 comments

Bug Report

operator-sdk v.1.14.0 
scorecard-test v1.14.0

I'm from the DCI team. As a part of our daily cert suite, we regularly run operator-sdk scorecard --selector=test=basic-check-spec-test for two operators: simple-demo-operator and testpmd-operator, 10 tests per day.

Normally, basic-check-spec-test should be green for both. But it fails occasionally in 20% of cases for both operators in a row with timeout error error running tests context deadline exceeded. To increase timeout with wait-time up to 300s doesn't help. Also, the test is always failing for both operators in a row and it looks like a 10-20 min of some internal API.

To have more information, could you please add more information in the logs about where did exactly the timeout happen?

What did you do?

operator_sdk scorecard \
--output json \
--selector=test=basic-check-spec-test \
--kubeconfig {{ kubeconfig_path }} \
--namespace scorecard-testing \
--service-account default \
--config {{ scorecard_config_path }} \
--verbose \
--wait-time 300s \
{{ scorecard_operator_dir }}

What did you expect to see?

The results of basic-check-spec-test should be stable. In the case of timeout, it would be nice to have logs to identify what is the reason for this timeout.

What did you see instead? Under which circumstances?

Timeout in 20% of cases error running tests context deadline exceeded with no detailed logs.

Environment

Operator type:

-
  name: "testpmd-operator"
  version: "v0.2.9"
  image: "quay.io/rh-nfv-int/testpmd-operator-bundle@sha256:5e28f883faacefa847104ebba1a1a22ee897b7576f0af6b8253c68b5c8f42815"
  index_image: "quay.io/tkrishtop/index-testpmd-operator-bundle:v0.2.9"
-
  name: "simple-demo-operator"
  version: "v0.0.3"
  image: "quay.io/opdev/simple-demo-operator-bundle@sha256:eff7f86a54ef2a340dbf739ef955ab50397bef70f26147ed999e989cfc116b79"
  index_image: "quay.io/opdev/simple-demo-operator-catalog:v0.0.3"

Kubernetes cluster type:

Happens randomly for the latest stable OCP 4.7, OCP 4.8, OCP 4.9, OCP 4.10

$ operator-sdk version

operator-sdk v.1.14.0 
scorecard-test v1.14.0

Possible Solution

It would be nice to have more detailed logs to identify what is the reason for this timeout.

tkrishtop avatar Dec 13 '21 09:12 tkrishtop

FTR, I encountered a failure recently with verbose logging turned on. AFAICT the interesting part (i.e. what it is that's actually timing out) is not getting logged, but posting here in case it's useful to someone investigating this.

/go/src/github.com/stackrox/rox/operator//bin/operator-sdk-1.14.0 scorecard --storage-image="quay.io/rhacs-eng/mirror-busybox:@sha256:fe9664ceda9c13b7fd65f5bc3c0bca5060de3a66fedc39d33926b573599da3b7" --wait-time="40s" --verbose quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404
DEBU[0000] Debug logging is set                         
DEBU[0000] resolving                                     host=quay.io
DEBU[0000] do request                                    host=quay.io request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=opm/alpha request.method=HEAD url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/manifests/v3.69.0-nightly-20220404"
DEBU[0000] fetch response received                       host=quay.io response.header.connection=keep-alive response.header.content-length=112 response.header.content-type=application/json response.header.date="Mon, 04 Apr 2022 05:58:50 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.server=nginx/1.14.1 response.header.www-authenticate="Bearer realm=\"https://quay.io/v2/auth\",service=\"quay.io\",scope=\"repository:rhacs-eng/stackrox-operator-bundle:pull\"" response.status="401 UNAUTHORIZED" url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/manifests/v3.69.0-nightly-20220404"
DEBU[0000] Unauthorized                                  header="Bearer realm=\"https://quay.io/v2/auth\",service=\"quay.io\",scope=\"repository:rhacs-eng/stackrox-operator-bundle:pull\"" host=quay.io
DEBU[0000] do request                                    host=quay.io request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=opm/alpha request.method=HEAD url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/manifests/v3.69.0-nightly-20220404"
DEBU[0000] fetch response received                       host=quay.io response.header.connection=keep-alive response.header.content-length=940 response.header.content-type=application/vnd.docker.distribution.manifest.v2+json response.header.date="Mon, 04 Apr 2022 05:58:50 GMT" response.header.docker-content-digest="sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12" response.header.server=nginx/1.14.1 response.header.strict-transport-security="max-age=63072000; preload" response.header.x-frame-options=DENY response.status="200 OK" url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/manifests/v3.69.0-nightly-20220404"
DEBU[0000] resolved                                      desc.digest="sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12" host=quay.io
DEBU[0000] resolved name: quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
DEBU[0000] fetched                                       bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" digest="sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12" dir=bundle-421604376
DEBU[0000] {application/vnd.docker.distribution.manifest.v2+json sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12 940 [] map[] <nil>}  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
DEBU[0000] fetch                                         digest="sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12" mediatype=application/vnd.docker.distribution.manifest.v2+json size=940
DEBU[0000] do request                                    digest="sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12" mediatype=application/vnd.docker.distribution.manifest.v2+json request.header.accept="application/vnd.docker.distribution.manifest.v2+json, */*" request.header.user-agent=opm/alpha request.method=GET size=940 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/manifests/sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12"
DEBU[0000] fetch response received                       digest="sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12" mediatype=application/vnd.docker.distribution.manifest.v2+json response.header.connection=keep-alive response.header.content-length=940 response.header.content-type=application/vnd.docker.distribution.manifest.v2+json response.header.date="Mon, 04 Apr 2022 05:58:50 GMT" response.header.docker-content-digest="sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12" response.header.server=nginx/1.14.1 response.header.strict-transport-security="max-age=63072000; preload" response.header.x-frame-options=DENY response.status="200 OK" size=940 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/manifests/sha256:1f8462f55f91968c31cf86d54b6e9c2f8ef2e8144f108d095aaeb0a0929e9d12"
DEBU[0000] fetched                                       bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" digest="sha256:82023726a6e890b5b0db920e4eea2841c8557f931994c36a39b1b5a4b76a9dd8" dir=bundle-421604376
DEBU[0000] {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:82023726a6e890b5b0db920e4eea2841c8557f931994c36a39b1b5a4b76a9dd8 480 [] map[] <nil>}  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
DEBU[0000] fetch                                         digest="sha256:82023726a6e890b5b0db920e4eea2841c8557f931994c36a39b1b5a4b76a9dd8" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip size=480
DEBU[0000] fetched                                       bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" digest="sha256:7e654a7e8519f043e19d3f91b5709b424dc06c0c7787ee8cfd4a0eb14ad63258" dir=bundle-421604376
DEBU[0000] fetched                                       bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" digest="sha256:85dc11d7744b426c884c46a03f87365e54ed43477a62dbcc6cb2737616e10e7b" dir=bundle-421604376
DEBU[0000] {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:7e654a7e8519f043e19d3f91b5709b424dc06c0c7787ee8cfd4a0eb14ad63258 29090 [] map[] <nil>}  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
DEBU[0000] {application/vnd.docker.container.image.v1+json sha256:85dc11d7744b426c884c46a03f87365e54ed43477a62dbcc6cb2737616e10e7b 5623 [] map[] <nil>}  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
DEBU[0000] fetch                                         digest="sha256:85dc11d7744b426c884c46a03f87365e54ed43477a62dbcc6cb2737616e10e7b" mediatype=application/vnd.docker.container.image.v1+json size=5623
DEBU[0000] fetch                                         digest="sha256:7e654a7e8519f043e19d3f91b5709b424dc06c0c7787ee8cfd4a0eb14ad63258" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip size=29090
DEBU[0000] fetched                                       bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" digest="sha256:80583ed86a84ec80184a5bb7917a04b6d568b5ff1464f202986f7629b84c0097" dir=bundle-421604376
DEBU[0000] {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:80583ed86a84ec80184a5bb7917a04b6d568b5ff1464f202986f7629b84c0097 406 [] map[] <nil>}  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
DEBU[0000] fetch                                         digest="sha256:80583ed86a84ec80184a5bb7917a04b6d568b5ff1464f202986f7629b84c0097" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip size=406
DEBU[0000] do request                                    digest="sha256:82023726a6e890b5b0db920e4eea2841c8557f931994c36a39b1b5a4b76a9dd8" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept="application/vnd.docker.image.rootfs.diff.tar.gzip, */*" request.header.user-agent=opm/alpha request.method=GET size=480 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/blobs/sha256:82023726a6e890b5b0db920e4eea2841c8557f931994c36a39b1b5a4b76a9dd8"
DEBU[0000] do request                                    digest="sha256:85dc11d7744b426c884c46a03f87365e54ed43477a62dbcc6cb2737616e10e7b" mediatype=application/vnd.docker.container.image.v1+json request.header.accept="application/vnd.docker.container.image.v1+json, */*" request.header.user-agent=opm/alpha request.method=GET size=5623 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/blobs/sha256:85dc11d7744b426c884c46a03f87365e54ed43477a62dbcc6cb2737616e10e7b"
DEBU[0000] do request                                    digest="sha256:7e654a7e8519f043e19d3f91b5709b424dc06c0c7787ee8cfd4a0eb14ad63258" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept="application/vnd.docker.image.rootfs.diff.tar.gzip, */*" request.header.user-agent=opm/alpha request.method=GET size=29090 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/blobs/sha256:7e654a7e8519f043e19d3f91b5709b424dc06c0c7787ee8cfd4a0eb14ad63258"
DEBU[0000] do request                                    digest="sha256:80583ed86a84ec80184a5bb7917a04b6d568b5ff1464f202986f7629b84c0097" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept="application/vnd.docker.image.rootfs.diff.tar.gzip, */*" request.header.user-agent=opm/alpha request.method=GET size=406 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/blobs/sha256:80583ed86a84ec80184a5bb7917a04b6d568b5ff1464f202986f7629b84c0097"
DEBU[0000] fetch response received                       digest="sha256:82023726a6e890b5b0db920e4eea2841c8557f931994c36a39b1b5a4b76a9dd8" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.content-length=480 response.header.content-type=binary/octet-stream response.header.date="Mon, 04 Apr 2022 05:58:51 GMT" response.header.etag="\"51f66689c707be0494d193fc7650a5f9-1\"" response.header.last-modified="Mon, 04 Apr 2022 05:28:28 GMT" response.header.server=AmazonS3 response.header.x-amz-id-2="3dW6VmlVBdHl6fMvifIGpq0dWEcd8kDG9cpMSH9+QJzBpTB2Bd9QMnX5lAqBktyjbZluGbwpgNo=" response.header.x-amz-replication-status=COMPLETED response.header.x-amz-request-id=JKTGPACKW93BBZ6C response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=hawtuRC76z9_U20jVHGUYfpHW8MdEeIH response.status="200 OK" size=480 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/blobs/sha256:82023726a6e890b5b0db920e4eea2841c8557f931994c36a39b1b5a4b76a9dd8"
DEBU[0000] fetch response received                       digest="sha256:85dc11d7744b426c884c46a03f87365e54ed43477a62dbcc6cb2737616e10e7b" mediatype=application/vnd.docker.container.image.v1+json response.header.accept-ranges=bytes response.header.content-length=5623 response.header.content-type=binary/octet-stream response.header.date="Mon, 04 Apr 2022 05:58:51 GMT" response.header.etag="\"1b7f8046facca76d808acb9350446a89-1\"" response.header.last-modified="Mon, 04 Apr 2022 05:28:29 GMT" response.header.server=AmazonS3 response.header.x-amz-id-2="q4gHx9LgX6QvEl6RM0H2DKjDo6z+cIYIySZnWBvbOMaDDbARUuM0d74aYbfgE7yri71i1WX71RI=" response.header.x-amz-replication-status=COMPLETED response.header.x-amz-request-id=JKTNA878PQ0CN6KH response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=xY5B4I9EAaNpprD1lCoha71KhEvvS.u_ response.status="200 OK" size=5623 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/blobs/sha256:85dc11d7744b426c884c46a03f87365e54ed43477a62dbcc6cb2737616e10e7b"
DEBU[0000] fetch response received                       digest="sha256:80583ed86a84ec80184a5bb7917a04b6d568b5ff1464f202986f7629b84c0097" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.content-length=406 response.header.content-type=binary/octet-stream response.header.date="Mon, 04 Apr 2022 05:58:51 GMT" response.header.etag="\"b041299f20acd151b6be2316d966ae21-1\"" response.header.last-modified="Mon, 04 Apr 2022 05:28:28 GMT" response.header.server=AmazonS3 response.header.x-amz-id-2="R9MCOj3SrFzgDT5d7Snp0WlhgkPky19ior+lpfV7KVbpjq0/Vcfwe7LnJBS3qaSmfbLXW9JtnCk=" response.header.x-amz-replication-status=COMPLETED response.header.x-amz-request-id=JKTTWG8J16AQG14R response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=i7UK3EsZch9IgD6o0OGoB5bMxBwlu7hr response.status="200 OK" size=406 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/blobs/sha256:80583ed86a84ec80184a5bb7917a04b6d568b5ff1464f202986f7629b84c0097"
DEBU[0000] fetch response received                       digest="sha256:7e654a7e8519f043e19d3f91b5709b424dc06c0c7787ee8cfd4a0eb14ad63258" mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.content-length=29090 response.header.content-type=binary/octet-stream response.header.date="Mon, 04 Apr 2022 05:58:51 GMT" response.header.etag="\"a78a659753af9dcca852fab61394e9ba-1\"" response.header.last-modified="Mon, 04 Apr 2022 05:28:28 GMT" response.header.server=AmazonS3 response.header.x-amz-id-2="NU0J2CLLlW4jq5J1tarRPv4iBDw1CfMMl6FlbCRq75SQz3uL/OkXX4rJJO+XDfQEIjB4SASaZaM=" response.header.x-amz-replication-status=COMPLETED response.header.x-amz-request-id=JKTNTDYNJ7QT3XT9 response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=XqQhNIlRTb3aOb8QsMDVQtwbiAg3buHp response.status="200 OK" size=29090 url="https://quay.io/v2/rhacs-eng/stackrox-operator-bundle/blobs/sha256:7e654a7e8519f043e19d3f91b5709b424dc06c0c7787ee8cfd4a0eb14ad63258"
DEBU[0000] unpacking layer: {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:7e654a7e8519f043e19d3f91b5709b424dc06c0c7787ee8cfd4a0eb14ad63258 29090 [] map[] <nil>}  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
DEBU[0000] unpigz not found, falling back to go gzip     error="exec: \"unpigz\": executable file not found in $PATH"
DEBU[0000] unpacking layer: {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:80583ed86a84ec80184a5bb7917a04b6d568b5ff1464f202986f7629b84c0097 406 [] map[] <nil>}  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
DEBU[0000] unpacking layer: {application/vnd.docker.image.rootfs.diff.tar.gzip sha256:82023726a6e890b5b0db920e4eea2841c8557f931994c36a39b1b5a4b76a9dd8 480 [] map[] <nil>}  bundle="quay.io/rhacs-eng/stackrox-operator-bundle:v3.69.0-nightly-20220404" dir=bundle-421604376
Error: error running tests context deadline exceeded
Usage:
  operator-sdk scorecard [flags]

Flags:
  -c, --config string            path to scorecard config file
  -h, --help                     help for scorecard
      --kubeconfig string        kubeconfig path
  -L, --list                     Option to enable listing which tests are run
  -n, --namespace string         namespace to run the test images in
  -o, --output string            Output format for results. Valid values: text, json, xunit (default "text")
  -l, --selector string          label selector to determine which tests are run
  -s, --service-account string   Service account to use for tests (default "default")
  -x, --skip-cleanup             Disable resource cleanup after tests are run
  -b, --storage-image string     Storage image to be used by the Scorecard pod (default "docker.io/library/busybox@sha256:c71cb4f7e8ececaffb34037c2637dc86820e4185100e18b4d02d613a9bd772af")
  -t, --test-output string       Test output directory. (default "test-output")
  -u, --untar-image string       Untar image to be used by the Scorecard pod (default "registry.access.redhat.com/ubi8@sha256:910f6bc0b5ae9b555eb91b88d28d568099b060088616eba2867b07ab6ea457c7")
  -w, --wait-time duration       seconds to wait for tests to complete. Example: 35s (default 30s)

Global Flags:
      --plugins strings   plugin keys to be used for this subcommand execution
      --verbose           Enable verbose logging

FATA[0040] error running tests context deadline exceeded 
make: *** [Makefile:401: bundle-test-image] Error 1

porridge avatar Apr 06 '22 05:04 porridge

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Jul 05 '22 07:07 openshift-bot

/remove-lifecycle stale

tkrishtop avatar Jul 05 '22 12:07 tkrishtop

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Oct 04 '22 01:10 openshift-bot

/remove-lifecycle stale

On Tue, 4 Oct 2022 at 03:01, OpenShift Bot @.***> wrote:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

— Reply to this email directly, view it on GitHub https://github.com/operator-framework/operator-sdk/issues/5452#issuecomment-1266270441, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADXPTAQ6KM5PFD6GPDGGULWBN6V7ANCNFSM5J5U4XVQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

porridge avatar Oct 04 '22 05:10 porridge

Hello team, that's an old issue but we keep seeing it regularly while running Preflight which currently runs operator-sdk 1.22.2.

Here are the logs from preflight.log (BasicSpecCheck.json is not created at all):

time="2022-11-21T02:42:04Z" level=debug msg="running check: ScorecardBasicSpecCheck"
time="2022-11-21T02:42:04Z" level=trace msg="Running operator-sdk scorecard check for quay.io/rh-nfv-int/testpmd-operator-bundle@sha256:5e28f883faacefa847104ebba1a1a22ee897b7576f0af6b8253c68b5c8f42815"
time="2022-11-21T02:42:04Z" level=trace msg="--selector=[test=basic-check-spec-test]"
time="2022-11-21T02:42:04Z" level=debug msg="Using quay.io/operator-framework/scorecard-test@sha256:b939bdf25de65d0c4256ac799b02a58335b705e24cbdd66d61e344982eb6156d as the scorecard test image"
time="2022-11-21T02:42:04Z" level=trace msg="running scorecard with the following invocation[operator-sdk scorecard --output json --selector=test=basic-check-spec-test --kubeconfig /kubeconfig --wait-time 240s --namespace preflight-testing --service-account default --config /tmp/scorecard-test-config-4476853.yaml --verbose /tmp/preflight-3276679150/fs]"
time="2022-11-21T02:46:04Z" level=debug msg="operator-sdk scorecard failed to run properly."
time="2022-11-21T02:46:04Z" level=debug msg="stderr: time=\"2022-11-21T02:42:04Z
\" level=debug msg=\"Debug logging is set
\"\nError: error running tests context deadline exceeded
\nUsage:\n  operator-sdk scorecard [flags]\n\n
Flags:\n  -c, --config string            path to scorecard config file\n  -h, --help                     help for scorecard\n      --kubeconfig string        kubeconfig path\n  -L, --list                     
Option to enable listing which tests are run\n  -n, --namespace string         namespace to run the test images in\n  -o, --output string            
Output format for results. Valid values: text, json, xunit (default \"text\")\n  -l, --selector string          label selector to determine which tests are run\n  -s, --service-account string   Service account to use for tests (default \"default\")\n  -x, --skip-cleanup             
Disable resource cleanup after tests are run\n  -b, --storage-image string     Storage image to be used by the Scorecard pod 
(default \"docker.io/library/busybox@sha256:c71cb4f7e8ececaffb34037c2637dc86820e4185100e18b4d02d613a9bd772af\")\n  -t, --test-output string       
Test output directory. (default \"test-output\")\n  -u, --untar-image string      
 Untar image to be used by the Scorecard pod (default \"registry.access.redhat.com/ubi8@sha256:910f6bc0b5ae9b555eb91b88d28d568099b060088616eba2867b07ab6ea457c7\")\n  -w, --wait-time duration       seconds to wait for tests to complete. 
Example: 35s (default 30s)\n\n
Global Flags:\n      --plugins strings   plugin keys to be used for this subcommand execution\n      --verbose           Enable verbose logging\n\ntime=\"2022-11-21T02:46:04Z\" level=fatal msg=\"error running tests context deadline exceeded\"\n"
time="2022-11-21T02:46:04Z" level=info msg="check completed: ScorecardBasicSpecCheck" err="failed to run operator-sdk scorecard: exit status 1" result=ERROR

Now we have more logs (thank you team!) and see that the timeout happens here: Untar image to be used by the Scorecard pod https://github.com/operator-framework/operator-sdk/blob/5d541d01b5d9d173a6927faf038a781f845b1003/internal/cmd/operator-sdk/scorecard/cmd.go#L96-L98

@asmacdo @theishshah do you have any clues?

cc: @acornett21

tkrishtop avatar Nov 22 '22 08:11 tkrishtop

Upd. In half of such situations, the scorecard-test pod is stuck in the namespace, failing to terminate. This last bad thing should be easy to fix, whatever the reason for the initial issue.

oc get all -n preflight-testing

NAME                      READY   STATUS        RESTARTS   AGE
pod/scorecard-test-tjbf   0/1     Terminating   0          14m

tkrishtop avatar Nov 22 '22 10:11 tkrishtop

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Feb 21 '23 01:02 openshift-bot

/remove-lifecycle stale

tkrishtop avatar Feb 21 '23 08:02 tkrishtop

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot avatar Mar 23 '23 08:03 openshift-bot

/remove-lifecycle stale

porridge avatar Mar 30 '23 07:03 porridge

/remove-lifecycle rotten

tkrishtop avatar Mar 30 '23 07:03 tkrishtop

Hi @porridge, I noticed that you are using a very old version 1.14.0. On our side, migration to 1.26.0 significantly reduced the frequency of timeouts.

tkrishtop avatar Mar 30 '23 07:03 tkrishtop

@tkrishtop we used 1.14 a year ago, when I wrote that comment :-) These days we just retry the command 4 times and we've never seen the error since. I guess the cause was long image pulls, but it would be great to confirm this by actually having a useful error message.

porridge avatar Mar 30 '23 07:03 porridge

@porridge I managed to trace down the error. The timeout happens here: Untar image to be used by the Scorecard pod, here is the source code:

https://github.com/operator-framework/operator-sdk/blob/5d541d01b5d9d173a6927faf038a781f845b1003/internal/cmd/operator-sdk/scorecard/cmd.go#L96-L98

cc: @theishshah

tkrishtop avatar Mar 30 '23 07:03 tkrishtop

@tkrishtop Recently operator-sdk switched to using their own images for this cmd. So instead of using an image from docker.io and an image from registry.access.redhat.com both images come from quay. This change is in 1.28.0 you can see the PR here. Have you upgraded to 1.28.0 yet?

acornett21 avatar Mar 30 '23 16:03 acornett21

Hi @acornett21, thank you for the information! Typically, we run the scorecard-sdk from preflight, which currently uses version 1.26.0. However, I do have the option to run the standalone scorecard-sdk, so perhaps I'll activate it for our daily runs using version 1.28.0.

tkrishtop avatar Mar 30 '23 17:03 tkrishtop

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Jun 29 '23 01:06 openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot avatar Jul 29 '23 08:07 openshift-bot

/remove-lifecycle rotten

porridge avatar Aug 01 '23 05:08 porridge

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Oct 30 '23 09:10 openshift-bot

/remove-lifecycle stale

porridge avatar Oct 30 '23 09:10 porridge

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Jan 29 '24 01:01 openshift-bot

/remove-lifecycle stale

porridge avatar Jan 29 '24 05:01 porridge

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot avatar Apr 28 '24 09:04 openshift-bot

/remove-lifecycle stale

porridge avatar Apr 29 '24 05:04 porridge