Revision stays in ContainerMissing condition forever after a temporary failure of digest resolution

Open maschmid opened this issue 1 year ago • 5 comments

/area reconciler

What version of Knative?

1.14

Expected Behavior

After a temporary error in digest resolution causes the ContainerHealthy condition to be False with reason ContainerMissing, the ContainerHealthy condition should become True once digest resolution eventually succeeds.
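
To make the expected transition concrete, here is a minimal Python sketch over plain dicts shaped like the Revision conditions. It is not Knative's reconciler code: mark_digest_resolved is a hypothetical helper, and the sketch assumes the only thing that needs clearing is a ContainerHealthy condition that is False with reason ContainerMissing. It does not model how Ready is recomputed.

```python
def mark_digest_resolved(conditions):
    """Return a copy of conditions with a stale ContainerMissing failure cleared.

    Hypothetical helper for illustration only. It is meant to be called
    after digest resolution has succeeded.
    """
    updated = []
    for cond in conditions:
        cond = dict(cond)  # copy so the caller's data is not mutated
        is_stale_failure = (
            cond.get("type") == "ContainerHealthy"
            and cond.get("status") == "False"
            and cond.get("reason") == "ContainerMissing"
        )
        if is_stale_failure:
            cond["status"] = "True"
            # Drop the details of the earlier resolution error.
            cond.pop("reason", None)
            cond.pop("message", None)
        updated.append(cond)
    return updated


# Example input loosely modelled on the conditions in this report.
conditions = [
    {
        "type": "ContainerHealthy",
        "status": "False",
        "reason": "ContainerMissing",
        "message": "failed to resolve image to digest",
    },
    {"type": "ResourcesAvailable", "status": "True"},
]

updated = mark_digest_resolved(conditions)
print(updated[0])
```

Failures with any other reason are left untouched, so the sketch only clears the error that a successful resolution actually invalidates.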

Actual Behavior

After a temporary error in digest resolution, the Revision stays in this inconsistent, broken state even once digest resolution eventually succeeds:

status:
  actualReplicas: 1
  conditions:
  - lastTransitionTime: "2024-08-12T22:30:16Z"
    severity: Info
    status: "True"
    type: Active
  - lastTransitionTime: "2024-08-12T22:28:04Z"
    message: 'Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
      failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
      unexpected status code 401 Unauthorized'
    reason: ContainerMissing
    status: "False"
    type: ContainerHealthy
  - lastTransitionTime: "2024-08-12T22:28:04Z"
    message: 'Unable to fetch image "image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp":
      failed to resolve image to digest: GET https://image-registry.openshift-image-registry.svc:5000/openshift/token?scope=repository%3Aocf-qe-images%2Freceiverhttp%3Apull&service=:
      unexpected status code 401 Unauthorized'
    reason: ContainerMissing
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-08-12T22:30:12Z"
    status: "True"
    type: ResourcesAvailable
  containerStatuses:
  - imageDigest: image-registry.openshift-image-registry.svc:5000/ocf-qe-images/receiverhttp@sha256:e915478407c5c882346c4fc72078007fd2511d9e1796345db1873facafddf836
    name: user-container
  desiredReplicas: 1
  observedGeneration: 1

Notice that containerStatuses shows the resolved image digest and the deployments are Ready (ResourcesAvailable is True), yet ContainerHealthy is still False with the original digest resolution error.
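
Since there is no reproducer yet, a small check like the following could help spot affected Revisions in a long-running test. It is only a sketch: has_stale_container_missing and find_condition are hypothetical helper names, the status is assumed to be already parsed into plain Python dicts, and "resolved" is approximated by the imageDigest containing "@sha256:".

```python
def find_condition(status, cond_type):
    """Return the condition dict with the given type, or None if absent."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def has_stale_container_missing(status):
    """True if every container has a resolved digest while ContainerHealthy
    is still False with reason ContainerMissing."""
    container_statuses = status.get("containerStatuses", [])
    digests_resolved = bool(container_statuses) and all(
        "@sha256:" in cs.get("imageDigest", "") for cs in container_statuses
    )
    healthy = find_condition(status, "ContainerHealthy")
    stale_failure = (
        healthy is not None
        and healthy.get("status") == "False"
        and healthy.get("reason") == "ContainerMissing"
    )
    return digests_resolved and stale_failure


# Trimmed copy of the status reported above (messages and timestamps omitted).
status = {
    "conditions": [
        {"type": "Active", "status": "True"},
        {"type": "ContainerHealthy", "status": "False", "reason": "ContainerMissing"},
        {"type": "Ready", "status": "False", "reason": "ContainerMissing"},
        {"type": "ResourcesAvailable", "status": "True"},
    ],
    "containerStatuses": [
        {
            "name": "user-container",
            "imageDigest": (
                "image-registry.openshift-image-registry.svc:5000/"
                "ocf-qe-images/receiverhttp@sha256:"
                "e915478407c5c882346c4fc72078007fd2511d9e1796345db1873facafddf836"
            ),
        }
    ],
}

print(has_stale_container_missing(status))
```

For the status in this report the check prints True, which is exactly the combination described: a resolved digest next to a ContainerMissing failure.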

Steps to Reproduce the Problem

There is currently no reproducer; the problem was noticed during a long-running test.

maschmid, Aug 13 '24 12:08

@maschmid: The label(s) area/reconciler cannot be applied, because the repository doesn't have them.

knative-prow[bot], Aug 13 '24 12:08

cc @dprotaso @skonto

ReToCode, Aug 14 '24 11:08

@dprotaso gentle ping. I tried to reproduce this locally but had no luck.

skonto, Sep 05 '24 14:09

https://github.com/knative/serving/issues/15487 could be a similar issue.

maschmid, Sep 06 '24 12:09

#15503 fixes this one too, correct, @maschmid?

skonto, Oct 07 '24 08:10

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot], Jan 06 '25 01:01

/remove-lifecycle stale

skonto, Jan 31 '25 08:01

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot], May 02 '25 01:05