operator-lifecycle-manager
We should investigate why #3093 is necessary
Bug Report
The Image Update test uploads a couple of catalog images to an internal image registry and then uses them in the test. Recently, the Image Update test began failing because of an authentication issue against the internal registry. In the past, the catalogSource pod would hit an authentication issue on its first pull attempts but eventually succeed; today the authentication issue never resolves. Some notes:
- Prior to introducing the changes in #3093, I noticed that the test would pass if you manually deleted the pod after the image pull error.
- We believe that the change in behavior might be a byproduct of this commit.
Here's an example of the failing pod YAML:
apiVersion: v1
kind: Pod
metadata:
  labels:
    olm.catalogSource: catalog-v4gtd
  name: catalog-v4gtd-gwrx8
  namespace: openshift-catsrc-e2e-9lcpt
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: true
    kind: CatalogSource
    name: catalog-v4gtd
    uid: 78b75952-f02e-4866-bb40-c5e9934fa70a
  resourceVersion: "48814"
  uid: b5c8ede2-b534-42b0-919e-2776ce5e045d
spec:
  containers:
  - image: image-registry.openshift-image-registry.svc:5000/openshift-catsrc-e2e-9lcpt/catsrc-update:xhgmp7
    imagePullPolicy: Always
    livenessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: registry-server
    ports:
    - containerPort: 50051
      name: grpc
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: false
      runAsNonRoot: true
      runAsUser: 1000690000
    startupProbe:
      exec:
        command:
        - grpc_health_probe
        - -addr=:50051
      failureThreshold: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-7rbbq
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-0-67-131.ec2.internal
  nodeSelector:
    kubernetes.io/os: linux
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1000690000
    seLinuxOptions:
      level: s0:c26,c20
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: catalog-v4gtd
  serviceAccountName: catalog-v4gtd
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: kube-api-access-7rbbq
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-11-06T21:31:43Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-11-06T21:31:43Z"
    message: 'containers with unready status: [registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-11-06T21:31:43Z"
    message: 'containers with unready status: [registry-server]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-11-06T21:31:43Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: image-registry.openshift-image-registry.svc:5000/openshift-catsrc-e2e-9lcpt/catsrc-update:xhgmp7
    imageID: ""
    lastState: {}
    name: registry-server
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        message: Back-off pulling image "image-registry.openshift-image-registry.svc:5000/openshift-catsrc-e2e-9lcpt/catsrc-update:xhgmp7"
        reason: ImagePullBackOff
  hostIP: 10.0.67.131
  phase: Pending
  podIP: 10.128.2.17
  podIPs:
  - ip: 10.128.2.17
  qosClass: Burstable
  startTime: "2023-11-06T21:31:43Z"
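For anyone reproducing this, the stuck state in the pod status above can be detected programmatically before deciding whether to delete the pod by hand. The following is only a minimal sketch, not what OLM or #3093 actually does: it assumes the pod has already been loaded into a plain Python dict (for example from `kubectl get pod -o json`), and the helper name `stuck_on_image_pull` is hypothetical.

```python
def stuck_on_image_pull(pod: dict) -> list[str]:
    """Return the names of containers that are waiting on a failed image pull.

    Hypothetical helper for illustration; it only inspects
    status.containerStatuses[].state.waiting.reason.
    """
    # Waiting reasons the kubelet reports when an image pull fails.
    pull_failures = {"ImagePullBackOff", "ErrImagePull"}
    stuck = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in pull_failures:
            stuck.append(cs["name"])
    return stuck


# Trimmed-down version of the failing pod from this report.
pod = {
    "metadata": {"name": "catalog-v4gtd-gwrx8"},
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {
                "name": "registry-server",
                "state": {"waiting": {"reason": "ImagePullBackOff"}},
            }
        ],
    },
}

print(stuck_on_image_pull(pod))
```

A non-empty result would correspond to the point at which manually deleting the pod used to let the test pass.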
This ticket can be closed once we identify why the pod isn't able to pull from the internal registry.