operator-lifecycle-manager
PackageServer can't connect to the gRPC server
Issue
The OLM PackageServer cannot connect to the gRPC server created from an image built using operator-registry.
The following CatalogSource has been deployed successfully on Kubernetes 1.15:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: prometheus-manifests
spec:
  displayName: Prometheus Operator
  publisher: Snowdrop
  sourceType: grpc
  image: quay.io/cmoulliard/olm-index:0.1.0
but no PackageManifests are created in the demo namespace.
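For anyone debugging the same thing, two quick sanity checks show what OLM thinks of the catalog (a sketch, not from the original report; it assumes the CatalogSource above and that OLM populates status.connectionState):

# Has OLM marked the catalog connection READY?
kubectl -n demo get catalogsource prometheus-manifests -o jsonpath='{.status.connectionState.lastObservedState}'

# PackageManifests are served per-namespace by the packageserver API
kubectl -n demo get packagemanifests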
When I look at the packageserver pod running in the olm namespace, I see this error:
W0212 20:57:16.234189 1 clientconn.go:1120] grpc:
addrConn.createTransport failed to connect to
{prometheus-manifests.demo.svc:50051 0 <nil>}.
Err :connection error:
desc = "transport: Error while dialing dial tcp 10.109.228.141:50051:
i/o timeout". Reconnecting...
I0212 20:57:16.234271 1 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc000029dd0, TRANSIENT_FAILURE
I0212 20:57:17.239568 1 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc000029dd0, CONNECTING
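For reference, the same messages can be pulled with something along these lines (assuming the upstream OLM layout, where the packageserver runs as a Deployment named packageserver in the olm namespace):

# Follow the packageserver logs and watch for the gRPC dial errors
kubectl -n olm logs deploy/packageserver -f | grep -iE "createTransport|TRANSIENT_FAILURE"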
A Service resource was correctly created to expose it:
kind: Service
apiVersion: v1
metadata:
  name: prometheus-manifests
  namespace: demo
  ownerReferences:
    - apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: prometheus-manifests
      uid: 0b863564-d8c0-471a-b3c7-63c1c1433153
      controller: false
      blockOwnerDeletion: false
spec:
  ports:
    - name: grpc
      protocol: TCP
      port: 50051
      targetPort: 50051
  selector:
    olm.catalogSource: prometheus-manifests
  clusterIP: 10.107.184.132
  type: ClusterIP
  sessionAffinity: None
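Note that the dial error above targets 10.109.228.141 while this Service has clusterIP 10.107.184.132, so it is worth comparing what the Service and its Endpoints actually expose (a sketch, assuming the resources shown here):

# ClusterIP the Service was allocated
kubectl -n demo get svc prometheus-manifests -o jsonpath='{.spec.clusterIP}{"\n"}'

# Pod IP(s) actually backing the Service on port 50051
kubectl -n demo get endpoints prometheus-manifests -o wide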
Here is the Pod resource created for the gRPC server:
kind: Pod
apiVersion: v1
metadata:
  name: prometheus-manifests-kdnmf
  generateName: prometheus-manifests-
  namespace: demo
  selfLink: /api/v1/namespaces/demo/pods/prometheus-manifests-kdnmf
  uid: a57ef224-9729-44c0-a591-c885bb1695e7
  resourceVersion: '15873'
  creationTimestamp: '2020-02-12T20:56:56Z'
  labels:
    olm.catalogSource: prometheus-manifests
  ownerReferences:
    - apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      name: prometheus-manifests
      uid: 0b863564-d8c0-471a-b3c7-63c1c1433153
      controller: false
      blockOwnerDeletion: false
spec:
  volumes:
    - name: default-token-rzx22
      secret:
        secretName: default-token-rzx22
        defaultMode: 420
  containers:
    - name: registry-server
      image: 'quay.io/cmoulliard/olm-index:0.1.0'
      ports:
        - name: grpc
          containerPort: 50051
          protocol: TCP
      resources:
        limits:
          cpu: 100m
          memory: 100Mi
        requests:
          cpu: 10m
          memory: 50Mi
      volumeMounts:
        - name: default-token-rzx22
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        exec:
          command:
            - grpc_health_probe
            - '-addr=localhost:50051'
        initialDelaySeconds: 10
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      readinessProbe:
        exec:
          command:
            - grpc_health_probe
            - '-addr=localhost:50051'
        initialDelaySeconds: 5
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  nodeSelector:
    beta.kubernetes.io/os: linux
  serviceAccountName: default
  serviceAccount: default
  nodeName: k8s-115
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - operator: Exists
  priority: 0
  enableServiceLinks: true
status:
  phase: Running
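To rule out the registry container itself, the same health probe the Pod runs can be executed by hand, or the registry can be reached through a port-forward (a sketch; it assumes the grpc_health_probe binary is in the index image, as the probes above suggest):

# Run the gRPC health check from inside the registry pod
kubectl -n demo exec prometheus-manifests-kdnmf -- grpc_health_probe -addr=localhost:50051

# Or reach the registry from outside the cluster via a port-forward
kubectl -n demo port-forward pod/prometheus-manifests-kdnmf 50051:50051 &
grpcurl -plaintext localhost:50051 api.Registry/ListPackages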
If I SSH into the VM running the cluster, I can use the grpcurl tool:
[root@k8s-115 ~]# grpcurl -plaintext 10.107.184.132:50051 list api.Registry
api.Registry.GetBundle
api.Registry.GetBundleForChannel
api.Registry.GetBundleThatReplaces
api.Registry.GetChannelEntriesThatProvide
api.Registry.GetChannelEntriesThatReplace
api.Registry.GetDefaultBundleThatProvides
api.Registry.GetLatestChannelEntriesThatProvide
api.Registry.GetPackage
api.Registry.ListPackages
but ListPackages returns nothing:
grpcurl -plaintext 10.107.184.132:50051 api.Registry.ListPackages
[root@k8s-115 ~]#
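An empty ListPackages usually points at the index image itself containing no package entries rather than at a networking problem. Running the image locally is an easy way to confirm that, independent of Kubernetes (a sketch; it assumes docker is available and that the image's default entrypoint serves the registry on 50051, as opm-built indexes do):

# Serve the index image locally and query it directly
docker run --rm -d -p 50051:50051 --name olm-index quay.io/cmoulliard/olm-index:0.1.0
grpcurl -plaintext localhost:50051 api.Registry/ListPackages
docker rm -f olm-index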
Additional info
kubernetes cluster: 1.15
olm version: 0.14.1
image index: quay.io/cmoulliard/olm-index:0.1.0
operator-registry: master
Just to try to summarize:
You have a catalog deployed into a namespace different from the one the PackageServer is deployed into. The PackageServer resolves prometheus-manifests.demo.svc to IP 10.109.228.141, while the Service is set up on IP 10.107.184.132 and works just fine when hit with grpcurl.
So it appears the PackageServer is trying to reach the pod directly rather than going through the Service. If the catalog were running in the same namespace as the PackageServer, as is the default, that would work.
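One way to see which address prometheus-manifests.demo.svc actually resolves to inside the cluster is a throwaway DNS lookup pod (a sketch; the busybox image tag is an assumption):

# Resolve the service name from inside the cluster
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup prometheus-manifests.demo.svc.cluster.local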
Does the Catalog Operator correctly access the catalog?
I wonder if you could actually set up the PackageServer in the same namespace and whether that wouldn't solve your problem. I don't know if that would be an official solution, but it might be a work-around.
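If you want to try that direction, something along these lines would recreate the catalog next to the PackageServer in the upstream olm namespace (just a sketch of the work-around, not an official recommendation):

kubectl apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: prometheus-manifests
  namespace: olm
spec:
  displayName: Prometheus Operator
  publisher: Snowdrop
  sourceType: grpc
  image: quay.io/cmoulliard/olm-index:0.1.0
EOF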
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Seeing the same issue on the OKD4 cluster I just installed today, following https://medium.com/@craig_robinson/openshift-4-4-okd-bare-metal-install-on-vmware-home-lab-6841ce2d37eb
W0504 17:39:03.726845 1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {community-operators.openshift-marketplace.svc:50051 0
Noticed this, in case it helps:
$ kubectl -n openshift-marketplace get event
...
14m    Warning   Unhealthy   pod/community-operators-5b7f9bb9bf-b2v9v   Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
105s   Warning   Unhealthy   pod/community-operators-5b7f9bb9bf-b2v9v   Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
15m    Warning   Unhealthy   pod/community-operators-5b7f9bb9bf-b2v9v   Readiness probe failed: command timed out
14m    Warning   Unhealthy   pod/community-operators-5b7f9bb9bf-b2v9v   Liveness probe failed: command timed out
...
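Those events suggest the probes time out rather than the registry being down outright. Re-running the probe by hand with a more generous timeout can tell a slow registry apart from a broken one (a sketch; the pod name is the one from the events above, and -connect-timeout is a standard grpc_health_probe flag):

# Re-run the failing probe manually with a longer connect timeout
kubectl -n openshift-marketplace exec community-operators-5b7f9bb9bf-b2v9v -- \
  grpc_health_probe -addr=localhost:50051 -connect-timeout 5s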
Seeing the same issue, with the readiness and liveness probes failing to contact localhost for the pods.
Any update on the above issue? I'm facing the same issue on a Mac M1 with a go-grpc Consul setup.
+1 here. Added the operatorhub.io CatalogSource in the openshift-marketplace namespace inside an OCP v4.18.11 cluster.
Startup probe failed: timeout: failed to connect to service ":50051" within 1s
The service name is missing; is there a specific name to use?
Thank you for your help