packages.operators.coreos.com APIService (apiregistration) fails to authenticate to the packageserver endpoint
Hi,
After installing OLM (with either operator-sdk or install.sh), packageserver reports "connect: connection refused" when connecting to operatorhubio-catalog, although I don't see any issue when using a grpc_cli debugging container.
This is a very simple single-node Kubernetes install, with all pods connected to the same bridge.
$ kubectl version
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0
The packageserver ClusterServiceVersion stays in the Installing phase:
$ kubectl get csv packageserver -n olm
NAME DISPLAY VERSION REPLACES PHASE
packageserver Package Server 0.26.0 Installing
$ k get apiservices v1.packages.operators.coreos.com -o yaml
[...]
conditions:
- lastTransitionTime: "2023-12-19T22:40:59Z"
  message: 'failing or missing response from https://10.32.0.29:5443/apis/packages.operators.coreos.com/v1:
    bad status from https://10.32.0.29:5443/apis/packages.operators.coreos.com/v1:
    403'
  reason: FailedDiscoveryCheck
  status: "False"
  type: Available
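To check this condition from a script, a small Go sketch can decode the JSON form of the same object (saved with kubectl get apiservices v1.packages.operators.coreos.com -o json > apiservice.json) and print the Available condition. The apiService and condition structs are a hand-written subset of the APIService fields, an assumption made to avoid depending on k8s.io/kube-aggregator; the file name is only an example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// apiService is a hand-written subset of the APIService object
// (an assumption for this sketch, not the k8s.io/kube-aggregator type).
type apiService struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Status struct {
		Conditions []condition `json:"conditions"`
	} `json:"status"`
}

// condition holds the fields shown in the YAML output above.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// availableCondition decodes an APIService from JSON and returns its
// "Available" condition, or an error if decoding fails or it is absent.
func availableCondition(data []byte) (condition, error) {
	var svc apiService
	if err := json.Unmarshal(data, &svc); err != nil {
		return condition{}, err
	}
	for _, c := range svc.Status.Conditions {
		if c.Type == "Available" {
			return c, nil
		}
	}
	return condition{}, fmt.Errorf("no Available condition on %q", svc.Metadata.Name)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run . <apiservice.json>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	c, err := availableCondition(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("Available=%s reason=%s\n%s\n", c.Status, c.Reason, c.Message)
	if c.Status != "True" {
		os.Exit(2) // non-zero so scripts can detect an unavailable APIService
	}
}
```

The exit code 2 lets a shell loop keep polling until the aggregator marks the APIService as available.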
From a grpc_cli debugging container, I can reach the operatorhubio-catalog.olm.svc endpoint and list its services:
$ kubectl run -it --rm --restart=Never --image=webplates/grpc-cli:latest grpccli ls operatorhubio-catalog.olm.svc.cluster.local:50051 api.Registry
ListPackages
GetPackage
GetBundle
GetBundleForChannel
GetChannelEntriesThatReplace
GetBundleThatReplaces
GetChannelEntriesThatProvide
GetLatestChannelEntriesThatProvide
GetDefaultBundleThatProvides
ListBundles
Inside the operatorhubio-catalog pod, the served configs look fine:
<<K9s-Shell>> Pod: olm/operatorhubio-catalog-r52b7 | Container: registry-server
/ $ ps
PID USER TIME COMMAND
1 1001 0:35 /bin/opm serve /configs --cache-dir=/tmp/cache
1922 1001 0:00 sh
1942 1001 0:00 ps
/ $ grpc_health_probe -addr 127.0.0.1:50051
status: SERVING
/ $ /bin/opm validate /configs
/ $ /bin/opm version
Version: version.Version{OpmVersion:"v1.33.0", GitCommit:"5e23ef59", BuildDate:"2023-11-28T15:00:47Z", GoOs:"linux", GoArch:"amd64"}
/ $
All containers are running and the liveness probes appear to be satisfied:
$ k get all -n olm
Warning: kubevirt.io/v1 VirtualMachineInstancePresets is now deprecated and will be removed in v2.
NAME READY STATUS RESTARTS AGE
pod/0b9f8e8106e6bc92a5b3edb6791ceaab0e8a22f5493895798082899af768bmj 0/1 Completed 0 12h
pod/9b9d47c94b554c8bd984f185a7385db635c1dbd74e304e2f4d34960f8bdvm5j 0/1 Completed 0 12h
pod/catalog-operator-7676fc5cc8-jr6th 1/1 Running 0 13h
pod/olm-operator-7c897bd449-jgnlk 1/1 Running 0 13h
pod/operatorhubio-catalog-r52b7 1/1 Running 0 13h
pod/packageserver-5966d674f8-fmjsn 1/1 Running 0 13h
pod/packageserver-5966d674f8-hwpxn 1/1 Running 0 13h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/operatorhubio-catalog ClusterIP 10.32.0.101 <none> 50051/TCP 13h
service/packageserver-service ClusterIP 10.32.0.138 <none> 5443/TCP 51s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/catalog-operator 1/1 1 1 13h
deployment.apps/olm-operator 1/1 1 1 13h
deployment.apps/packageserver 2/2 2 2 13h
NAME DESIRED CURRENT READY AGE
replicaset.apps/catalog-operator-7676fc5cc8 1 1 1 13h
replicaset.apps/olm-operator-7c897bd449 1 1 1 13h
replicaset.apps/packageserver-5966d674f8 2 2 2 13h
NAME COMPLETIONS DURATION AGE
job.batch/0b9f8e8106e6bc92a5b3edb6791ceaab0e8a22f5493895798082899af72da17 1/1 9s 12h
job.batch/9b9d47c94b554c8bd984f185a7385db635c1dbd74e304e2f4d34960f8bdc287 1/1 7s 12h
However, the logs from a packageserver pod show:
time="2023-12-20T11:26:48Z" level=warning msg="error getting bundle stream" action="refresh cache" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.32.0.101:50051: connect: connection refused\"" source="{operatorhubio-catalog olm}"
W1220 11:26:49.978844 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {operatorhubio-catalog.olm.svc:50051 operatorhubio-catalog.olm.svc:50051 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.32.0.101:50051: connect: connection refused". Reconnecting...
I have attached the parts of the catalog-operator, olm-operator, operatorhubio-catalog and packageserver logs that seemed relevant:
catalog-operator.log operatorhubio-catalog.log packageserver.log olm-operator.log
Update:
The connection refused errors in the packageserver logs only occur while opm is starting up; afterwards, package-server connects correctly over gRPC.
The actual issue appears to be authentication on the packageserver endpoint: the healthz, livez and readyz endpoints all return 200 OK, but the apis/packages.operators.coreos.com/v1 endpoint returns 403 Forbidden:
message: 'failing or missing response from https://10.32.0.210:5443/apis/packages.operators.coreos.com/v1:
  bad status from https://10.32.0.210:5443/apis/packages.operators.coreos.com/v1:
  403'
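The 200-versus-403 split can be reproduced with a short Go probe that requests each path and records the status code. It skips certificate verification like curl -k and sends no client certificate, so it shows what an unauthenticated caller receives. fakePackageServer is a hypothetical local stand-in that mimics the reported behaviour so the sketch runs without a cluster; to probe the real pod, pass its URL (for example https://10.32.0.210:5443) to probePaths instead.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// probePaths GETs each path under baseURL and returns the status code per
// path. Certificate verification is skipped (like `curl -k`) and no client
// certificate is sent, so this shows what an unauthenticated caller sees.
func probePaths(baseURL string, paths []string) (map[string]int, error) {
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // debugging only
	}}
	defer client.CloseIdleConnections()
	codes := make(map[string]int, len(paths))
	for _, p := range paths {
		resp, err := client.Get(baseURL + p)
		if err != nil {
			return nil, err
		}
		resp.Body.Close()
		codes[p] = resp.StatusCode
	}
	return codes, nil
}

// fakePackageServer is a hypothetical local stand-in reproducing the
// reported behaviour: health endpoints return 200, /apis/... returns 403.
func fakePackageServer() *httptest.Server {
	return httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, "/apis/") {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		fmt.Fprint(w, "ok")
	}))
}

func main() {
	srv := fakePackageServer()
	defer srv.Close()
	paths := []string{"/healthz", "/livez", "/readyz", "/apis/packages.operators.coreos.com/v1"}
	codes, err := probePaths(srv.URL, paths)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(codes[p], p)
	}
}
```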
If I run a second package-server with --authorization-always-allow-paths /apis/packages.operators.coreos.com/v1, the endpoint returns the expected result:
/bin/package-server -v=4 --secure-port 5444 --global-namespace olm --debug --authorization-always-allow-paths /apis/packages.operators.coreos.com/v1
dnstools# curl -k https://10.200.0.94:5444/apis/packages.operators.coreos.com/v1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "packages.operators.coreos.com/v1",
  "resources": [
    {
      "name": "packagemanifests",
      "singularName": "packagemanifest",
      "namespaced": true,
      "kind": "PackageManifest",
      "verbs": [
        "get",
        "list"
      ]
    },
    {
      "name": "packagemanifests/icon",
      "singularName": "",
      "namespaced": true,
      "kind": "PackageManifest",
      "verbs": [
        "get"
      ]
    }
  ]
}
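The flag works because matching paths bypass the delegated authorization check. The following is a simplified Go model of that chaining, not the actual k8s.io/apiserver code: Decision, pathAuthorizer and the delegate function (standing in for the SubjectAccessReview sent to kube-apiserver) are hypothetical names. It assumes entries match exactly unless they end in *, in which case they match as a prefix; that is how I understand the upstream flag to behave.

```go
package main

import (
	"fmt"
	"strings"
)

// Decision mirrors the allow / deny / no-opinion outcomes of a Kubernetes
// authorizer (hypothetical simplified type for this sketch).
type Decision int

const (
	NoOpinion Decision = iota
	Allow
	Deny
)

// pathAuthorizer allows requests whose path matches an always-allow entry.
// Entries ending in "*" match as prefixes; all others must match exactly.
type pathAuthorizer struct {
	exact    map[string]bool
	prefixes []string
}

func newPathAuthorizer(paths []string) pathAuthorizer {
	a := pathAuthorizer{exact: map[string]bool{}}
	for _, p := range paths {
		if strings.HasSuffix(p, "*") {
			a.prefixes = append(a.prefixes, strings.TrimSuffix(p, "*"))
		} else {
			a.exact[p] = true
		}
	}
	return a
}

// authorize returns Allow for matching paths and NoOpinion otherwise.
func (a pathAuthorizer) authorize(path string) Decision {
	if a.exact[path] {
		return Allow
	}
	for _, pre := range a.prefixes {
		if strings.HasPrefix(path, pre) {
			return Allow
		}
	}
	return NoOpinion
}

// authorize consults the path authorizer first and falls back to delegate
// (the SubjectAccessReview stand-in) only when it has no opinion.
func authorize(a pathAuthorizer, delegate func(user, path string) Decision, user, path string) Decision {
	if d := a.authorize(path); d != NoOpinion {
		return d
	}
	return delegate(user, path)
}

func main() {
	// Hypothetical delegate that denies everything, like a failing review.
	denyAll := func(user, path string) Decision { return Deny }
	a := newPathAuthorizer([]string{"/apis/packages.operators.coreos.com/v1"})
	for _, p := range []string{
		"/apis/packages.operators.coreos.com/v1",
		"/apis/packages.operators.coreos.com/v1/packagemanifests",
	} {
		fmt.Println(p, authorize(a, denyAll, "system:anonymous", p) == Allow)
	}
}
```

In this model, the flag as used above only lets the discovery document itself through; requests for packagemanifests would still reach the delegate and be denied.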
The code at https://github.com/openshift/library-go/blob/7a65fdb398e28782ee1650959a5e0419121e97ae/pkg/config/serving/server.go#L63
refers to system:masters, which matches the certificate I use to create the OLM resources.
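To confirm which user and groups a given client certificate carries, a short Go sketch can read the PEM file and print its subject; with x509 client-certificate authentication, Kubernetes takes the CommonName as the user name and the Organization values as groups. certIdentity and hasGroup are illustrative names. Note that this checks the certificate kubectl uses: when kube-apiserver forwards a request to an aggregated API server, it connects with its own proxy client certificate and passes the original user in request headers, so the two identities are not the same.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// certIdentity returns the subject CommonName and Organization values of
// the first CERTIFICATE block in PEM data. With x509 client-certificate
// authentication, Kubernetes uses these as the user name and groups.
func certIdentity(data []byte) (string, []string, error) {
	for {
		var block *pem.Block
		block, data = pem.Decode(data)
		if block == nil {
			return "", nil, errors.New("no PEM CERTIFICATE block found")
		}
		if block.Type != "CERTIFICATE" {
			continue // skip keys or other blocks bundled in the same file
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return "", nil, err
		}
		return cert.Subject.CommonName, cert.Subject.Organization, nil
	}
}

// hasGroup reports whether want is among groups.
func hasGroup(groups []string, want string) bool {
	for _, g := range groups {
		if g == want {
			return true
		}
	}
	return false
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run . <client-cert.pem>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	user, groups, err := certIdentity(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("user=%q groups=%q system:masters=%t\n", user, groups, hasGroup(groups, "system:masters"))
}
```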
What component or configuration might I be missing in my Kubernetes deployment?