
Newer versions of opm serve have truncated data returned via the grpc GetBundleForChannel api?

Open joehuizenga opened this issue 2 years ago • 3 comments

In prior versions of opm, GetBundleForChannel returned a lot more data than it does today. Our testing pipeline relies on this data, particularly the bundlePath element.

Example:

Older catalog

    podman run -d --name catalog -p 50051:50051 "icr.io/cpopen/datapower-operator-catalog@sha256:2b29c318c29dd11b9a8075e64268ba94a93a7d9917cbf5f5f741eb9decbe3bf8"
    grpcurl -plaintext -d '{"pkgName": "datapower-operator", "channelName": "v1.4"}' localhost:50051 api.Registry.GetBundleForChannel | jq -r .bundlePath

returns the bundle image as expected

    icr.io/cpopen/datapower-operator-bundle@sha256:025a17b6b198843ab8c5bb4ab8490400ff181397c0b7c811fb3f59fb4ecf28d8

New catalog

    podman run -d --name catalog -p 50051:50051 "icr.io/cpopen/datapower-operator-catalog@sha256:6f2289e5d54acc62b10e1bd632a5ba45e8348c7f3b352996188e524fb2639b9b"
    grpcurl -plaintext -d '{"pkgName": "datapower-operator", "channelName": "v1.4"}' localhost:50051 api.Registry.GetBundleForChannel | jq -r .bundlePath

returns null??

    null

joehuizenga avatar Mar 19 '23 14:03 joehuizenga

After a bit of investigation, it appears that we trimmed down the GetBundleForChannel response to resolve some on-cluster performance issues. See https://github.com/operator-framework/operator-registry/pull/769

That PR landed in 1.18.1, so the immediate fix is to revert to 1.18.0 if possible. Clearly that release was quite a while ago, so there are probably reasons why that is not a great option.

One possibility is that we could add the bundlePath field back into the response.

Lastly, the FBC server does not exhibit this behavior on an FBC that is migrated from sqlite, which raises two more questions:

  1. We have tests that verify that the gRPC responses from sqlite and FBC servers are identical with equivalent input. How are those tests passing?
  2. As users migrate to FBC, are clusters going to take the same performance hit that #769 already solved for sqlite-based catalogs?

joelanford avatar Mar 23 '23 21:03 joelanford

I think there are two answers for item (1):

  1. #769 updated the test for Sqlite, but not FBC. In retrospect, it isn't obvious looking at the test code that the expectation is that FBC and Sqlite responses should stay aligned. We may need to think about how to avoid this sort of drift in the future.
  2. It appears as if our testdata is loaded from the old packagemanifests format and therefore doesn't contain bundle image references. So all this time, the bundlePath field of the API bundle has been unset for our tests.
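To make the drift described above concrete, here is a minimal Python sketch of a parity check: send the same request to both servers, decode the two JSON responses, and diff them field by field. This is not the repository's test code; the helper name `diff_responses` and the sample values are hypothetical, and the responses are assumed to be already decoded into dictionaries.

```python
def diff_responses(sqlite_resp, fbc_resp):
    """Return {field: (sqlite_value, fbc_value)} for every field that differs.

    A field that is missing on one side shows up as None for that side.
    """
    fields = set(sqlite_resp) | set(fbc_resp)
    return {
        name: (sqlite_resp.get(name), fbc_resp.get(name))
        for name in sorted(fields)
        if sqlite_resp.get(name) != fbc_resp.get(name)
    }


# Hypothetical decoded responses for the same request: the sqlite-backed
# server omits bundlePath, the FBC-backed server still includes it.
sqlite_resp = {"csvName": "example-operator.v1.0.0", "packageName": "example-operator"}
fbc_resp = {
    "csvName": "example-operator.v1.0.0",
    "packageName": "example-operator",
    "bundlePath": "registry.example/example-operator-bundle:v1.0.0",
}

drift = diff_responses(sqlite_resp, fbc_resp)
```

An empty result means the two backends agree for that request. Note that a check like this only catches a missing bundlePath if the test data actually carries bundle image references, which is the second point above.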

joelanford avatar Mar 24 '23 12:03 joelanford

Since this change was made way back in 1.18.1 and our usage of this API in our pipeline is more of an optimization, I think we can implement a workaround in our pipeline to locate the bundlePath (aka image): list all bundles, save them as a map keyed by csvName, call GetBundleForChannel as is (which still returns the csvName), and look up the bundle image in the map. As for the potential performance impact from an FBC perspective, I think that should be tracked in another ticket.
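A rough Python sketch of that workaround, under a few assumptions: grpcurl is on the PATH, the catalog is served on localhost:50051 as in the commands above, and the ListBundles RPC of the same api.Registry service still populates bundlePath for each entry. The helper names are made up for illustration.

```python
import json
import subprocess


def parse_json_stream(text):
    """Parse the concatenated JSON objects grpcurl prints for a streaming RPC."""
    decoder = json.JSONDecoder()
    objects, idx, end = [], 0, len(text)
    while idx < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while idx < end and text[idx].isspace():
            idx += 1
        if idx >= end:
            break
        obj, idx = decoder.raw_decode(text, idx)
        objects.append(obj)
    return objects


def build_bundle_map(bundles):
    """Map csvName -> bundlePath, skipping entries that lack either field."""
    return {
        b["csvName"]: b["bundlePath"]
        for b in bundles
        if b.get("csvName") and b.get("bundlePath")
    }


def resolve_bundle_path(channel_bundle, bundle_map):
    """Look up the bundle image for a GetBundleForChannel response, or None."""
    return bundle_map.get(channel_bundle.get("csvName"))


def grpcurl(method, payload=None, addr="localhost:50051"):
    """Run grpcurl against the catalog and return its stdout (hypothetical helper)."""
    cmd = ["grpcurl", "-plaintext"]
    if payload is not None:
        cmd += ["-d", json.dumps(payload)]
    cmd += [addr, method]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def lookup_bundle_path(pkg_name, channel_name):
    """List all bundles once, then resolve the channel head's image via csvName."""
    bundle_map = build_bundle_map(parse_json_stream(grpcurl("api.Registry/ListBundles")))
    head = json.loads(
        grpcurl(
            "api.Registry/GetBundleForChannel",
            {"pkgName": pkg_name, "channelName": channel_name},
        )
    )
    return resolve_bundle_path(head, bundle_map)
```

With a catalog running, `lookup_bundle_path("datapower-operator", "v1.4")` would stand in for the `jq -r .bundlePath` step. The map costs one extra ListBundles call per catalog, so it is worth building once and reusing across lookups.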

joehuizenga avatar Mar 27 '23 20:03 joehuizenga

Issues go stale after 90 days of inactivity. If there is no further activity, the issue will be closed in another 30 days.

github-actions[bot] avatar Jun 01 '25 01:06 github-actions[bot]

This issue has been closed due to inactivity.

github-actions[bot] avatar Jul 03 '25 01:07 github-actions[bot]