operator-registry icon indicating copy to clipboard operation
operator-registry copied to clipboard

feat(opm): merge objects of same unique key in `alpha diff`

Open estroz opened this issue 4 years ago • 12 comments

Description of the change: opm alpha diff now merges packages, channels, and bundles that have the same unique key with ascending preference.

Motivation for the change: Deduplicate data that may cause errors at runtime

Test this change by building opm then comparing the output of master and this branch's opm alpha diff on some test catalogs I built:

$ ./bin/opm-master alpha diff quay.io/estroz/test-catalog:latest,quay.io/estroz/test-catalog:disjoint -o yaml > nomerge.yaml
...
$ ./bin/opm-merge alpha diff quay.io/estroz/test-catalog:latest,quay.io/estroz/test-catalog:disjoint -o yaml > merge.yaml
FATA[0005] error generating diff: error converting new declarative config to model: package "bar", bundle "bar.v0.2.0" not found in any channel entries

Reviewer Checklist

  • [ ] Implementation matches the proposed design, or proposal is updated to match implementation
  • [ ] Sufficient unit test coverage
  • [ ] Sufficient end-to-end test coverage
  • [ ] Docs updated or added to /docs
  • [ ] Commit messages sensible and descriptive

estroz avatar Nov 05 '21 21:11 estroz

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] avatar Nov 05 '21 21:11 openshift-ci[bot]

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: estroz To complete the pull request process, please assign joelanford after the PR has been reviewed. You can assign the PR to them by writing /assign @joelanford in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Nov 05 '21 21:11 openshift-ci[bot]

Codecov Report

Merging #823 (c5bd41e) into master (384af6b) will increase coverage by 0.42%. The diff coverage is 69.23%.

:exclamation: Current head c5bd41e differs from pull request most recent head 1352797. Consider uploading reports for the commit 1352797 to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           master     #823      +/-   ##
==========================================
+ Coverage   52.07%   52.50%   +0.42%     
==========================================
  Files         103      104       +1     
  Lines        9092     9220     +128     
==========================================
+ Hits         4735     4841     +106     
- Misses       3449     3463      +14     
- Partials      908      916       +8     
Impacted Files Coverage Δ
pkg/lib/indexer/indexer.go 10.61% <0.00%> (-0.18%) :arrow_down:
pkg/lib/registry/registry.go 18.39% <0.00%> (-0.36%) :arrow_down:
alpha/action/diff.go 73.13% <33.33%> (-3.92%) :arrow_down:
alpha/declcfg/merge.go 83.01% <83.01%> (ø)
alpha/declcfg/diff.go 77.96% <100.00%> (+1.04%) :arrow_up:
pkg/lib/bundle/exporter.go 62.85% <100.00%> (+6.19%) :arrow_up:
alpha/declcfg/diff_include.go 67.92% <0.00%> (+6.60%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 384af6b...1352797. Read the comment docs.

codecov[bot] avatar Nov 05 '21 21:11 codecov[bot]

I'm still concerned that this puts render in the opinionated camp. And once something has opinions, its hard to say no to making that thing have variations of those opinions. My concern is that as soon as we say "opm render can merge things", we'll get all sorts of requests about how to merge those things:

  • Can we prefer values from references listed earlier instead of later?
  • Can we "intelligently" merge channel entries? (with variations on what's intelligent for various use cases)
  • Can we "intelligently" merge bundle properties? (with variations on what's intelligent for various use cases)
  • Can we override some values but merge others?
  • etc, etc.

If this is a specific need of the diff command, can we hardcode the correct merge logic directly into the diff command's logic?

joelanford avatar Nov 08 '21 15:11 joelanford

All valid concerns. I do actually think, in the long term, opm or some library it consumes should have the ability to support the different merge strategies you suggest (and more), while being opinionated by default. I also want to point out that you can effectively get different merge strategies by preprocessing opm render --merge input refs yourself, ex. reversing the input ref list(s) to reverse merge preference, or breaking out packages/channels/bundles into separate refs and reordering them with jq. Ultimately I'd like the merge library opm render uses to be extensible to several merge strategies, but opm render itself has only one or two opinionated ways to merge 2+ refs. opm render --merge may be used in the future to handle the multiple-catalog-shared-content issue, which is a real concern when one cluster uses multiple curated but uncoordinated content sources, so it would be better to have a tool for cluster admins that isn't perfect than no tool at all to handle this issue. Perhaps this warrants an EP in itself.

If this is a specific need of the diff command, can we hardcode the correct merge logic directly into the diff command's logic?

For now this is totally fine with me.

estroz avatar Nov 08 '21 19:11 estroz

Is the result of this consistent with how opm serve currently merges the stream? The result should be identical assuming there is no advanced/intelligent merge policy.

cdjohnson avatar Nov 08 '21 20:11 cdjohnson

Is the result of this consistent with how opm serve currently merges the stream? The result should be identical assuming there is no advanced/intelligent merge policy.

opm serve doesn't merge anything. It loads it all and then errors out (via an internal opm validate call) if it finds any duplicates.

joelanford avatar Nov 08 '21 22:11 joelanford

@joelanford

opm serve doesn't merge anything. It loads it all and then errors out (via an internal opm validate call) if it finds any duplicates.

It appears to do some sort of merging or deduplication (I didn't do an exhaustive test). Take a look at this FBC: docker.io/cdjohnson/depcatalogs@sha256:e067cbe0738608c19f66b71828d8dfa75a25790178316b73585cc7dc2967e635

# See that testoperatorc is duplicated
cat index/index.yaml | yq -r 'select(.schema == "olm.package").name'
testoperatora
testoperatorb
testoperatorc
testoperatorc

cat index/index.yaml | yq -r 'select(.schema == "olm.bundle").name'
testoperatora.v1.0.0
testoperatorb.v2.0.0
testoperatorc.v3.0.0
testoperatorc.v3.0.0
testoperatorc.v3.0.1

opm validate passes

When I serve it, it ignores (or merges) the duplicates: opm serve index &

grpcurl -plaintext localhost:50051 api.Registry/ListBundles | jq '{name: .csvName, version: .version, replaces: .replaces, skipRange: .skipRange, skips: .skips, channelName: .channelName}' | jq -s -c 'sort_by(.name)[]'
{"name":"testoperatora.v1.0.0","version":"1.0.0","replaces":null,"skipRange":null,"skips":null,"channelName":"v1.0"}
{"name":"testoperatorb.v2.0.0","version":"2.0.0","replaces":null,"skipRange":null,"skips":null,"channelName":"v2.0"}
{"name":"testoperatorc.v3.0.0","version":"3.0.0","replaces":null,"skipRange":null,"skips":null,"channelName":"v3.0"}
{"name":"testoperatorc.v3.0.1","version":"3.0.1","replaces":"testoperatorc.v3.0.0","skipRange":"<3.0.1","skips":null,"channelName":"v3.0"}

cdjohnson avatar Nov 09 '21 14:11 cdjohnson

It appears to do some sort of merging or deduplication

Looking at the code, I think you're right! But I think this is a regression. At one point, I'm fairly sure opm validate was producing and error if it found multiple bundles with the same name in the same package.

joelanford avatar Nov 10 '21 14:11 joelanford

Now taking the time to look back through the history of the FBC implementation, I didn't find any code that actually prevented duplicates. It looks like the code has always just accepted the last duplicate item (i.e. no merging, just throw away the old thing and keep the new thing when I encounter a duplicate)

I still consider this an error though because a declarative API (which FBC is), should disallow such ambiguity. Declaring "I want A to look like X AND I want A to look like Y" just doesn't make sense.

PR to fix this is #824.

joelanford avatar Nov 10 '21 19:11 joelanford

/cc @joelanford

estroz avatar Dec 16 '21 01:12 estroz

@estroz: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci[bot] avatar Feb 26 '22 22:02 openshift-ci[bot]

opm diff has moved to oc mirror. From the conversation it appears that team has already extracted the salient bits of this PR for that project. Is there any aspect of this PR which hasn't been moved over and is still needed?

grokspawn avatar Oct 19 '22 18:10 grokspawn