network-observability-operator icon indicating copy to clipboard operation
network-observability-operator copied to clipboard

NETOBSERV-1747: trigger stored resource migration on operator startup

Open jotak opened this issue 1 year ago • 8 comments

Description

Borrowing and adapting code from Tekton https://github.com/tektoncd/operator/blob/v0.72.0/pkg/reconciler/shared/tektonconfig/upgrade/helper/migrator.go

On startup, the operator will start the migration process, which is basically a noop if resources are already in the correct stored version, or else which triggers a conversion + rewrite by the api server (which itself relies on our conversion webhooks). The CRD status is then updated to only mention the current stored version.

Testing

Testing is not straightforward, here's the steps I did:

# Install an older CRD version where storedVersion was different (you don't need to install the operator, just the CRD is sufficient)
git checkout release-1.5
make install
oc apply -f config/samples/flows_v1beta1_flowcollector.yaml

# Check stored version is v1beta1
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1"]

# Update the CRD to the new version
git checkout main
bin/kustomize build config/crd | kubectl apply --server-side --force-conflicts -f -

# Check stored version have both v1beta1 and v1beta2
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1","v1beta2"]

# <-- AT THIS POINT, if nothing is done and later the v1beta1 version is removed, this would break the upgrades

# Deploy the operator
make deploy VERSION=... USER=...

oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta2"]

You may also take a look at the operator logs. It will probably show some temporary errors, due to the webhook server not being ready yet when trying migration - the operator is just going to retry with an exponential backoff:

2024-09-17T11:59:40.460Z	INFO	ensuring stored resources are up to date
2024-09-17T11:59:40.460Z	INFO	migrating group version	{"crdGroup": "flowcollectors.flows.netobserv.io"}
W0917 11:59:40.535415       1 reflector.go:561] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1beta2.FlowCollector: conversion webhook for flows.netobserv.io/v1beta1, Kind=FlowCollector failed: Post "https://netobserv-webhook-service.netobserv.svc:443/convert?timeout=30s": no endpoints available for service "netobserv-webhook-service"
...
2024-09-17T11:59:51.723Z	INFO	migration completed	{"crdGroup": "flowcollectors.flows.netobserv.io"}
2024-09-17T11:59:51.723Z	INFO	migrating group version	{"crdGroup": "flowmetrics.flows.netobserv.io"}
2024-09-17T11:59:51.807Z	INFO	migration completed	{"crdGroup": "flowmetrics.flows.netobserv.io"}

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • [x] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • [ ] Does this PR require product documentation?
    • [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • [x] Does this PR require a product release notes entry?
    • [ ] If so, fill in "Release Note Text" in the JIRA.
  • [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • [ ] If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • [x] Standard QE validation, with pre-merge tests unless stated otherwise.
    • [ ] Regression tests only (e.g. refactoring with no user-facing change).
    • [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

jotak avatar Sep 17 '24 12:09 jotak

@jotak: This pull request references NETOBSERV-1747 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.18.0" version, but no target version was set.

In response to this:

Description

Borrowing and adapting code from Tekton https://github.com/tektoncd/operator/blob/v0.72.0/pkg/reconciler/shared/tektonconfig/upgrade/helper/migrator.go

On startup, the operator will start the migration process, which is basically a noop if resources are already in the correct stored version, or else which triggers a conversion + rewrite by the api server (which itself relies on our conversion webhooks). The CRD status is then updated to only mention the current stored version.

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • [x] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • [ ] Does this PR require product documentation?
  • [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • [x] Does this PR require a product release notes entry?
  • [ ] If so, fill in "Release Note Text" in the JIRA.
  • [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
  • [ ] If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
  • [x] Standard QE validation, with pre-merge tests unless stated otherwise.
  • [ ] Regression tests only (e.g. refactoring with no user-facing change).
  • [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Sep 17 '24 12:09 openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from jotak. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Sep 17 '24 12:09 openshift-ci[bot]

cc @msherif1234 you might need to do something in the same vein in the operator(s) you maintain (ingress firewall, bpfman?)

jotak avatar Sep 17 '24 12:09 jotak

Codecov Report

Attention: Patch coverage is 84.61538% with 12 lines in your changes missing coverage. Please review.

Project coverage is 66.76%. Comparing base (efe1407) to head (e352caa). Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
pkg/migrator/migrator.go 86.11% 7 Missing and 3 partials :warning:
pkg/manager/manager.go 66.66% 1 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #780      +/-   ##
==========================================
+ Coverage   66.57%   66.76%   +0.19%     
==========================================
  Files          75       76       +1     
  Lines        8726     8804      +78     
==========================================
+ Hits         5809     5878      +69     
- Misses       2505     2511       +6     
- Partials      412      415       +3     
Flag Coverage Δ
unittests 66.76% <84.61%> (+0.19%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
pkg/manager/manager.go 66.66% <66.66%> (ø)
pkg/migrator/migrator.go 86.11% <86.11%> (ø)

... and 3 files with indirect coverage changes

codecov[bot] avatar Sep 17 '24 13:09 codecov[bot]

@jotak: This pull request references NETOBSERV-1747 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.18.0" version, but no target version was set.

In response to this:

Description

Borrowing and adapting code from Tekton https://github.com/tektoncd/operator/blob/v0.72.0/pkg/reconciler/shared/tektonconfig/upgrade/helper/migrator.go

On startup, the operator will start the migration process, which is basically a noop if resources are already in the correct stored version, or else which triggers a conversion + rewrite by the api server (which itself relies on our conversion webhooks). The CRD status is then updated to only mention the current stored version.

Testing

Testing is not straightforward, here's the steps I did:

# Install an older CRD version where storedVersion was different (you don't need to install the operator, just the CRD is sufficient)
git checkout release-1.5
make install
oc apply -f config/samples/flows_v1beta1_flowcollector.yaml

# Check stored version is v1beta1
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1"]

# Update the CRD to the new version
git checkout main
bin/kustomize build config/crd | kubectl apply --server-side --force-conflicts -f -

# Check stored version have both v1beta1 and v1beta2
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1","v1beta2"]

# <-- AT THIS POINT, if nothing is done and later the v1beta1 version is removed, this would break the upgrades

# Deploy the operator
make deploy VERSION=... USER=...

oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta2"]

You may also take a look at the operator logs. It will probably show some temporary errors, due to the webhook server not being ready yet when trying migration - the operator is just going to retry with an exponential backoff:

2024-09-17T11:59:40.460Z	INFO	ensuring stored resources are up to date
2024-09-17T11:59:40.460Z	INFO	migrating group version	{"crdGroup": "flowcollectors.flows.netobserv.io"}
W0917 11:59:40.535415       1 reflector.go:561] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1beta2.FlowCollector: conversion webhook for flows.netobserv.io/v1beta1, Kind=FlowCollector failed: Post "https://netobserv-webhook-service.netobserv.svc:443/convert?timeout=30s": no endpoints available for service "netobserv-webhook-service"
...
2024-09-17T11:59:51.723Z	INFO	migration completed	{"crdGroup": "flowcollectors.flows.netobserv.io"}
2024-09-17T11:59:51.723Z	INFO	migrating group version	{"crdGroup": "flowmetrics.flows.netobserv.io"}
2024-09-17T11:59:51.807Z	INFO	migration completed	{"crdGroup": "flowmetrics.flows.netobserv.io"}

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • [x] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • [ ] Does this PR require product documentation?
  • [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • [x] Does this PR require a product release notes entry?
  • [ ] If so, fill in "Release Note Text" in the JIRA.
  • [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
  • [ ] If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
  • [x] Standard QE validation, with pre-merge tests unless stated otherwise.
  • [ ] Regression tests only (e.g. refactoring with no user-facing change).
  • [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Sep 17 '24 15:09 openshift-ci-robot

New changes are detected. LGTM label has been removed.

openshift-ci[bot] avatar Oct 03 '24 09:10 openshift-ci[bot]

LGTM, maybe just need some testing around the retry params (steps / duration ...) ?

done: e352caa

jotak avatar Oct 03 '24 09:10 jotak

/retest

jotak avatar Oct 17 '24 07:10 jotak

@jotak: This pull request references NETOBSERV-1747 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.18.0" version, but no target version was set.

In response to this:

Description

Borrowing and adapting code from Tekton https://github.com/tektoncd/operator/blob/v0.72.0/pkg/reconciler/shared/tektonconfig/upgrade/helper/migrator.go

On startup, the operator will start the migration process, which is basically a noop if resources are already in the correct stored version, or else which triggers a conversion + rewrite by the api server (which itself relies on our conversion webhooks). The CRD status is then updated to only mention the current stored version.

Testing

Testing is not straightforward, here's the steps I did:

# Install an older CRD version where storedVersion was different (you don't need to install the operator, just the CRD is sufficient)
git checkout release-1.5
make install
oc apply -f config/samples/flows_v1beta1_flowcollector.yaml

# Check stored version is v1beta1
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1"]

# Update the CRD to the new version
git checkout main
bin/kustomize build config/crd | kubectl apply --server-side --force-conflicts -f -

# Check stored version have both v1beta1 and v1beta2
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1","v1beta2"]

# <-- AT THIS POINT, if nothing is done and later the v1beta1 version is removed, this would break the upgrades

# Deploy the operator (this PR)
make deploy VERSION=... USER=...

oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta2"]

You may also take a look at the operator logs. It will probably show some temporary errors, due to the webhook server not being ready yet when trying migration - the operator is just going to retry with an exponential backoff:

2024-09-17T11:59:40.460Z	INFO	ensuring stored resources are up to date
2024-09-17T11:59:40.460Z	INFO	migrating group version	{"crdGroup": "flowcollectors.flows.netobserv.io"}
W0917 11:59:40.535415       1 reflector.go:561] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1beta2.FlowCollector: conversion webhook for flows.netobserv.io/v1beta1, Kind=FlowCollector failed: Post "https://netobserv-webhook-service.netobserv.svc:443/convert?timeout=30s": no endpoints available for service "netobserv-webhook-service"
...
2024-09-17T11:59:51.723Z	INFO	migration completed	{"crdGroup": "flowcollectors.flows.netobserv.io"}
2024-09-17T11:59:51.723Z	INFO	migrating group version	{"crdGroup": "flowmetrics.flows.netobserv.io"}
2024-09-17T11:59:51.807Z	INFO	migration completed	{"crdGroup": "flowmetrics.flows.netobserv.io"}

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • [x] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • [ ] Does this PR require product documentation?
  • [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • [x] Does this PR require a product release notes entry?
  • [ ] If so, fill in "Release Note Text" in the JIRA.
  • [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
  • [ ] If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
  • [x] Standard QE validation, with pre-merge tests unless stated otherwise.
  • [ ] Regression tests only (e.g. refactoring with no user-facing change).
  • [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Oct 17 '24 08:10 openshift-ci-robot

/ok-to-test

Amoghrd avatar Nov 06 '24 15:11 Amoghrd

New images:

  • quay.io/netobserv/network-observability-operator:2bd3884
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-2bd3884
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-2bd3884

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:2bd3884 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-2bd3884

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-2bd3884
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

github-actions[bot] avatar Nov 06 '24 15:11 github-actions[bot]

@jotak I tried the above steps but could not see the migration happening. I still see v1beta1 in the storedVersion list. I did capture the controller logs and have attached it below.

netobserv-controller-manager-7fd689f688-psz8z-manager.log

Amoghrd avatar Nov 12 '24 21:11 Amoghrd

@jotak: This pull request references NETOBSERV-1747 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.18.0" version, but no target version was set.

In response to this:

Description

Borrowing and adapting code from Tekton https://github.com/tektoncd/operator/blob/v0.72.0/pkg/reconciler/shared/tektonconfig/upgrade/helper/migrator.go

On startup, the operator will start the migration process, which is basically a noop if resources are already in the correct stored version, or else which triggers a conversion + rewrite by the api server (which itself relies on our conversion webhooks). The CRD status is then updated to only mention the current stored version.

Testing

Testing is not straightforward, here's the steps I did:

# Install an older CRD version where storedVersion was different (you don't need to install the operator, just the CRD is sufficient)
git checkout release-1.5
make install
oc apply -f config/samples/flows_v1beta1_flowcollector.yaml

# Check stored version is v1beta1
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1"]

# Update the CRD to the new version
git checkout main # Assuming your main branch is up-to-date, else please update it
bin/kustomize build config/crd | kubectl apply --server-side --force-conflicts -f -

# Check stored version have both v1beta1 and v1beta2
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1","v1beta2"]

# <-- AT THIS POINT, if nothing is done and later the v1beta1 version is removed, this would break the upgrades

# Deploy the operator (this PR)
make deploy VERSION=... USER=...

oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta2"]

You may also take a look at the operator logs. It will probably show some temporary errors, due to the webhook server not being ready yet when trying migration - the operator is just going to retry with an exponential backoff:

2024-09-17T11:59:40.460Z	INFO	ensuring stored resources are up to date
2024-09-17T11:59:40.460Z	INFO	migrating group version	{"crdGroup": "flowcollectors.flows.netobserv.io"}
W0917 11:59:40.535415       1 reflector.go:561] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1beta2.FlowCollector: conversion webhook for flows.netobserv.io/v1beta1, Kind=FlowCollector failed: Post "https://netobserv-webhook-service.netobserv.svc:443/convert?timeout=30s": no endpoints available for service "netobserv-webhook-service"
...
2024-09-17T11:59:51.723Z	INFO	migration completed	{"crdGroup": "flowcollectors.flows.netobserv.io"}
2024-09-17T11:59:51.723Z	INFO	migrating group version	{"crdGroup": "flowmetrics.flows.netobserv.io"}
2024-09-17T11:59:51.807Z	INFO	migration completed	{"crdGroup": "flowmetrics.flows.netobserv.io"}

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • [x] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • [ ] Does this PR require product documentation?
  • [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • [x] Does this PR require a product release notes entry?
  • [ ] If so, fill in "Release Note Text" in the JIRA.
  • [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
  • [ ] If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
  • [x] Standard QE validation, with pre-merge tests unless stated otherwise.
  • [ ] Regression tests only (e.g. refactoring with no user-facing change).
  • [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Nov 13 '24 10:11 openshift-ci-robot

New images:

  • quay.io/netobserv/network-observability-operator:eddd6d8
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-eddd6d8
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-eddd6d8

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:eddd6d8 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-eddd6d8

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-eddd6d8
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

github-actions[bot] avatar Nov 13 '24 10:11 github-actions[bot]

@jotak: This pull request references NETOBSERV-1747 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.18.0" version, but no target version was set.

In response to this:

Description

Borrowing and adapting code from Tekton https://github.com/tektoncd/operator/blob/v0.72.0/pkg/reconciler/shared/tektonconfig/upgrade/helper/migrator.go

On startup, the operator will start the migration process, which is basically a noop if resources are already in the correct stored version, or else which triggers a conversion + rewrite by the api server (which itself relies on our conversion webhooks). The CRD status is then updated to only mention the current stored version.

Testing

Testing is not straightforward, here's the steps I did:

# Install an older CRD version where storedVersion was different (you don't need to install the operator, just the CRD is sufficient)
git checkout release-1.5
make install
oc apply -f config/samples/flows_v1beta1_flowcollector.yaml

# Check stored version is v1beta1
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1"]

# Update the CRD to the new version
git checkout main # Assuming your main branch is up-to-date, else please update it
bin/kustomize build config/crd | kubectl apply --server-side --force-conflicts -f -

# Check stored version have both v1beta1 and v1beta2
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1","v1beta2"]

# <-- AT THIS POINT, if nothing is done and later the v1beta1 version is removed, this would break the upgrades

# Deploy the operator (this PR)
git checkout (this PR)
make deploy VERSION=... USER=...

oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta2"]

You may also take a look at the operator logs. It will probably show some temporary errors, due to the webhook server not being ready yet when trying migration - the operator is just going to retry with an exponential backoff:

2024-09-17T11:59:40.460Z	INFO	ensuring stored resources are up to date
2024-09-17T11:59:40.460Z	INFO	migrating group version	{"crdGroup": "flowcollectors.flows.netobserv.io"}
W0917 11:59:40.535415       1 reflector.go:561] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1beta2.FlowCollector: conversion webhook for flows.netobserv.io/v1beta1, Kind=FlowCollector failed: Post "https://netobserv-webhook-service.netobserv.svc:443/convert?timeout=30s": no endpoints available for service "netobserv-webhook-service"
...
2024-09-17T11:59:51.723Z	INFO	migration completed	{"crdGroup": "flowcollectors.flows.netobserv.io"}
2024-09-17T11:59:51.723Z	INFO	migrating group version	{"crdGroup": "flowmetrics.flows.netobserv.io"}
2024-09-17T11:59:51.807Z	INFO	migration completed	{"crdGroup": "flowmetrics.flows.netobserv.io"}

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • [x] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • [ ] Does this PR require product documentation?
  • [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • [x] Does this PR require a product release notes entry?
  • [ ] If so, fill in "Release Note Text" in the JIRA.
  • [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
  • [ ] If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
  • [x] Standard QE validation, with pre-merge tests unless stated otherwise.
  • [ ] Regression tests only (e.g. refactoring with no user-facing change).
  • [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Nov 13 '24 11:11 openshift-ci-robot

@jotak I tried the above steps but could not see the migration happening. I still see v1beta1 in the storedVersion list. I did capture the controller logs and have attached it below.

netobserv-controller-manager-7fd689f688-psz8z-manager.log

@Amoghrd OK there was 2 issues:

  • first this PR had to be rebased (it's 2 months old!) so deploying the operator based on main's CRD was generating errors
  • second, something was missing in my test steps, because this PR includes RBAC changes it needs to be deployed from the PR itself; so I added git checkout (this PR) in the test steps.

jotak avatar Nov 13 '24 11:11 jotak

@jotak: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-operator eeb6e46bfa1473726b56b8ef62c0987c6ae908f7 link false /test e2e-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Nov 13 '24 12:11 openshift-ci[bot]

That helped thanks Joel! /label qe-approved

Amoghrd avatar Nov 13 '24 15:11 Amoghrd

@jotak: This pull request references NETOBSERV-1747 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.18.0" version, but no target version was set.

In response to this:

Description

Borrowing and adapting code from Tekton https://github.com/tektoncd/operator/blob/v0.72.0/pkg/reconciler/shared/tektonconfig/upgrade/helper/migrator.go

On startup, the operator will start the migration process, which is basically a noop if resources are already in the correct stored version, or else which triggers a conversion + rewrite by the api server (which itself relies on our conversion webhooks). The CRD status is then updated to only mention the current stored version.

Testing

Testing is not straightforward, here's the steps I did:

# Install an older CRD version where storedVersion was different (you don't need to install the operator, just the CRD is sufficient)
git checkout release-1.5
make install
oc apply -f config/samples/flows_v1beta1_flowcollector.yaml

# Check stored version is v1beta1
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1"]

# Update the CRD to the new version
git checkout main # Assuming your main branch is up-to-date, else please update it
bin/kustomize build config/crd | kubectl apply --server-side --force-conflicts -f -

# Check stored version have both v1beta1 and v1beta2
oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta1","v1beta2"]

# <-- AT THIS POINT, if nothing is done and later the v1beta1 version is removed, this would break the upgrades

# Deploy the operator (this PR)
git checkout (this PR)
make deploy VERSION=... USER=...

oc get crd flowcollectors.flows.netobserv.io -ojsonpath='{.status.storedVersions}'
["v1beta2"]

You may also take a look at the operator logs. It will probably show some temporary errors, due to the webhook server not being ready yet when trying migration - the operator is just going to retry with an exponential backoff:

2024-09-17T11:59:40.460Z	INFO	ensuring stored resources are up to date
2024-09-17T11:59:40.460Z	INFO	migrating group version	{"crdGroup": "flowcollectors.flows.netobserv.io"}
W0917 11:59:40.535415       1 reflector.go:561] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1beta2.FlowCollector: conversion webhook for flows.netobserv.io/v1beta1, Kind=FlowCollector failed: Post "https://netobserv-webhook-service.netobserv.svc:443/convert?timeout=30s": no endpoints available for service "netobserv-webhook-service"
...
2024-09-17T11:59:51.723Z	INFO	migration completed	{"crdGroup": "flowcollectors.flows.netobserv.io"}
2024-09-17T11:59:51.723Z	INFO	migrating group version	{"crdGroup": "flowmetrics.flows.netobserv.io"}
2024-09-17T11:59:51.807Z	INFO	migration completed	{"crdGroup": "flowmetrics.flows.netobserv.io"}

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • [x] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • [ ] Does this PR require product documentation?
  • [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • [x] Does this PR require a product release notes entry?
  • [ ] If so, fill in "Release Note Text" in the JIRA.
  • [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
  • [ ] If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
  • [x] Standard QE validation, with pre-merge tests unless stated otherwise.
  • [ ] Regression tests only (e.g. refactoring with no user-facing change).
  • [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Nov 13 '24 15:11 openshift-ci-robot