Updating Ports with correctDrift enabled using multiple-paths repo triggers an error

Open mmartin24 opened this issue 1 year ago • 6 comments

Issue

Error triggered after updating ports with correctDrift enabled while using multiple-paths repo

Reproduction steps

  • Install Rancher 2.8-head with 3 downstream clusters
  • Created a GitRepo with correctDrift enabled (https://github.com/rancher/fleet-test-data, path: multiple-paths) and deployed it to all downstream clusters
  • Navigated to DS cluster --> Services
  • Edited the service by updating its port.
  • Waited for 30 seconds.
  • Observations on Rancher 2.8.5:
  • GitRepo is in Modified state with an error (see screenshot). Image
  • Navigated to Continuous Delivery --> Clusters.
  • DS cluster on which service was deployed/created is in Modified state.
  • The above observation is seen in Rancher 2.8.5 + ~~2.8-head but not in 2.9.0-alpha7~~. Edit: also observed in 2.9 if Force Update is applied.

This check has now been added to the ui/e2e CI here. A video can be downloaded from the CI artifacts; the relevant part is p1.specs.ts, at minute 8:02.

mmartin24 avatar Jul 05 '24 15:07 mmartin24

Not sure if the Modified state is somehow related to https://github.com/rancher/dashboard/issues/11404

mmartin24 avatar Jul 05 '24 15:07 mmartin24

FTR: after adding the latest checks for this issue to our CI, 2.8-head did not show this error today, but 2.7-head did.

mmartin24 avatar Jul 08 '24 07:07 mmartin24

I just performed a manual check and it seems there was a gap in our automation: we had removed the click on Force Update from our tests. For some reason, in 2.7 the tests still caught the issue without it, but in 2.9 the issue goes unnoticed until Force Update is clicked. After clicking it, the issue appeared:

Image

I will keep investigating tomorrow morning

mmartin24 avatar Jul 09 '24 16:07 mmartin24

Seems related to how helm upgrades services for us. See https://github.com/kubernetes/kubernetes/issues/105610

fleet-agent should store the error in the bundledeployments status
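
A simplified Python model of how the duplicate name could arise, assuming the Service ports list is merged by its `port` key (the patch merge key Kubernetes declares for Service ports). The port numbers and the first port name are hypothetical; only `required-name2` comes from the reported error. This is a sketch of the merge behaviour, not Helm's actual patch code:

```python
from collections import Counter


def merge_ports_by_key(live_ports, patch_ports, key="port"):
    """Merge patch entries into the live list, matching entries on `key`.

    Entries with a matching key are updated in place; entries with an
    unseen key are appended. Nothing is removed, which models a patch
    that carries no delete directive for the edited port.
    """
    merged = [dict(p) for p in live_ports]
    index = {p[key]: i for i, p in enumerate(merged)}
    for entry in patch_ports:
        if entry[key] in index:
            merged[index[entry[key]]].update(entry)
        else:
            index[entry[key]] = len(merged)
            merged.append(dict(entry))
    return merged


def duplicate_names(ports):
    """Return the port names that occur more than once, sorted."""
    counts = Counter(p["name"] for p in ports)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical chart content: two named ports.
chart_ports = [
    {"name": "required-name1", "port": 80},
    {"name": "required-name2", "port": 8080},
]
# A user edits the second port number directly on the cluster.
live_ports = [
    {"name": "required-name1", "port": 80},
    {"name": "required-name2", "port": 9090},
]

# Drift correction re-applies the chart ports on top of the live object.
merged = merge_ports_by_key(live_ports, chart_ports)
print(duplicate_names(merged))
```

Because the edited port number no longer matches any key in the chart, the chart entry is appended instead of replacing the edited one, and two entries end up sharing a name, which Service validation rejects.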

manno avatar Jul 10 '24 09:07 manno

Adding error logs

{
  "level": "error",
  "ts": "2024-07-10T09:02:36Z",
  "logger": "bundledeployment",
  "msg": "Failed to deploy bundle",
  "controller": "bundledeployment",
  "controllerGroup": "fleet.cattle.io",
  "controllerKind": "BundleDeployment",
  "BundleDeployment": {
    "name": "test-drift-multiple-paths-service",
    "namespace": "cluster-fleet-default-imported-2-945038eba7ea"
  },
  "namespace": "cluster-fleet-default-imported-2-945038eba7ea",
  "name": "test-drift-multiple-paths-service",
  "reconcileID": "308f9b4f-fb0f-48ca-a273-2e475b183966",
  "status": {
    "conditions": [
      {
        "type": "Installed",
        "status": "True",
        "lastUpdateTime": "2024-07-10T09:01:20Z"
      },
      {
        "type": "Deployed",
        "status": "False",
        "lastUpdateTime": "2024-07-10T09:02:36Z",
        "reason": "Error",
        "message": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\""
      },
      {
        "type": "Ready",
        "status": "True",
        "lastUpdateTime": "2024-07-10T09:02:36Z"
      },
      {
        "type": "Monitored",
        "status": "True",
        "lastUpdateTime": "2024-07-10T09:01:20Z"
      }
    ],
    "appliedDeploymentID": "s-e900fb60b86d8593e95a733a0c0d1794f2d71a00910f794d19bcd4d57deca:aa73273923fd2b194b95dc51be330a7b1be92dafa689e0afb400abda8b37d8c0",
    "release": "test-fleet-mp-service/test-drift-multiple-paths-service:1",
    "ready": true,
    "nonModified": true,
    "display": {
      "deployed": "Error: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
      "monitored": "True",
      "state": "Ready"
    },
    "syncGeneration": 0
  },
  "error": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
  "errorVerbose": "cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"\nhelm.sh/helm/v3/pkg/kube.(*Client).Update\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/kube/client.go:438\nhelm.sh/helm/v3/pkg/action.(*Install).performInstall\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/action/install.go:456\nhelm.sh/helm/v3/pkg/action.(*Install).performInstallCtx.func1\n\t/home/runner/go/pkg/mod/github.com/rancher/helm/[email protected]/pkg/action/install.go:421\nruntime.goexit\n\t/home/runner/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695",
  "stacktrace": "github.com/rancher/fleet/internal/cmd/agent/controller.(*BundleDeploymentReconciler).Reconcile\n\t/home/runner/work/fleet/fleet/internal/cmd/agent/controller/bundledeployment_controller.go:129\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"
}
{
  "level": "error",
  "ts": "2024-07-10T09:02:36Z",
  "msg": "Reconciler error",
  "controller": "bundledeployment",
  "controllerGroup": "fleet.cattle.io",
  "controllerKind": "BundleDeployment",
  "BundleDeployment": {
    "name": "test-drift-multiple-paths-service",
    "namespace": "cluster-fleet-default-imported-2-945038eba7ea"
  },
  "namespace": "cluster-fleet-default-imported-2-945038eba7ea",
  "name": "test-drift-multiple-paths-service",
  "reconcileID": "308f9b4f-fb0f-48ca-a273-2e475b183966",
  "error": "failed deploying bundle: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\"",
  "errorCauses": [
    {
      "error": "failed deploying bundle: cannot patch \"mp-app-service\" with kind Service: Service \"mp-app-service\" is invalid: spec.ports[1].name: Duplicate value: \"required-name2\""
    }
  ],
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222"
}
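
In the first log entry, the failure is visible only in the Deployed condition, while Ready stays True and nonModified is true. A minimal Python sketch of how such a status could be scanned for conditions that are not True; the condition values are copied from the log above, and the helper name `failing_conditions` is hypothetical:

```python
def failing_conditions(status):
    """Return (type, message) pairs for conditions whose status is not "True"."""
    return [
        (c["type"], c.get("message", ""))
        for c in status.get("conditions", [])
        if c.get("status") != "True"
    ]


# Conditions as they appear in the "Failed to deploy bundle" log entry.
status = {
    "conditions": [
        {"type": "Installed", "status": "True"},
        {
            "type": "Deployed",
            "status": "False",
            "reason": "Error",
            "message": 'cannot patch "mp-app-service" with kind Service: '
                       'Service "mp-app-service" is invalid: '
                       'spec.ports[1].name: Duplicate value: "required-name2"',
        },
        {"type": "Ready", "status": "True"},
        {"type": "Monitored", "status": "True"},
    ],
    "ready": True,
    "nonModified": True,
}

failing = failing_conditions(status)
for cond_type, message in failing:
    print(f"{cond_type}: {message}")
```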

mmartin24 avatar Jul 10 '24 09:07 mmartin24

Additional QA

Problem

When failing to correct drift on a resource (e.g. a modified ports array on a service), Fleet would leave the GitRepo in Modified state, with no error on the corresponding bundle deployment status.

Solution

  • The drift correction error is now reflected in the bundle deployment status, which should in turn propagate it to the GitRepo status
  • Setting force: true on the GitRepo resolves the error by deleting and recreating the Helm release for the bundle deployment, hence recreating the service in this case.
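
As a rough illustration of the second point, the following Python sketch builds a GitRepo manifest as a plain dict and applies a merge patch that turns on force mode. The metadata name and namespace are hypothetical, and `merge_patch` is a small local helper following JSON merge patch semantics, not a Fleet or Kubernetes client call:

```python
import copy
import json


def merge_patch(target, patch):
    """Apply a JSON merge patch to nested dicts and return a new dict."""
    result = copy.deepcopy(target)
    for key, value in patch.items():
        if value is None:
            # A null value in a merge patch removes the key.
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


git_repo = {
    "apiVersion": "fleet.cattle.io/v1alpha1",
    "kind": "GitRepo",
    # Hypothetical name and namespace, for illustration only.
    "metadata": {"name": "test-drift", "namespace": "fleet-default"},
    "spec": {
        "repo": "https://github.com/rancher/fleet-test-data",
        "paths": ["multiple-paths"],
        "correctDrift": {"enabled": True},
    },
}

# Patch body that switches drift correction to force mode.
force_patch = {"spec": {"correctDrift": {"force": True}}}

patched = merge_patch(git_repo, force_patch)
print(json.dumps(patched["spec"]["correctDrift"]))
```

The same patch body could presumably be sent to the cluster as a merge patch, for example with `kubectl patch` and `--type merge`, using the real GitRepo name and namespace.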

Testing

See reproduction steps above, in the issue description.

Engineering Testing

Manual Testing

  1. Created a GitRepo with drift correction enabled (but not set to force mode) pointing to rancher/fleet-test-data's multiple-paths
  2. Edited the created service ports
  3. Checked status of the GitRepo and bundle deployment
  4. Updated the GitRepo to set its drift correction force mode to true
  5. Saw the GitRepo and bundle deployment status error disappear, once the service had been recreated.

Automated Testing

  • Integration tests cover propagation of the drift correction error to the bundle deployment status
  • End-to-end tests verify that when a bundle is marked as modified, patching a GitRepo to set its correctDrift.force option to true eventually updates the bundle status, in that the bundle will no longer appear as modified.

QA Testing Considerations

  • Test how the GitRepo status is reflected in the Rancher UI

Regressions Considerations

N/A

weyfonk avatar Sep 11 '24 15:09 weyfonk

Rechecked in v2.10-212d8b6e92992235d791d8f2aaea8436ab4f6b77-head with fleet:105.0.0+up0.11.0-rc.2 and the problem persists.

Tried exact reproduction steps and error persisted:

  1. Created a GitRepo with drift correction enabled (but not set to force mode) pointing to rancher/fleet-test-data's multiple-paths
  2. Edited the created service ports
  3. Checked status of the GitRepo and bundle deployment
  4. Updated the GitRepo drift correction mode to true

As a relevant note, when updating the ports the following log appears:

"Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff. 

Extra notes: there are UI issues when updating ports via the UI and when editing YAMLs. Setting it back to backlog.

mmartin24 avatar Nov 04 '24 10:11 mmartin24

The above warning has been fixed through #3045. Tried to reproduce this issue against Rancher v2.10.0-rc.3 with Fleet v0.11.0, without success: when updating the GitRepo to set its correctDrift force mode to true, drift is corrected and the GitRepo is set back from Modified to Active after a few seconds.

weyfonk avatar Nov 14 '24 15:11 weyfonk

I re-checked it in Rancher 2.10 with fleet:105.0.1+up0.11.1 and can still reproduce it: after changing the port and pressing Force Update on the GitRepo, the error still occurs:

image

Nevertheless, as discussed offline, we agreed to close it and not spend more time on this issue, as it is a corner case.

mmartin24 avatar Nov 27 '24 14:11 mmartin24