
Ignore missed resources

Open SnelsSM opened this issue 1 year ago • 7 comments

Is your feature request related to a problem?

Some charts (Consul, for example) remove temporary resources (such as Jobs) after deployment. In this situation, the state of the bundle resource is "Modified ... %some resource% missing", and there is no option to ignore the missing resource.

Solution you'd like

Maybe comparePatches needs an option for this. Something like:

spec:
  diff:
    comparePatches:
    - apiVersion: batch/v1
      kind: Job
      name: consul-consul-server-acl-init
      namespace: consul
      operations:
      - op: ignore

Perhaps there is a ready-made solution? I tried to find it, but I didn't find it.

Alternatives you've considered

No response

Anything else?

No response

SnelsSM avatar Jan 09 '24 07:01 SnelsSM

I've run into this problem as well for jobs with a ttl set. I tried ignoring all paths but it doesn't work. E.g.

    - apiVersion: batch/v1
      kind: Job
      namespace: rook-ceph
      jsonPointers:
        - /

jhoblitt avatar Feb 16 '24 18:02 jhoblitt

I have tried to reproduce this with Fleet 0.8 and 0.9 installing a Consul (1.3.3) chart, without success. In both cases, the bundle was ready even after jobs were deleted by the chart.

Here is the fleet.yaml used for testing (no bundle diffs involved):

defaultNamespace: consul
helm:
  releaseName: test-consul
  chart: "consul"
  repo: "https://helm.releases.hashicorp.com"

  version: "1.3.3"

  values:
    global:
      name: consul

Do you have more details about the Fleet version and config used here?

weyfonk avatar Feb 20 '24 17:02 weyfonk

@weyfonk I've worked on this a bit more, and it seems that if the chart uses the hook annotations, e.g.:

  annotations:
    helm.sh/hook: post-install, post-upgrade
    helm.sh/hook-delete-policy: hook-succeeded, before-hook-creation

then Fleet does not warn about missing resources. This is the workaround that I've been using instead of setting a TTL. It also seems that setting the hook annotations works when using a "plain" YAML bundle.

However, I still think there's a good use case for being able to configure the drift detection to ignore a missing resource.

It should be possible to reproduce this with a simple yaml bundle of:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
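
For comparison, the hook-annotation workaround mentioned earlier in this comment could be applied to the same Job roughly as follows. This is an untested sketch: it assumes the annotations are placed on the Job's own metadata and that the TTL is dropped, so that Helm's hook delete policy, not the TTL controller, removes the finished Job.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
  annotations:
    # Hook annotations quoted from the workaround above; with these, the Job
    # is treated as a Helm hook rather than a regular release resource.
    helm.sh/hook: post-install,post-upgrade
    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
spec:
  # ttlSecondsAfterFinished is intentionally omitted in this variant
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
```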

jhoblitt avatar Feb 20 '24 17:02 jhoblitt

Thanks @jhoblitt, that job spec does reproduce the issue. We will need to have a closer look at our bundle diffs feature to check its support for whole resources. I've tried using a jsonPointers field with an empty string, as explained here, to point to the root of the job, but to no avail.

weyfonk avatar Feb 21 '24 10:02 weyfonk

Replying to "Here is the fleet.yaml used for testing (no bundle diffs involved)":

The problem happens when global.acls.manageSystemACLs = true. Helm creates 2 jobs: %release name%-server-acl-init and %release name%-server-acl-init-cleanup. %release name%-server-acl-init-cleanup then removes %release name%-server-acl-init. The result: Modified(1) [Bundle rke2-ops-local-consul-charts-consul]; job.batch consul/consul-consul-server-acl-init missing
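
A sketch of a fleet.yaml that should trigger this, assuming the test configuration posted earlier in the thread with only the ACL setting added. The chart version and release name are copied from that earlier comment and are not necessarily the ones used in the failing setup.

```yaml
defaultNamespace: consul
helm:
  releaseName: test-consul
  chart: "consul"
  repo: "https://helm.releases.hashicorp.com"
  version: "1.3.3"
  values:
    global:
      name: consul
      acls:
        # Setting described above: makes the chart create the acl-init Job
        # and the cleanup Job that later deletes it
        manageSystemACLs: true
```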

SnelsSM avatar Apr 02 '24 07:04 SnelsSM

It should be possible to ignore resources completely by omitting them from the plan: https://github.com/rancher/fleet/blob/29471373d7ad6cd8b2f36e70cc7e25d8e7ebb8b5/internal/cmd/agent/deployer/desiredset/plan.go#L31-L46

manno avatar Sep 30 '24 08:09 manno

Additional QA

Problem

Installing a workload that deletes some of its own resources would leave the corresponding GitRepo's status stuck as Modified, with the deleted resources reported as Missing.

Solution

Added support for a new ignore operation in bundle diffs (see docs).

Testing

Engineering Testing

Manual Testing

Installed a Consul chart, as explained in the docs PR linked above, with a fleet.yaml containing a patch with the ignore operation for the chart's init job. Checked that, once the job had been deleted, the bundle, bundle deployment and GitRepo were reported as ready.
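
A minimal sketch of the diff section such a fleet.yaml might contain, modeled on the syntax proposed at the top of this issue. The Job name and namespace are taken from the "missing" message reported earlier in the thread and are assumptions here; they would need to match the Job actually created by the chart release under test.

```yaml
diff:
  comparePatches:
  - apiVersion: batch/v1
    kind: Job
    # Name and namespace as they appear in the reported status message;
    # adjust to the release being deployed
    name: consul-consul-server-acl-init
    namespace: consul
    operations:
    # Ignore the whole resource instead of removing individual paths
    - op: ignore
```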

Automated Testing

This is covered by:

  • unit tests, validating that resources having a bundle diff patch (comparePatch) with operation ignore are removed from the set of expected resources in the workload
  • integration tests, validating that the agent keeps reporting a bundle deployment as Ready when deleting a resource which has a bundle diff patch with operation ignore

QA Testing Considerations

This could also work to ignore drift, e.g. when manually deleting a resource installed from a GitRepo, if drift correction is disabled.

Regressions Considerations

Other operations, namely removing a path on a given resource, should still work as expected.

weyfonk avatar Mar 10 '25 09:03 weyfonk

System Information

Rancher Version: v2.12-71a527a9c07db8bcc2ddfe95ae3ed5bd012f4589-head
Fleet Version: 107.0.0+up0.13.0-alpha.3

Test Scenarios

Sr No. | Test Scenario | Test Result
1 | Test GitRepo shows Modified status when some resources are missing |
2 | Test GitRepo shows Active status after using diff section in fleet.yaml which ignores missing resources |

Scenario 1

Steps followed

  • Created GitRepo from this fleet.yaml
  • Waited for the ConfigMap to be created; the Job is then missing.
  • GitRepo status is Modified because no diff section is used.
Screenshot showing GitRepo in Modified state

Image


Scenario 2

Steps followed

  • Created GitRepo from this fleet.yaml
  • Waited for the ConfigMap to be created; the Job is then missing.
  • GitRepo status is not Modified because the diff section is used.
Screenshot showing GitRepo in Ready state after using diff in fleet.yaml

Image


Note: Thanks @weyfonk for creating a repro repo for testing. I used @weyfonk's repo and my own repo in the first and second scenarios, with the diff section in fleet.yaml as the only change.

sbulage avatar May 28 '25 15:05 sbulage