
`dry-run` sometimes misses metadata and causes `failed to prune fields` error during CRD conversion

LittleWat opened this issue on Sep 21, 2022

Hello! We are developing a custom operator and utilizing fluxCD.

We have a v1alpha1 custom resource that is deployed by fluxCD. When we upgraded the custom resource operator from v1alpha1 to v1alpha2, Flux notified us that the dry-run failed with the following error message:

dry-run failed, error: failed to prune fields: failed add back owned items: failed to convert pruned object at version <foo.com>/v1alpha1: conversion webhook for <foo.com>/v1alpha2, Kind=<resource>  returned invalid metadata: invalid metadata of type <nil> in input object

Running the following dry-run command manually also fails sometimes (roughly once in five attempts?) with the same error message:

$ kubectl apply --server-side --dry-run=server  -f <v1alpha1-resource.yaml> --field-manager kustomize-controller

Error from server: failed to prune fields: failed add back owned items: failed to convert pruned object at version <foo.com>/v1alpha1: conversion webhook for <foo.com>/v1alpha2, Kind=<resource>  returned invalid metadata: invalid metadata of type <nil> in input object

~~But performing the actual conversion (the following command) never fails~~

Performing the actual conversion (the following command) also sometimes fails:

$ kubectl apply --server-side -f  <v1alpha1-resource.yaml> --field-manager kustomize-controller

<foo.com>/<resource> serverside-applied

The flakiness might be a key to solving this.

Our conversion code is similar to https://github.com/IBM/operator-sample-go/blob/b79e66026a5cc5b4994222f2ef7aa962de9f7766/operator-application/api/v1alpha1/application_conversion.go#L37
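
For context, our ConvertTo/ConvertFrom look roughly like the minimal sketch below, which follows controller-runtime's conversion.Hub pattern. The import path, the MyResource type name, and the Attribute1/Attribute2 fields are placeholders, since I cannot share the real resource.

package v1alpha1

import (
	"sigs.k8s.io/controller-runtime/pkg/conversion"

	v1alpha2 "example.com/our-operator/api/v1alpha2" // placeholder module path
)

// ConvertTo converts this v1alpha1 object to the hub version (v1alpha2).
func (src *MyResource) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*v1alpha2.MyResource)
	dst.ObjectMeta = src.ObjectMeta           // metadata is carried over as-is
	dst.Spec.Attribute1 = src.Spec.Attribute1 // placeholder field copies
	dst.Spec.Attribute2 = src.Spec.Attribute2
	return nil
}

// ConvertFrom converts the hub version (v1alpha2) back to this v1alpha1 object.
func (dst *MyResource) ConvertFrom(srcRaw conversion.Hub) error {
	src := srcRaw.(*v1alpha2.MyResource)
	dst.ObjectMeta = src.ObjectMeta
	dst.Spec.Attribute1 = src.Spec.Attribute1
	dst.Spec.Attribute2 = src.Spec.Attribute2
	return nil
}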

We checked the conversion logs. A single dry-run command called the ConvertTo function three times and the ConvertFrom function three times. When the command fails, the last call to each of ConvertTo and ConvertFrom receives a request that is missing the metadata and spec information. The failing request looks like this:

"metadata":{"creationTimestamp":null},"spec":{}

whereas a normal request looks like this:

"metadata":{"name":"<foo>","namespace":"<foo>","uid":"09b69792-56d5-4217-b23c-4d418d3f904b","resourceVersion":"1707796","generation":3,"creationTimestamp":"2022-09-16T07:28:54Z","labels":{"kustomize.toolkit.fluxcd.io/name":"<foo>","kustomize.toolkit.fluxcd.io/namespace":"flux-system"}},"spec":{"attribute1":[{...
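
As a debugging aid, we are considering adding a small guard at the top of ConvertTo and ConvertFrom to flag these empty requests. This is only a hypothetical helper (logIfMetadataEmpty is a name we made up); ctrl.Log is controller-runtime's global logger.

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// logIfMetadataEmpty flags conversion requests that arrive with empty metadata,
// like the "metadata":{"creationTimestamp":null},"spec":{} requests in the log above.
func logIfMetadataEmpty(direction string, obj metav1.Object) {
	if obj.GetName() == "" && obj.GetUID() == "" {
		ctrl.Log.WithName("conversion").Info(
			"received object with empty metadata",
			"direction", direction,
			"generation", obj.GetGeneration(),
		)
	}
}

// Called as logIfMetadataEmpty("ConvertTo", src) at the top of ConvertTo,
// and logIfMetadataEmpty("ConvertFrom", src) at the top of ConvertFrom.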

We confirmed that this weird behaviour happens when managedFields contains entries from two managers (kustomize-controller and our operator), as follows:

apiVersion: <foo.com>/v1alpha2
kind: <MyResource>
metadata:
  creationTimestamp: "2022-09-15T04:52:03Z"
  generation: 1
  labels:
    kustomize.toolkit.fluxcd.io/name: operator-sample
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
  - apiVersion: <foo.com>/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:kustomize.toolkit.fluxcd.io/name: {}
          f:kustomize.toolkit.fluxcd.io/namespace: {}
      f:spec:
        f:attribute1: {}
        f:attribute2: {}
    manager: kustomize-controller
    operation: Apply
    time: "2022-09-15T04:52:03Z"
  - apiVersion: <foo.com>/v1alpha2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:attribute1: {}
        f:attribute2: {}
    manager: <our-operator>
    operation: Update
    time: "2022-09-15T04:52:04Z"
  name: v1alpha1-flux
  namespace: flux
  resourceVersion: "483157"
  uid: 696bed77-a12b-45d0-b240-8d685cf790e0

spec:
  ...
status:
  ...
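
To make the trigger condition concrete: the two managedFields entries above are recorded at different apiVersions (kustomize-controller's Apply entry at v1alpha1 and our operator's status Update at v1alpha2), which is presumably why the pruning step has to convert the object back to v1alpha1. The snippet below is only a hypothetical illustration of that condition, with the entry values copied from the YAML above.

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// managersByVersion groups the field managers recorded in managedFields by apiVersion.
func managersByVersion(entries []metav1.ManagedFieldsEntry) map[string][]string {
	out := map[string][]string{}
	for _, e := range entries {
		out[e.APIVersion] = append(out[e.APIVersion], e.Manager)
	}
	return out
}

func main() {
	// Mirrors the managedFields shown above: an Apply at v1alpha1 and an Update at v1alpha2.
	entries := []metav1.ManagedFieldsEntry{
		{Manager: "kustomize-controller", Operation: metav1.ManagedFieldsOperationApply, APIVersion: "<foo.com>/v1alpha1"},
		{Manager: "<our-operator>", Operation: metav1.ManagedFieldsOperationUpdate, APIVersion: "<foo.com>/v1alpha2"},
	}
	byVersion := managersByVersion(entries)
	for version, managers := range byVersion {
		fmt.Printf("%s is managed by %v\n", version, managers)
	}
	if len(byVersion) > 1 {
		fmt.Println("managedFields span multiple apiVersions, so pruning needs conversion")
	}
}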

I asked this question in the flux repo but could not find out why: https://github.com/fluxcd/flux2/discussions/3105

I have been stuck on this for more than a week, so any ideas would be really appreciated. Thanks!

LittleWat · Sep 21, 2022

@kwiesmueller Sorry to ping you. 🙇 I saw your TODO comment in this commit: https://github.com/kubernetes-sigs/structured-merge-diff/blob/26781d0c10bfdbd7d66b18d8be83985f623df9f8/merge/update.go#L193

Could it be related to this issue?

LittleWat · Sep 26, 2022

I created a sample repo to reproduce this error: https://github.com/LittleWat/conversion-webhook-test-with-flux

I hope the repo is useful for debugging. Thank you!

LittleWat · Oct 7, 2022

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Jan 5, 2023

/remove-lifecycle stale

LittleWat · Jan 10, 2023

Also experiencing this exact issue.

pnorth1 · Mar 24, 2023

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Jun 22, 2023

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Jan 19, 2024

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot · Feb 18, 2024

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Feb 18, 2024