crossplane-provider-grafana

Resources with server-side apply error because of `spec.initProvider` being null

Open patst opened this issue 6 months ago • 5 comments

Crossplane Version

v1.19.1

Crossplane Grafana Provider Version

0.27.0 + 0.27.1

Affected Resource(s)

  • Folder
  • RuleGroup
  • Dashboard
  • probably more

YAML resources

# Rulegroup yaml
apiVersion: alerting.grafana.crossplane.io/v1alpha1
kind: RuleGroup
metadata:
  annotations:
    crossplane.io/composition-resource-name: alert
    crossplane.io/external-create-pending: "2025-05-08T06:51:58Z"
    crossplane.io/external-create-succeeded: "2025-05-08T06:51:59Z"
    crossplane.io/external-name: 1:asdf:asdf
  creationTimestamp: "2025-05-08T06:51:58Z"
  finalizers:
  - finalizer.managedresource.crossplane.io
  generateName: cs-gc-alert-wqpbp-nvmmq-
  generation: 11
  name: cs-gc-alert-wqpbp-nvmmq-54wvd
  resourceVersion: "759971"
  uid: 37ed9f54-c4f6-40ed-b902-78f3dcf09051
spec:
  deletionPolicy: Delete
  forProvider:
    folderUid: asdf
    intervalSeconds: 3600
    name: asdf
    orgId: "1"
    rule:
    - annotations:
        description: Describe me
        free: form
        runbook_url: https://foo.bar.com/
        stage: uat
        summary: Severity {{ $labels.severity }} - some summary
      condition: A
      data:
      - datasourceUid: loki
        model: |2-
          {"expr":"count_over_time({app=\"example\"} |= \"level=error\" [$__interval])","hide":false,"intervalMs":1000,"maxDataPoints":43200,"queryType":"range","refId":"A"}
        refId: A
        relativeTimeRange:
        - from: 3600
          to: 0
      execErrState: Error
      for: 60m
      isPaused: true
      name: asdf
      noDataState: Alerting
      notificationSettings:
      - contactPointSelector:
          matchLabels:
            grafana.gap.hdi.global/contact-point-claim-name: asdf
            grafana.gap.hdi.global/contact-point-claim-namespace: asdf
        groupBy:
        - '...'
        groupInterval: 5m
        groupWait: 30s
        repeatInterval: 4h
    - annotations:
        description: Describe me
        free: form
        runbook_url: https://foo.bar.com/
        stage: uat
        summary: Severity {{ $labels.severity }} - some summary
      condition: A
      data:
      - datasourceUid: loki
        model: |2-

          {"expr":"count_over_time({app=\"example\"} |= \"level=error\" [$__interval])","hide":false,"intervalMs":1000,"maxDataPoints":43200,"queryType":"range","refId":"A"}
        refId: A
        relativeTimeRange:
        - from: 3600
          to: 0
      execErrState: Alerting
      for: 60m
      isPaused: false
      name: asdf
      noDataState: NoData
      notificationSettings:
      - contactPointSelector:
          matchLabels:
            grafana.gap.hdi.global/contact-point-claim-name: asdf
            grafana.gap.hdi.global/contact-point-claim-namespace: asdf
        groupBy:
        - '...'
        groupInterval: 5m
        groupWait: 30s
        repeatInterval: 4h
  initProvider:
    folderUid: ""
    orgId: ""
  managementPolicies:
  - '*'
  providerConfigRef:
    name: platform-grafana-cloud-provider
status:
  atProvider:
    disableProvenance: false
    folderUid: asdf
    id: 1:asdf
    intervalSeconds: 3600
    name: asdf
    orgId: "1"
    rule:
    - annotations:
        description: Describe me
        free: form
        runbook_url: https://foo.bar.com/
        stage: uat
        summary: Severity {{ $labels.severity }} - some summary
      condition: A
      data:
      - datasourceUid: loki
        model: '{"expr":"count_over_time({app=\"example\"} |= \"level=error\" [$__interval])","hide":false,"queryType":"range","refId":"A"}'
        queryType: range
        refId: A
        relativeTimeRange:
        - from: 3600
          to: 0
      execErrState: Error
      for: 1h0m0s
      isPaused: true
      name: asdf
      noDataState: Alerting
      notificationSettings:
      - contactPoint: asdf
        groupBy:
        - '...'
        groupInterval: 5m
        groupWait: 30s
        repeatInterval: 4h
      uid: fel8qqes9dgxsa
    - annotations:
        description: Describe me
        free: form
        runbook_url: https://foo.bar.com/
        stage: uat
        summary: Severity {{ $labels.severity }} - some summary
      condition: A
      data:
      - datasourceUid: loki
        model: '{"expr":"count_over_time({app=\"example\"} |= \"level=error\" [$__interval])","hide":false,"queryType":"range","refId":"A"}'
        queryType: range
        refId: A
        relativeTimeRange:
        - from: 3600
          to: 0
      execErrState: Alerting
      for: 1h0m0s
      isPaused: false
      name: asdf
      noDataState: NoData
      notificationSettings:
      - contactPoint: asdf
        groupBy:
        - '...'
        groupInterval: 5m
        groupWait: 30s
        repeatInterval: 4h
      uid: ael8qqesgv5z4c
  conditions:
  - lastTransitionTime: "2025-05-08T06:52:00Z"
    reason: Available
    status: "True"
    type: Ready
  - lastTransitionTime: "2025-05-08T07:00:15Z"
    message: 'cannot patch the managed resource via server-side apply: RuleGroup.alerting.grafana.crossplane.io
      "cs-gc-alert-wqpbp-nvmmq-54wvd" is invalid: [spec.initProvider: Invalid value:
      "null": spec.initProvider in body must be of type object: "null", <nil>: Invalid
      value: "null": some validation rules were not checked because the object was
      invalid; correct the existing errors to complete validation]'
    reason: ReconcileError
    status: "False"
    type: Synced

Expected Behavior

The resource is provisioned as it was with provider version 0.26.0.

Actual Behavior

A server-side apply error occurs:

    message: 'cannot patch the managed resource via server-side apply: RuleGroup.alerting.grafana.crossplane.io
      "cs-gc-alert-wqpbp-nvmmq-54wvd" is invalid: [spec.initProvider: Invalid value:
      "null": spec.initProvider in body must be of type object: "null", <nil>: Invalid
      value: "null": some validation rules were not checked because the object was
      invalid; correct the existing errors to complete validation]'

We do not set orgIds explicitly and do not have a spec.initProvider block set at all.

In spec.initProvider these values are being added:

spec:
  initProvider:
    folderUid: ""
    orgId: ""

I don't know if there is some omitempty set for empty strings that makes the initProvider block end up as null, which then causes the server-side apply error.

In provider version 0.26.0 the behaviour was different; the block looked like this:

spec:
  initProvider: {}
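
To make the difference between `initProvider: {}` and a null `initProvider` concrete, here is a minimal Python sketch of a type check in the spirit of the error message above. It is an illustration only, not the API server's validation code. The helper name `check_object_field` is hypothetical, and the sketch assumes the field is optional and not nullable.

```python
import json


def check_object_field(spec, field):
    """Return an error string if `field` is present but not an object, else None.

    Hypothetical helper: a rough model of a "must be of type object" check,
    assuming the field is optional and not nullable.
    """
    if field not in spec:
        return None  # absent optional field: nothing to validate
    value = spec[field]
    if isinstance(value, dict):
        return None  # {} and populated objects both pass
    return f"spec.{field} in body must be of type object: {json.dumps(value)}"


# Three ways the block can appear once the manifest is serialized to JSON.
absent = json.loads('{"forProvider": {}}')
empty = json.loads('{"initProvider": {}}')
null = json.loads('{"initProvider": null}')

for name, spec in [("absent", absent), ("empty", empty), ("null", null)]:
    print(name, "->", check_object_field(spec, "initProvider"))
```

Under these assumptions only the explicit null is rejected, which matches the observation that the `{}` written by 0.26.0 did not trigger the error.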

Steps to Reproduce

It does not occur for all of our resources.

It seems to depend on whether the managed resource is being modified (not sure about that).

Important Factoids

It worked with provider version 0.26.0.

References

No response

patst, May 08 '25 07:05

I think it is related to this change:

https://github.com/grafana/crossplane-provider-grafana/blob/fa1f8833e2fd07bad16c09c7b47e84745b5de42c/apis/alerting/v1alpha1/zz_generated.resolvers.go#L581

introduced with https://github.com/grafana/crossplane-provider-grafana/commit/4807919200b3097df426cf2b20ec427099b81d77

patst, May 08 '25 07:05

I think this is related to https://github.com/crossplane/crossplane-tools/pull/106

Duologic, May 08 '25 13:05

> I think this is related to crossplane/crossplane-tools#106

I agree! I am wondering if the current version of the provider is usable at all with that behaviour or if the change should be rolled back.

We rolled back to the 0.26.x version of the provider and then had to manually clean up all resources to remove the empty orgIds, folderUids, etc.
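
For illustration, a cleanup of this kind could look roughly like the following Python sketch. The helper name `strip_empty_strings` is hypothetical, and the sketch assumes the manifest has already been loaded into a dict (for example from JSON). Fetching the resource and writing it back to the cluster are left out.

```python
def strip_empty_strings(manifest, block="initProvider"):
    """Remove keys whose value is "" from spec.<block>, in place.

    Hypothetical helper; it only touches top-level keys of the block and
    leaves the block itself in place (possibly as an empty dict).
    """
    spec = manifest.get("spec")
    if not isinstance(spec, dict):
        return manifest
    params = spec.get(block)
    if not isinstance(params, dict):
        return manifest  # block missing or null: nothing to strip here
    spec[block] = {key: value for key, value in params.items() if value != ""}
    return manifest


example = {
    "spec": {
        "forProvider": {"folderUid": "asdf", "orgId": "1"},
        "initProvider": {"folderUid": "", "orgId": ""},
    }
}
print(strip_empty_strings(example)["spec"]["initProvider"])
```

For the `initProvider` block shown in the issue, this leaves an empty object, which is the shape the block had under 0.26.0.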

patst, May 08 '25 13:05

Also affects us with v0.28.0

holgerjh, May 23 '25 07:05

We have observed this behavior again in later versions, specifically on Dashboards when upgrading from v0.28.0 to v0.30.0.

Some of these issues seem to be solved by upgrading crossplane-tools in #306, but not all.

My theory is that this is not tied to a specific provider version. I think it is related to how upjet resolves references as defined here and to what happens on provider upgrades.

I'm currently coming up with a script as a workaround that simply drops the offending keys (for example spec.initProvider) from the managed manifest. That seems to bring the resources back into a successful state.
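
The core of such a workaround could be sketched in Python as below. This is a rough sketch under stated assumptions, not the finished script: the function name `drop_null_keys` and the default path list are hypothetical, the manifest is assumed to be a dict already parsed from JSON or YAML, and reading from and writing back to the cluster are not shown.

```python
def drop_null_keys(manifest, paths=("spec.initProvider",)):
    """Delete each dotted path whose value is null; return the removed paths.

    Hypothetical helper. Keys that hold {} or real values are left alone,
    as are paths that do not exist in the manifest.
    """
    removed = []
    for path in paths:
        *parents, leaf = path.split(".")
        node = manifest
        # Walk down to the parent of the leaf key.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node and node[leaf] is None:
            del node[leaf]
            removed.append(path)
    return removed


broken = {"spec": {"forProvider": {"name": "asdf"}, "initProvider": None}}
print(drop_null_keys(broken), broken)
```

Returning the list of removed paths makes it easy to log which resources were actually changed and to skip the write-back for untouched ones.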

Duologic, Sep 01 '25 19:09