
Crossplane fails to synchronize claims with XRs

Open • fernandezcuesta opened this issue • 15 comments

What happened?

From time to time, claims and XRs lose sync, as shown here (see the second resource, whose SYNCED and READY columns are empty):

❯ kubectl get xirsas
NAME                                           API                           SYNCED   READY   COMPOSITION                                         AGE
acm-pca-issuer-nw-eu-west-3-main               infra.nexthink.com/v1alpha3   True     True    custom.policy.xirsas.infra.nexthink.com             9m30s
acm-pca-issuer-nw-us-east-2-main               infra.nexthink.com/v1alpha3                    custom.policy.xirsas.infra.nexthink.com             9m31s
collector-traffic-nw-eu-west-3-main            infra.nexthink.com/v1alpha3   True     True    custom.auth.xirsas.infra.nexthink.com               22h
[...]

This does not change until I do a rollout restart of the Crossplane deployment. The Crossplane deployment logs appear to be stuck in a loop, continuously emitting lines such as these:

crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
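As a sketch for spotting affected XRs without eyeballing the table: the hypothetical helper below asks kubectl for each XR's Synced condition and prints the names where it is missing, which is what an empty SYNCED column means. The restart in the usage comment is the workaround described above; the upbound-system namespace is an assumption based on where UXP is usually installed.

```shell
# Hypothetical helper: print the names of XRs of the given kind that have
# no Synced condition at all (an empty SYNCED column in `kubectl get`).
list_unsynced_xrs() {
  local kind="$1"
  kubectl get "$kind" --no-headers \
    -o custom-columns='NAME:.metadata.name,SYNCED:.status.conditions[?(@.type=="Synced")].status' |
    awk '$2 == "<none>" { print $1 }'   # kubectl prints <none> for a missing field
}

# Usage (namespace is an assumption for a UXP install):
#   list_unsynced_xrs xirsas
#   kubectl -n upbound-system rollout restart deployment/crossplane
```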

How can we reproduce it?

What environment did it happen in?

Crossplane version: universal-crossplane-1.14.5-up.1

EKS v1.27.9-eks-5e0fdde

Relevant PRs

  • https://github.com/crossplane/crossplane/pull/5422
  • https://github.com/crossplane/crossplane/pull/5437
  • https://github.com/crossplane/crossplane/pull/5468

fernandezcuesta, Feb 16 '24

As suggested by @haarchri, I set --enable-composition-webhook-schema-validation=false, but unfortunately it didn't help.

fernandezcuesta, Feb 16 '24

Are you using realtime compositions? For reference, we disabled the tests for realtime compositions with https://github.com/crossplane/crossplane/pull/5296

haarchri, Feb 16 '24

Looks like it:

      containers:
      - args:
        - core
        - start
        - --enable-composition-functions
        - --enable-environment-configs
        - --enable-realtime-compositions
        - --enable-usages
        - --enable-composition-webhook-schema-validation=false
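To check whether a running install has realtime compositions enabled, a minimal sketch could read the container args straight from the deployment. Assumptions: the deployment is named crossplane, the Crossplane container is the first one in the pod, and the default namespace is crossplane-system (pass upbound-system for UXP); the helper name is hypothetical.

```shell
# Hypothetical helper: succeed if the Crossplane container runs with
# --enable-realtime-compositions (or =true); fail otherwise.
has_realtime_compositions() {
  local ns="${1:-crossplane-system}"
  # JSONPath [*] prints the args space-separated; split them one per line
  # so grep -x can match a whole flag, not a prefix such as "...=false".
  kubectl -n "$ns" get deployment crossplane \
    -o jsonpath='{.spec.template.spec.containers[0].args[*]}' |
    tr ' ' '\n' |
    grep -qxE -- '--enable-realtime-compositions(=true)?'
}

# Usage:
#   has_realtime_compositions upbound-system && echo "realtime compositions enabled"
```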

fernandezcuesta, Feb 16 '24

I think it's related to: https://github.com/crossplane/crossplane/issues/5151

haarchri, Feb 16 '24

@haarchri were you able to confirm positively that this behavior is related to realtime compositions? i.e. it only manifests when --enable-realtime-compositions is set? 🤔

jbw976, Feb 20 '24

Yes, I can reproduce this issue with realtime compositions enabled; currently debugging it.

haarchri, Feb 20 '24

Awesome dude! Thanks for confirming. Tracking this as part of the maturing realtime compositions epic:

  • https://github.com/crossplane/crossplane/issues/4828

jbw976, Feb 20 '24

NAME                          SYNCED   READY   COMPOSITION                                             AGE
realtime-composition1-dshdn   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   45s
realtime-composition2-4pbd5   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   45s
realtime-composition3-r8zbk   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   44s
realtime-composition4-58f9w                                                                            14s
realtime-composition5-9p5cn                                                                            14s
realtime-composition6-vnsgs                                                                            14s

After the new claims were created, none of the new XRs show SYNCED, READY, or any status. During Crossplane startup, the following log line appears: cannot list in CompositionRevision handler

kubectl get xnopresources
NAME                          SYNCED   READY   COMPOSITION                                             AGE
realtime-composition1-dshdn   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition2-4pbd5   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition3-r8zbk   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition4-58f9w                                                                            16m
realtime-composition5-9p5cn                                                                            16m
realtime-composition6-vnsgs                                                                            16m

The problem starts around here. I think we can also hit this issue without realtime compositions, because there is no feature-flag guard around this code: https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/definition/reconciler.go#L475-L481

And then we hit the following: https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/composite/reconciler.go#L712

If I add a long sleep here, it works, so I wonder whether the setup is too fast and we need to find a way to wait: https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/definition/reconciler.go#L474

haarchri, Feb 20 '24

think we can hit this issue also without real-time compositions

So we do think this is something that is hitting mainstream scenarios in v1.15? Is there reason to believe that we should backport any of these PRs for a v1.15 patch release?

  • https://github.com/crossplane/crossplane/pull/5437
  • https://github.com/crossplane/crossplane/pull/5422

/cc @haarchri @sttts @phisco

jbw976, Mar 04 '24

To replicate the issue, first install Crossplane 1.15.0 and run it with --enable-realtime-compositions. Then follow these steps:

kubectl apply -f test/e2e/manifests/apiextensions/composition/realtime-compositions/setup
for i in {1..3}; do
  kubectl apply -f - <<EOF
apiVersion: realtime-compositions.e2e.crossplane.io/v1alpha1
kind: NopResource
metadata:
  namespace: default
  name: realtime-composition$i
  labels:
    realtime-compositions: "true"
spec:
  coolField: "I'm cool!"
  compositeDeletePolicy: Foreground
EOF
done

Wait until all claims, XRs, and managed resources are ready, then stop Crossplane.

Start Crossplane again, then apply claims 1 to 6 (the first three already exist):
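The wait/stop/start steps can be scripted roughly as follows. This is a sketch under assumptions, not the e2e suite's own code: Crossplane runs as deployment/crossplane in crossplane-system (the Helm chart default), the claim plural is nopresources, "stopping" means scaling the deployment to zero, and managed resources are not waited on because their kind depends on the setup manifests. The helper name is hypothetical.

```shell
# Assumptions: namespace and resource plurals, see the lead-in above.
CP_NS="${CP_NS:-crossplane-system}"
GROUP="realtime-compositions.e2e.crossplane.io"

# Hypothetical helper; the body runs in a subshell with set -e so the
# first failing kubectl call aborts the remaining steps.
wait_stop_start() (
  set -e
  # Wait until the claims and their XRs report Ready.
  kubectl -n default wait "nopresources.${GROUP}" --all --for=condition=Ready --timeout=5m
  kubectl wait "xnopresources.${GROUP}" --all --for=condition=Ready --timeout=5m
  # Stop Crossplane by scaling its deployment to zero replicas...
  kubectl -n "$CP_NS" scale deployment/crossplane --replicas=0
  # ...then start it again and wait for the new pod to become available.
  kubectl -n "$CP_NS" scale deployment/crossplane --replicas=1
  kubectl -n "$CP_NS" rollout status deployment/crossplane --timeout=5m
)
```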

for i in {1..6}; do
  kubectl apply -f - <<EOF
apiVersion: realtime-compositions.e2e.crossplane.io/v1alpha1
kind: NopResource
metadata:
  namespace: default
  name: realtime-composition$i
  labels:
    realtime-compositions: "true"
spec:
  coolField: "I'm cool!"
  compositeDeletePolicy: Foreground
EOF
done

If the issue doesn't occur, restart Crossplane and create more claims than you did previously. You will then observe the error "cannot list in CompositionRevision handler" in the logs.
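To check for that error without scrolling through the logs, something like the following could be used. It is a sketch: the helper name is hypothetical, and it assumes the deployment is named crossplane in crossplane-system (pass upbound-system for UXP).

```shell
# Hypothetical helper: count log lines from the last 10 minutes that
# contain the CompositionRevision handler error.
count_revision_list_errors() {
  local ns="${1:-crossplane-system}"
  kubectl -n "$ns" logs deployment/crossplane --since=10m |
    grep -c 'cannot list in CompositionRevision handler'   # prints 0 if none
}

# Usage:
#   count_revision_list_errors
```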

haarchri, Mar 08 '24

I did a long debugging session with @sttts, and I cannot reproduce the issue with this PR applied: https://github.com/crossplane/crossplane/pull/5422. So I have a good feeling that we fixed the issue.

haarchri, Mar 08 '24

Whoops, #5651 shouldn't have closed this (yet). It might fix the issue, but we don't know for sure.

It would help a lot if someone could try to reproduce this issue with #5651.

negz, May 21 '24

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

github-actions[bot], Aug 20 '24

/fresh

haarchri, Aug 20 '24