Crossplane fails to synchronize claims with XRs
What happened?
From time to time, I see claims and XRs lose sync, as shown here (see the second resource):
❯ kubectl get xirsas
NAME                                  API                           SYNCED   READY   COMPOSITION                               AGE
acm-pca-issuer-nw-eu-west-3-main      infra.nexthink.com/v1alpha3   True     True    custom.policy.xirsas.infra.nexthink.com   9m30s
acm-pca-issuer-nw-us-east-2-main      infra.nexthink.com/v1alpha3                    custom.policy.xirsas.infra.nexthink.com   9m31s
collector-traffic-nw-eu-west-3-main   infra.nexthink.com/v1alpha3   True     True    custom.auth.xirsas.infra.nexthink.com     22h
[...]
This does not change until I do a rollout restart of the crossplane deployment. The crossplane deployment logs appear to be stuck in a loop, continuously emitting lines such as these:
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
crossplane-7c898b5fdf-rf7mg universal-crossplane {"level":"info","ts":"2024-02-16T10:15:46Z","logger":"crossplane","msg":"Enqueueing composite resource because managed resource changed","controller":"defined/compositeresourcedefinition.apiextensions.crossplane.io","request":{"name":"xcertmanagers.infra.nexthink.com"},"uid":"c3eb8136-bcc4-4c98-8832-d782e62cb2b2","version":"168408426","name":"xcertmanagers.infra.nexthink.com","name":"cert-manager-nw-eu-west-3-main","mrGVK":"infra.nexthink.com/v1alpha1, Kind=XAcmPcaIssuer","mrName":"pca-plugin-nw-eu-west-3-main"}
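To tell whether the controller is really looping on the same resources rather than emitting a few ordinary requeues, a rough sketch like the one below counts the "Enqueueing composite resource because managed resource changed" messages per mrName. It assumes log lines shaped like the ones above (a JSON object with an "mrName" field, optionally prefixed by pod and container names) and uses only grep, sed, sort, uniq, and awk. The function name count_enqueues is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Enqueueing composite resource because managed
# resource changed" log messages per managed resource (the mrName field).
# Reads log lines on stdin and prints "<count> <mrName>", highest count first.
count_enqueues() {
  grep -F '"msg":"Enqueueing composite resource because managed resource changed"' \
    | grep -o '"mrName":"[^"]*"' \
    | sed 's/^"mrName":"\(.*\)"$/\1/' \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{ print $1, $2 }'
}
```

For example, assuming a Universal Crossplane install in the upbound-system namespace with a Deployment named crossplane: `kubectl -n upbound-system logs deployment/crossplane --since=10m | count_enqueues`. A single mrName whose count keeps growing over a short window is consistent with the loop described above.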
How can we reproduce it?
What environment did it happen in?
Crossplane version: universal-crossplane-1.14.5-up.1
EKS v1.27.9-eks-5e0fdde
Relevant PRs
- https://github.com/crossplane/crossplane/pull/5422
- https://github.com/crossplane/crossplane/pull/5437
- https://github.com/crossplane/crossplane/pull/5468
As suggested by @haarchri, I set --enable-composition-webhook-schema-validation=false, but unfortunately it didn't help.
Are you using realtime compositions? For reference, we disabled the tests for realtime compositions with https://github.com/crossplane/crossplane/pull/5296
Our args look like this:
containers:
- args:
  - core
  - start
  - --enable-composition-functions
  - --enable-environment-configs
  - --enable-realtime-compositions
  - --enable-usages
  - --enable-composition-webhook-schema-validation=false
I think it's related to: https://github.com/crossplane/crossplane/issues/5151
@haarchri were you able to confirm that this behavior is related to realtime compositions, i.e. that it only manifests when --enable-realtime-compositions is set? 🤔
Yes, and I can reproduce this issue with realtime compositions enabled. Currently debugging it.
Awesome dude, thanks for confirming! Tracking this as part of the maturing realtime compositions epic:
- https://github.com/crossplane/crossplane/issues/4828
NAME                          SYNCED   READY   COMPOSITION                                             AGE
realtime-composition1-dshdn   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   45s
realtime-composition2-4pbd5   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   45s
realtime-composition3-r8zbk   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   44s
realtime-composition4-58f9w                                                                            14s
realtime-composition5-9p5cn                                                                            14s
realtime-composition6-vnsgs                                                                            14s
After the new claims are created, none of the new XRs have SYNCED, READY, or any status. During Crossplane startup you can see the following log line: cannot list in CompositionRevision handler
kubectl get xnopresources
NAME                          SYNCED   READY   COMPOSITION                                             AGE
realtime-composition1-dshdn   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition2-4pbd5   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition3-r8zbk   True     True    xnopresources.realtime-compositions.e2e.crossplane.io   17m
realtime-composition4-58f9w                                                                            16m
realtime-composition5-9p5cn                                                                            16m
realtime-composition6-vnsgs                                                                            16m
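To list the XRs stuck in this state (no Synced condition at all) instead of scanning the table by eye, a small sketch combining kubectl's JSONPath output with awk is shown here. It assumes kubectl's default behavior of printing an empty string for fields that are missing, and the helper names filter_missing_synced and list_unsynced_xrs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tab-separated "<name>\t<Synced status>" lines on
# stdin and print the names whose Synced status is empty (condition missing).
filter_missing_synced() {
  awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}

# Hypothetical helper: for the given XR type, emit one "<name>\t<Synced status>"
# line per object via kubectl JSONPath, then keep only the unsynced names.
list_unsynced_xrs() {
  local xr_type="$1"
  kubectl get "$xr_type" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Synced")].status}{"\n"}{end}' \
    | filter_missing_synced
}
```

For example, `list_unsynced_xrs xnopresources` for the e2e setup above, or `list_unsynced_xrs xirsas` for the resources in the original report.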
The problem starts around here. I think we can hit this issue even without realtime compositions, because there is no feature flag guard around this code: https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/definition/reconciler.go#L475-L481
We then hit the following: https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/composite/reconciler.go#L712
If I add a long sleep here, it works, so I wonder whether the setup is too fast and we need to find a way to wait: https://github.com/crossplane/crossplane/blob/v1.15.0/internal/controller/apiextensions/definition/reconciler.go#L474
think we can hit this issue also without real-time compositions
So we do think this is something that is hitting mainstream scenarios in v1.15? Is there reason to believe that we should backport any of these PRs for a v1.15 patch release?
- https://github.com/crossplane/crossplane/pull/5437
- https://github.com/crossplane/crossplane/pull/5422
/cc @haarchri @sttts @phisco
To replicate the issue, first install Crossplane version 1.15.0 and run it with --enable-realtime-compositions.
Then, follow these steps:
kubectl apply -f test/e2e/manifests/apiextensions/composition/realtime-compositions/setup
for i in {1..3}; do
  kubectl apply -f - <<EOF
apiVersion: realtime-compositions.e2e.crossplane.io/v1alpha1
kind: NopResource
metadata:
  namespace: default
  name: realtime-composition$i
  labels:
    realtime-compositions: "true"
spec:
  coolField: "I'm cool!"
  compositeDeletePolicy: Foreground
EOF
done
Wait until all claims, XRs, and managed resources are ready, then stop Crossplane.
Start Crossplane again.
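The wait, stop, and start steps above can be scripted roughly as follows. This is a sketch under assumptions that are not in the original report: Crossplane runs as the Deployment crossplane in the crossplane-system namespace (Universal Crossplane installs typically use upbound-system instead), its pods carry the label app=crossplane, and the e2e claim type is addressable as nopresources.realtime-compositions.e2e.crossplane.io. The function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed defaults (not from the original report); override via environment.
CROSSPLANE_NS="${CROSSPLANE_NS:-crossplane-system}"
CROSSPLANE_DEPLOY="${CROSSPLANE_DEPLOY:-crossplane}"
CROSSPLANE_SELECTOR="${CROSSPLANE_SELECTOR:-app=crossplane}"

# Block until every e2e NopResource claim in "default" reports Ready=True.
wait_for_claims() {
  kubectl wait --for=condition=Ready \
    nopresources.realtime-compositions.e2e.crossplane.io --all \
    -n default --timeout=5m
}

# Scale Crossplane to zero, then poll (up to ~120s) until no pods match the
# selector. A failed "get pods" call is retried rather than treated as "gone".
stop_crossplane() {
  kubectl -n "$CROSSPLANE_NS" scale "deployment/$CROSSPLANE_DEPLOY" --replicas=0
  local i pods
  for ((i = 0; i < 60; i++)); do
    if pods=$(kubectl -n "$CROSSPLANE_NS" get pods -l "$CROSSPLANE_SELECTOR" -o name) \
      && [ -z "$pods" ]; then
      return 0
    fi
    sleep 2
  done
  echo "timed out waiting for Crossplane pods to terminate" >&2
  return 1
}

# Scale Crossplane back to one replica and wait for the rollout to complete.
start_crossplane() {
  kubectl -n "$CROSSPLANE_NS" scale "deployment/$CROSSPLANE_DEPLOY" --replicas=1
  kubectl -n "$CROSSPLANE_NS" rollout status "deployment/$CROSSPLANE_DEPLOY" --timeout=5m
}
```

Usage: `wait_for_claims && stop_crossplane && start_crossplane`, followed by the second loop. Scaling the Deployment, rather than deleting pods, keeps Crossplane fully down between the two phases.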
for i in {1..6}; do
  kubectl apply -f - <<EOF
apiVersion: realtime-compositions.e2e.crossplane.io/v1alpha1
kind: NopResource
metadata:
  namespace: default
  name: realtime-composition$i
  labels:
    realtime-compositions: "true"
spec:
  coolField: "I'm cool!"
  compositeDeletePolicy: Foreground
EOF
done
If the issue doesn't occur, restart Crossplane and create more claims than you did previously. You will then observe the error "cannot list in CompositionRevision handler" in the logs.
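To check a run for this error without reading the whole log, a minimal sketch like the following counts the lines containing the message quoted above. The helper name count_revision_list_errors is hypothetical, and the namespace and Deployment name in the usage line are assumptions (upbound-system is typical for Universal Crossplane, crossplane-system for upstream installs).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many log lines on stdin contain the
# CompositionRevision list error. grep -c prints 0 but exits 1 when nothing
# matches, so "|| true" keeps the helper from failing in that case.
count_revision_list_errors() {
  grep -cF 'cannot list in CompositionRevision handler' || true
}
```

For example: `kubectl -n crossplane-system logs deployment/crossplane | count_revision_list_errors`. A non-zero count after the restart matches the failure mode described in this thread.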
I did a long debug session with @sttts, and I cannot reproduce the issue with this PR (https://github.com/crossplane/crossplane/pull/5422), so I have a good feeling that we fixed the issue.
Whoops, #5651 shouldn't have closed this (yet). It might fix the issue, but we don't know for sure.
It would help a lot if someone could try reproduce this issue with #5651.
Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.
/fresh