
Reconcile not triggered on updates to container status

Open cclarkedt opened this issue 6 months ago • 14 comments

My controller owns and is responsible for creating a deployment. The one container in this deployment has a liveness probe and a startup probe. When the probes reach their failure threshold, kubelet restarts the container. When kubelet restarts the container, I'd like to receive a reconcile so I can retrieve the latest container status, which I am currently not getting.

My understanding was that because my controller Owns(&corev1.Pod{}) (and the pod has the correct owner refs), the update event kubelet triggers on the pod would cause a reconcile, e.g.:

Warning Unhealthy 4m17s (x45 over 39m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 503

Apologies if I am missing something.

Manager setup:

func ignoreStatusUpdates() predicate.Predicate {
	return predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			// Ignore updates to CR status in which case metadata.Generation does not change
			return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
		},
	}
}

func (r *MyController) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appv1alpha1.MyCR{}, builder.WithPredicates(ignoreStatusUpdates())).
		Owns(&appsv1.Deployment{}).
		Owns(&corev1.Pod{}).
		Owns(&corev1.Secret{}).
		Owns(&corev1.ConfigMap{}).
		Owns(&corev1.Service{}).
		Complete(r)
}

cclarkedt avatar Jun 27 '25 13:06 cclarkedt

Are you not getting reconciles for the MyCR parent object when the pod restarts?

Usually when you have an owned object, you check that object in your reconcile loop when it triggers the parent object to be reconciled. The owned object's event enqueues the parent CR, and in the Reconcile method you verify that the owned object matches the state MyCR expects for the cluster.
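A minimal sketch of that pattern, assuming the Deployment shares the CR's name and namespace (an assumption; adjust to your naming scheme):

func (r *MyController) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var cr appv1alpha1.MyCR
	if err := r.Get(ctx, req.NamespacedName, &cr); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// A pod event enqueues the parent CR; this is where the owned
	// Deployment's actual state is compared against the desired state.
	var dep appsv1.Deployment
	if err := r.Get(ctx, req.NamespacedName, &dep); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if dep.Status.UnavailableReplicas > 0 {
		// React to unhealthy pods here, e.g. update the CR's status.
	}
	return ctrl.Result{}, nil
}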

troy0820 avatar Jun 27 '25 15:06 troy0820

I think this only works if the Pod has an ownerRef to MyCR. Is that the case?

For more details, see: https://github.com/kubernetes-sigs/controller-runtime/blob/aeac9c59d7047a66d1d4ef521f66042b024cbb3b/pkg/builder/controller.go#L344

sbueringer avatar Jun 28 '25 04:06 sbueringer

Apologies for the slow reply.

Are you not getting reconciles for the MyCR parent object when the pod restarts?

From my understanding, kubelet will restart the container, not the pod, on failure of the liveness/readiness/startup probe(s). The pod does get an event for the container restart, though my reconciliation still isn't triggered, e.g., Warning Unhealthy 3m52s (x5 over 4m4s) kubelet Startup probe failed: HTTP probe failed with statuscode: 503

In my case I have a Deployment whose pod template defines the unhealthy containers (Deployment -> Pod -> Container). Deployments and Pods implement the Object interface and can therefore carry an owner reference, which I have set: both are owned by MyCR.

I think this only works if the Pod has an ownerRef to MyCR. Is that the case?

@sbueringer Yes this is the case, am I missing something?

cclarkedt avatar Jul 10 '25 11:07 cclarkedt

Probably this one: https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/builder/controller.go#L118

// The default behavior reconciles only the first controller-type OwnerReference of the given type.
// Use Owns(object, builder.MatchEveryOwner) to reconcile all owners.

I assume you are not setting controller: true on the ownerRef, so you'll have to use builder.MatchEveryOwner

sbueringer avatar Jul 12 '25 04:07 sbueringer

@sbueringer I'm setting controller: true on the pod's ownerRef:

ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: my-cr-12345
    uid: xxx

Not too sure why the kind gets set to ReplicaSet here and not MyCR; this is handled using controllerutil.SetOwnerReference.

Because it is the controller, this should be ok without builder.MatchEveryOwner right?

cclarkedt avatar Jul 14 '25 10:07 cclarkedt

There can only be one ownerRef with controller: true per object, and in this case it should be ReplicaSet, not your CRD.

And no, if your CRD's ownerRef is not set with controller: true, it won't work without builder.MatchEveryOwner
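For reference, the two controllerutil helpers differ exactly on that flag; a minimal sketch (mycr and deployment are placeholders from this thread):

// Plain ownerReference (controller: false); matched by Owns() only
// together with builder.MatchEveryOwner.
if err := controllerutil.SetOwnerReference(mycr, deployment, r.Scheme); err != nil {
	return ctrl.Result{}, err
}

// Controller ownerReference (controller: true); matched by Owns() by
// default. Fails if the object already has a different controller owner.
if err := controllerutil.SetControllerReference(mycr, deployment, r.Scheme); err != nil {
	return ctrl.Result{}, err
}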

sbueringer avatar Jul 14 '25 11:07 sbueringer

Thanks for the info. Assuming I've set this in the correct place (see below), I am still not getting reconcile events as originally described.

// SetupWithManager sets up the controller with the Manager.
func (r *MyController) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appv1alpha1.MyCR{}, builder.WithPredicates(ignoreStatusUpdates())).
		Owns(&appsv1.Deployment{}).
		Owns(&corev1.Pod{}, builder.MatchEveryOwner).
		Owns(&corev1.Secret{}).
		Owns(&corev1.ConfigMap{}).
		Owns(&corev1.Service{}).
		Complete(r)
}


cclarkedt avatar Jul 14 '25 12:07 cclarkedt

Then no idea

sbueringer avatar Jul 14 '25 16:07 sbueringer

func ignoreStatusUpdates() predicate.Predicate {
	return predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			// Ignore updates to CR status in which case metadata.Generation does not change
			return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
		},
	}
}

Wouldn't this filter out the events from an owned object? That is, the generation of the parent object doesn't change, so you may not get the event when the owned object triggers the reconciliation of the parent object.

https://github.com/kubernetes-sigs/controller-runtime/issues/2684#issuecomment-1942450084
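As an aside, the hand-rolled filter above is equivalent to the built-in predicate.GenerationChangedPredicate, and predicates passed via builder.WithPredicates inside For(...) should only apply to that watch, whereas WithEventFilter(...) on the builder applies to every watch, including Owns. A sketch of the built-in form:

ctrl.NewControllerManagedBy(mgr).
	For(&appv1alpha1.MyCR{},
		builder.WithPredicates(predicate.GenerationChangedPredicate{})).
	Owns(&appsv1.Deployment{}).
	Complete(r)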

troy0820 avatar Jul 14 '25 19:07 troy0820

Wouldn't this filter out the events from an owned object?

From doing some debugging this event filter isn't receiving the pod restart events either. Do you have any cases where this behaviour has worked for you?

cclarkedt avatar Jul 15 '25 07:07 cclarkedt

So I found that it is partly working as expected: the CR is receiving reconcile events on liveness probe failure. However, when the startup probe fails in one of the pod's containers, it blocks any further reconciles. I assume this is because there would be no difference in the pod's overall status (NotReady), so it doesn't send the event?
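One way to test that hypothesis would be a predicate on the Pod watch that fires only when a container restart count changes (a sketch; podRestarted is a made-up helper, not part of controller-runtime):

func podRestarted() predicate.Predicate {
	return predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			oldPod, okOld := e.ObjectOld.(*corev1.Pod)
			newPod, okNew := e.ObjectNew.(*corev1.Pod)
			if !okOld || !okNew {
				return false
			}
			// Sum restart counts across containers and fire on any change.
			restarts := func(p *corev1.Pod) int32 {
				var n int32
				for _, cs := range p.Status.ContainerStatuses {
					n += cs.RestartCount
				}
				return n
			}
			return restarts(newPod) != restarts(oldPod)
		},
	}
}

This could be attached with Owns(&corev1.Pod{}, builder.MatchEveryOwner, builder.WithPredicates(podRestarted())).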

cclarkedt avatar Jul 15 '25 11:07 cclarkedt

1.

If you could share your Reconcile function, it would help to find the problem.

I guess you might have missed setting the controller reference before creating the deployment. Consider adding SetControllerReference like this:

// Make mycr the controller owner (controller: true) so that Owns() enqueues it.
if err := ctrl.SetControllerReference(mycr, deployment, r.Scheme); err != nil {
    log.Error(err, "unable to set controller reference")
    return ctrl.Result{}, err
}
if err := r.Create(ctx, deployment); err != nil {
    log.Error(err, "unable to create deployment")
    return ctrl.Result{}, err
}

2.

Thus, you don't need to include these resources in the Owns configuration unless you create them directly and have set their controller reference with SetControllerReference:

Owns(&corev1.Pod{}).
Owns(&corev1.Secret{}).
Owns(&corev1.ConfigMap{}).
Owns(&corev1.Service{})

If you really need to watch these resources, you should use WatchesRawSource. However, make sure to implement proper filtering to handle them correctly.
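One possible shape for that filtering, sketched against the controller-runtime v0.15+ API (the label key is a made-up example, and older versions have different handler signatures):

Watches(
	&corev1.Pod{},
	handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
		// Map the Pod back to its MyCR parent via a label assumed to be
		// stamped on the pod template (an assumption, not from this thread).
		name, ok := obj.GetLabels()["app.example.com/owner"]
		if !ok {
			return nil
		}
		return []reconcile.Request{{
			NamespacedName: types.NamespacedName{Namespace: obj.GetNamespace(), Name: name},
		}}
	}),
)

WatchesRawSource works similarly but takes a fully constructed source instead of an object plus handler.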

3.

Since ownerReferences is a list, MatchEveryOwner matches against every entry in an object's ownerReferences list. Without MatchEveryOwner, the default behavior is to match only the primary ownerReference with controller: true. However, since the Pod is not created directly by your controller but indirectly through the ReplicaSet controller, the Pod's ownerReferences will only contain a single reference pointing to the ReplicaSet. As a result, it won't match appv1alpha1.MyCR{} and won't trigger reconciliation.

s-z-z avatar Jul 31 '25 07:07 s-z-z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 29 '25 08:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 28 '25 08:11 k8s-triage-robot