rancher icon indicating copy to clipboard operation
rancher copied to clipboard

The Rancher package Summary's private field may cause dead loop when it is used as a part of Status

Open w13915984028 opened this issue 2 years ago • 4 comments

Rancher Server Setup

  • Rancher version: v2.6.3

  • Installation option (Docker install/Helm Chart): The packet Summary is used by fleet-agent of Rancher

    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
  • Proxy/Cert Details:

    This BUG is related to one of the base package from rancher: Summary And it is also related to the Rancher auto generated soure code

Information about the Cluster

  • Kubernetes version:
  • Cluster Type (Local/Downstream):
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
    • If custom, define the set of permissions: Admin

The detailed reproduce steps and root cause analysis:

https://github.com/w13915984028/harvester-develop-summary/blob/main/fleet-agent-high-cpu-usage.md

Describe the bug

The important part of previous link is posted here:

type Summary struct {
	State         string                 `json:"state,omitempty"`
	Error         bool                   `json:"error,omitempty"`
	Transitioning bool                   `json:"transitioning,omitempty"`
	Message       []string               `json:"message,omitempty"`
	Attributes    map[string]interface{} `json:"-"`
	Relationships []Relationship         `json:"-"`
}

type Relationship struct {
	Name         string
	Namespace    string
	ControlledBy bool
	Kind         string
	APIVersion   string
	Inbound      bool
	Type         string
	Selector     *metav1.LabelSelector
}

The Summary is used as part of Status of BundleDeployment of Rancher-fleet.

But as Attributes and Relationships are private fields. It means, status.NonReadyStatus[].Summary.Relationships[] will not be updated to API Server side.

Further, let's check below auto generated code by Rancher framework, it gives the answer.

(a) status.NonReadyStatus[].Summary.Relationships[] causes equality.Semantic.DeepEqual(origStatus, &newStatus) return false

(b) a.condition.LastUpdated(&newStatus, time.Now().UTC().Format(time.RFC3339)) updates the lastUpdatedTime field.

(c) a.client.UpdateStatus(obj) updates the status to API Server, with the only changed filed lastUpdatedTime of Monitored in Condition

status is updated to API Server, it will trigger a new round of (a) (b) (c).

It is a dead/long loop before something is changed.

// auto-generated code
func (a *bundleDeploymentStatusHandler) sync(key string, obj *v1alpha1.BundleDeployment) (*v1alpha1.BundleDeployment, error) {
	if obj == nil {
		return obj, nil
	}

	origStatus := obj.Status.DeepCopy()
	obj = obj.DeepCopy()
	newStatus, err := a.handler(obj, obj.Status)
	if err != nil {
		// Revert to old status on error
		newStatus = *origStatus.DeepCopy()
	}

	if a.condition != "" {
		if errors.IsConflict(err) {
			a.condition.SetError(&newStatus, "", nil)
		} else {
			a.condition.SetError(&newStatus, "", err)
		}
	}
	if !equality.Semantic.DeepEqual(origStatus, &newStatus) {  // NOTE
		if a.condition != "" {
			// Since status has changed, update the lastUpdatedTime
			a.condition.LastUpdated(&newStatus, time.Now().UTC().Format(time.RFC3339))  // NOTE
		}

		var newErr error
		obj.Status = newStatus
		newObj, newErr := a.client.UpdateStatus(obj)
		if err == nil {
			err = newErr
		}
		if newErr == nil {
			obj = newObj
		}
	}
	return obj, err
}

To Reproduce

Follow: https://github.com/w13915984028/harvester-develop-summary/blob/main/fleet-agent-high-cpu-usage.md

Result

Expected Result

The private field should not break equality.Semantic.DeepEqual

Screenshots

Additional context

Related issues: https://github.com/harvester/harvester/issues/1893 https://github.com/harvester/harvester/issues/2245 https://github.com/rancher/fleet/issues/760

w13915984028 avatar Jun 20 '22 08:06 w13915984028

cc @guangbochen @rebeccazzzz

w13915984028 avatar Jun 20 '22 08:06 w13915984028

Transferred to team 3 since this is related to fleet.

jakefhyde avatar Jun 24 '22 18:06 jakefhyde

Hey, since no one asked, here is my opinion: https://github.com/harvester/harvester/issues/2245#issuecomment-1166152229

Or in other words, fix fleet (or the rancher code?) by one of the following:

  • correct the serialization hints/hacks foiling the comparison
  • correct the naive comparison by marshalling then unmarshalling objects prior to comparison

dweomer avatar Jun 25 '22 00:06 dweomer

Please NOTE:

This BUG, happened in the auto-generated code by Rancher framework. The equality.Semantic.DeepEqual is called in auto-generated code, and it is broken by hack code.

func (a *bundleDeploymentStatusHandler) sync
(key string, obj *v1alpha1.BundleDeployment) (*v1alpha1.BundleDeployment, error) {
...
  equality.Semantic.DeepEqual(...)
...

When the framework CAN NOT fix it, then it needs to be solved in each caller (e.g., manually remove the private fields before return back to framework generated code).

We even don't know who has already/will accidently use Summary in Status.

Or, there should be a dedicated, say GeneralSummary, for normal usage.

w13915984028 avatar Jun 29 '22 19:06 w13915984028

This should be fixed in Fleet by https://github.com/rancher/fleet/pull/990 which will be part of Fleet v0.5.0.

thardeck avatar Sep 30 '22 09:09 thardeck

Fixed in Fleet v0.5.0

kkaempf avatar Dec 16 '22 15:12 kkaempf

Reopening for testing.

bmdepesa avatar Dec 16 '22 15:12 bmdepesa

Tested by the Harvester team as part of https://github.com/harvester/harvester/issues/3150

bmdepesa avatar Mar 14 '23 14:03 bmdepesa