cli-utils
Bug: Inventory updates should tolerate drift (and overwrite it)
Right now, inventory updates may return a conflict error from Kubernetes. The inventory client should detect this (apierrors.IsConflict(err)) and retry with a new Get (to update the ResourceVersion) + Update.
Example retry code:
import (
    "context"
    "fmt"
    "time"
)

type retriable func(ctx context.Context) (retry bool, err error)

func retryWithBackoff(ctx context.Context, timeout time.Duration, fn retriable) error {
    var err error
    var retry bool
    ctx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()
    delay := 1 * time.Second
    for {
        // attempt the update
        retry, err = fn(ctx)
        if !retry {
            return err
        }
        // wait until delay or timeout
        timer := time.NewTimer(delay)
        select {
        case <-ctx.Done():
            timer.Stop()
            return fmt.Errorf("timed out after retrying for %v: %w", timeout, err)
        case <-timer.C:
            // continue
        }
        // double the delay for exponential backoff
        delay = delay * 2
    }
}
Example usage:
// attempt to update status until timeout
ctx := context.TODO()
timeout := 1 * time.Minute
return retryWithBackoff(ctx, timeout, func(ctx context.Context) (retry bool, err error) {
    // Get the object to get the latest ResourceVersion.
    latestObj, err := resource.Get(ctx, obj.GetName(), metav1.GetOptions{TypeMeta: meta})
    if err != nil {
        return false, fmt.Errorf("failed to get inventory status from cluster: %w", err)
    }
    // Ignore any status changes made remotely.
    // This update will replace them.
    obj.SetResourceVersion(latestObj.GetResourceVersion())
    _, err = resource.UpdateStatus(ctx, obj, metav1.UpdateOptions{TypeMeta: meta})
    if err != nil {
        // retry if conflict
        return apierrors.IsConflict(err), fmt.Errorf("failed to write updated inventory status to cluster: %w", err)
    }
    return false, nil
})
Another option is to use https://github.com/flowchartsman/retry which is nice and generic. gcloud and client-go also have retry libs.
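For illustration, here is a minimal sketch of the client-go option using k8s.io/client-go/util/retry, whose retry.RetryOnConflict already implements the detect-conflict-and-retry loop. The helper name and the dynamic-client types for resource and obj are assumptions here, chosen to match the usage example above:

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/util/retry"
)

// updateInventoryStatus is a hypothetical helper; resource, obj, and meta play
// the same roles as in the usage example above.
func updateInventoryStatus(ctx context.Context, resource dynamic.ResourceInterface, obj *unstructured.Unstructured, meta metav1.TypeMeta) error {
    // RetryOnConflict re-runs the callback (with backoff) whenever it returns a conflict error.
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        // Re-read the object to pick up the latest ResourceVersion.
        latestObj, err := resource.Get(ctx, obj.GetName(), metav1.GetOptions{TypeMeta: meta})
        if err != nil {
            return err
        }
        // Overwrite any status changes made remotely, as in the example above.
        obj.SetResourceVersion(latestObj.GetResourceVersion())
        _, err = resource.UpdateStatus(ctx, obj, metav1.UpdateOptions{TypeMeta: meta})
        return err
    })
}

Note that retry.DefaultRetry gives up after a handful of attempts rather than retrying until a deadline, so a hand-rolled loop like retryWithBackoff above may still be preferable when a longer retry window is needed.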
The main client causing drift right now is the Config Sync resource-group-controller, which updates the ResourceGroup (inventory) status.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle frozen