kfctl implements reconcile semantics

Open jlewi opened this issue 6 years ago • 11 comments
/kind feature

Why you need this feature: See design doc for kfctl v1: kubeflow/kubeflow#3709

One of the proposals is for kfctl to have a reconcile-like implementation to deal with complex ordering of operations.

Filing this issue to track implementation.

jlewi avatar Sep 04 '19 18:09 jlewi

Issue-Label Bot is automatically applying the label kind/feature to this issue, with a confidence of 0.93. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

issue-label-bot[bot] avatar Sep 04 '19 18:09 issue-label-bot[bot]

Bump to P0 because this must land in 0.7

jlewi avatar Sep 09 '19 17:09 jlewi

/assign @gabrielwen

gabrielwen avatar Sep 10 '19 23:09 gabrielwen

@gabrielwen Any update on this? Do you have an ETA?

Based on some of our discussions:

  • kubeflow/kubeflow#4106 uses errors to determine whether apply is complete or needs to be reinvoked.
  • I think we want to change that and start using conditions, so that we don't just swallow errors and keep retrying forever.
  • I think we need to do that as part of kubeflow/kubeflow#4141, which moves pod default creation into the gcp plugin.
    • We don't want the pod default creation logic to just return an error if we haven't created the kubeflow namespace yet; instead it should set conditions so that coordinator.apply knows that it's not complete and needs to be reinvoked.
  • We will likely want to add conditions to track that the GCP deployment is done so that we can skip it the next time we invoke gcp.go; otherwise it will slow things down.
    • We might want to keep track of the time we last checked the GCP deployment so that, if it exceeds some amount of time, a reapply would end up reinvoking it.
    • We can do that in a follow-on PR.
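
To make the proposal above concrete, here is a minimal sketch of condition-driven apply. It is not kfctl's actual code: the `Status`, `Condition`, and `Cluster` types, the condition names, the reason strings, and `applyOnce` are all hypothetical stand-ins. The idea it illustrates is that a step whose prerequisite is missing records a "not done" condition instead of returning an error, and that a completed step is skipped until its last check becomes stale.

```go
package main

import (
	"fmt"
	"time"
)

// ConditionType names a step whose completion is tracked (hypothetical names).
type ConditionType string

const (
	GCPDeployed       ConditionType = "GCPDeployed"
	PodDefaultCreated ConditionType = "PodDefaultCreated"

	// defaultMaxAge is an assumed staleness threshold for re-checking a step.
	defaultMaxAge = time.Hour
)

// Condition records whether a step is done and when it was last checked.
type Condition struct {
	Done        bool
	Reason      string
	LastChecked time.Time
}

// Status stands in for the status block of a KfDef-like object.
type Status struct {
	Conditions map[ConditionType]Condition
}

// needsCheck reports whether a step must run: unknown, not done, or stale.
func (s *Status) needsCheck(t ConditionType, now time.Time, maxAge time.Duration) bool {
	c, ok := s.Conditions[t]
	return !ok || !c.Done || now.Sub(c.LastChecked) > maxAge
}

func (s *Status) set(t ConditionType, done bool, reason string, now time.Time) {
	if s.Conditions == nil {
		s.Conditions = map[ConditionType]Condition{}
	}
	s.Conditions[t] = Condition{Done: done, Reason: reason, LastChecked: now}
}

// Cluster is a fake that only knows whether the kubeflow namespace exists.
type Cluster struct {
	NamespaceExists bool
}

// applyOnce runs one pass and reports whether every step is done.
// Errors are reserved for real failures; "not ready yet" becomes a condition.
func applyOnce(s *Status, c *Cluster, now time.Time, maxAge time.Duration) (bool, error) {
	if c == nil {
		return false, fmt.Errorf("no cluster configured")
	}
	if s.needsCheck(GCPDeployed, now, maxAge) {
		// A real plugin would talk to GCP here; the sketch assumes success.
		s.set(GCPDeployed, true, "DeploymentReady", now)
	}
	if s.needsCheck(PodDefaultCreated, now, maxAge) {
		if c.NamespaceExists {
			s.set(PodDefaultCreated, true, "Created", now)
		} else {
			s.set(PodDefaultCreated, false, "NamespaceNotFound", now)
		}
	}
	for _, cond := range s.Conditions {
		if !cond.Done {
			return false, nil // incomplete: the caller should reinvoke apply
		}
	}
	return true, nil
}

func main() {
	s, c, now := &Status{}, &Cluster{}, time.Now()
	done, err := applyOnce(s, c, now, defaultMaxAge)
	fmt.Println("first pass complete:", done, "err:", err)
	c.NamespaceExists = true // the namespace shows up before the next pass
	done, err = applyOnce(s, c, now, defaultMaxAge)
	fmt.Println("second pass complete:", done, "err:", err)
}
```

With this shape the caller can tell "needs another pass" (`false, nil`) apart from a genuine failure (a non-nil error), which is the distinction that error-only signalling loses.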

jlewi avatar Sep 20 '19 15:09 jlewi

@jlewi

  • kubeflow/kubeflow#4166 provides the functionality for using conditions with KfDef.
  • After kubeflow/kubeflow#4166 I'm going to issue another PR to use conditions during Reconcile.
  • After the PR mentioned above, kubeflow/kubeflow#4141 will be unblocked, as well as other PRs that need reconcile semantics.
  • Level-based execution has not been started, but I think it's not 0.7 release blocking, is it?

gabrielwen avatar Sep 25 '19 23:09 gabrielwen

Discussed offline: this is an implementation detail of kfctl. Since it does not affect the semantics we have to roll out in v0.7, it is not release blocking. Making it P1 instead of P0.

gabrielwen avatar Oct 07 '19 22:10 gabrielwen

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 05 '20 23:01 stale[bot]

Reconcile semantics will be needed for the kfctl operator; see kubeflow/kubeflow#4570.

jlewi avatar Jan 06 '20 14:01 jlewi

/lifecycle frozen

Related to kubeflow/kfctl#193 (operator roadmap).

jlewi avatar Feb 03 '20 17:02 jlewi

Issue-Label Bot is automatically applying the labels:

Label         Probability
kind/feature  0.93

Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.

issue-label-bot[bot] avatar Feb 03 '20 17:02 issue-label-bot[bot]

Bump to P1. We need this to better handle errors like kubeflow/manifests#806

Right now, if a single resource (e.g. certmanager) takes a long time to start, we wait until it is ready before we continue. I think that is undesirable. We'd prefer to keep applying the other resources and then retry the one that was not ready.

Reconcile semantics should accomplish this.
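
As a rough illustration of that behaviour (not kfctl's implementation), the sketch below makes repeated non-blocking passes over the pending resources, so one slow resource does not hold up the rest. The `Resource` type, its `ReadyAfter` field, `tryApply`, `reconcile`, and the `app-a`/`app-b` names are all hypothetical; readiness is simulated by counting passes.

```go
package main

import "fmt"

// Resource is a hypothetical unit of deployment work.
type Resource struct {
	Name string
	// ReadyAfter simulates slow startup: the resource reports ready
	// on its ReadyAfter-th apply attempt.
	ReadyAfter int
	attempts   int
}

// tryApply makes one non-blocking attempt and reports whether the
// resource is ready. It never waits for readiness.
func (r *Resource) tryApply() bool {
	r.attempts++
	return r.attempts >= r.ReadyAfter
}

// reconcile makes up to maxPasses passes over the pending resources.
// It returns the names in the order they became ready, plus whatever
// is still pending when the pass budget runs out.
func reconcile(resources []*Resource, maxPasses int) (ready []string, pending []*Resource) {
	pending = resources
	for pass := 0; pass < maxPasses && len(pending) > 0; pass++ {
		var next []*Resource
		for _, r := range pending {
			if r.tryApply() {
				ready = append(ready, r.Name)
			} else {
				next = append(next, r) // retry on the next pass, keep going now
			}
		}
		pending = next
		// A real loop would back off here before the next pass.
	}
	return ready, pending
}

func main() {
	resources := []*Resource{
		{Name: "cert-manager", ReadyAfter: 3}, // slow to start
		{Name: "app-a", ReadyAfter: 1},
		{Name: "app-b", ReadyAfter: 1},
	}
	ready, pending := reconcile(resources, 5)
	fmt.Println("ready order:", ready)
	fmt.Println("still pending:", len(pending))
}
```

In this simulation the two fast resources finish on the first pass and the slow one is picked up on a later pass, rather than everything queuing behind it.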

jlewi avatar Feb 04 '20 13:02 jlewi