kfctl implements reconcile semantics
/kind feature
Why you need this feature: See design doc for kfctl v1: kubeflow/kubeflow#3709
One of the proposals is for kfctl to have a reconcile-like implementation to deal with complex ordering of operations.
Filing this issue to track implementation.
Issue-Label Bot is automatically applying the label kind/feature to this issue, with a confidence of 0.93. Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback!
Links: app homepage, dashboard and code for this bot.
Bump to P0 because this must land in 0.7
/assign @gabrielwen
@gabrielwen Any update on this? Do you have an ETA?
Based on some of our discussions
- kubeflow/kubeflow#4106 uses errors to determine whether apply is complete or needs to be reinvoked
- I think we want to change that and start using conditions, so that we don't just swallow errors and keep retrying forever
- I think we need to do that as part of kubeflow/kubeflow#4141 moving pod default creation into the gcp plugin
- We don't want the pod default creation logic to just return an error if we haven't created the kubeflow namespace yet; instead we should set conditions such that coordinator.apply knows that it's not complete and needs to be reinvoked
- We will likely want to add conditions to track that the GCP deployment is done so that we can skip it the next time we invoke gcp.go; otherwise it will slow things down
- We might want to keep track of the time we last checked the GCP deployment so that, if more than some amount of time has passed on reapply, the check gets reinvoked
- We can do that in a follow on PR
@jlewi
- kubeflow/kubeflow#4166 provides functionality of using conditions with KfDef.
- after kubeflow/kubeflow#4166 I'm going to issue another PR to use conditions during Reconcile.
- after the PR mentioned above, kubeflow/kubeflow#4141 will be unblocked, as well as other PRs that need reconcile semantics.
- level-based execution has not been started, but I think it's not release blocking for 0.7, is it?
Discussed offline: this is an implementation detail of kfctl. Since it does not affect the semantics we have to roll out in v0.7, it is not release blocking. Making it P1 instead of P0.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
reconcile semantics will be needed for the kfctl operator; see kubeflow/kubeflow#4570.
/lifecycle frozen
Related to kubeflow/kfctl#193 operator roadmap
Bump to P1. We need this to better handle errors like kubeflow/manifests#806
Right now, if a single resource (e.g. certmanager) takes a long time to start, we wait until it is ready before we continue. I think that is undesirable. We'd prefer to continue with the other apply actions and then retry.
Reconcile semantics should accomplish this.
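A rough sketch of what that could look like, assuming hypothetical names (`Resource`, `applyAndCheck`, `Reconcile`) that do not come from kfctl. Startup latency is simulated with a counter. Each pass applies every pending resource and checks readiness without blocking; anything not ready is carried over to the next pass instead of holding up the others.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for something kfctl applies.
type Resource struct {
	Name string
	// PassesUntilReady simulates startup latency: the resource reports
	// not-ready for this many checks before it becomes ready.
	PassesUntilReady int
}

// applyAndCheck applies the resource and reports readiness immediately,
// without waiting for it to become ready.
func applyAndCheck(r *Resource) bool {
	if r.PassesUntilReady > 0 {
		r.PassesUntilReady--
		return false
	}
	return true
}

// Reconcile makes up to maxPasses passes over the pending resources.
// It never blocks on a single slow resource; it returns the number of
// passes used and the names of resources that are still not ready.
func Reconcile(resources []*Resource, maxPasses int) (int, []string) {
	pending := resources
	passes := 0
	for len(pending) > 0 && passes < maxPasses {
		passes++
		var next []*Resource
		for _, r := range pending {
			if !applyAndCheck(r) {
				next = append(next, r) // retry on the next pass
			}
		}
		pending = next
	}
	names := make([]string, 0, len(pending))
	for _, r := range pending {
		names = append(names, r.Name)
	}
	return passes, names
}

func main() {
	resources := []*Resource{
		{Name: "certmanager", PassesUntilReady: 2}, // slow to start
		{Name: "istio"},
		{Name: "pipelines"},
	}
	passes, pending := Reconcile(resources, 10)
	fmt.Printf("passes=%d still pending=%v\n", passes, pending)
}
```

A real reconcile loop would wait or back off between passes and would surface the still-pending resources through status rather than a return value; both are left out here to keep the sketch short.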