Create dev mode instructions for operator deployment

Open Tomcli opened this issue 4 years ago • 15 comments

Currently, the operator is always watching the Kubeflow resources so that it can reconcile them whenever something is missing. This is good for a production environment, but not very friendly when we need to remove and test resources in our development and testing setups. It would be nice to have a dev_mode flag that disables the operator's watcher for development.

/cc @moficodes

Tomcli, Dec 08 '20 18:12

How would reconciliation be triggered then? (I.e., what would the operator do if it didn't watch? :) )

vpavlin, Dec 08 '20 18:12

I think the goal is to make the operator behave more like kfctl. In dev mode, the operator would just wrap kfctl: run the command once, and that's it.

It's useful for quickly iterating on and testing the operator deployment.

moficodes, Dec 08 '20 18:12

I can take a look at it.

moficodes, Dec 08 '20 18:12

/assign

moficodes, Dec 08 '20 18:12

Why not just use kfctl then?

Or even better, use the operator-sdk tooling for development - https://github.com/operator-framework/getting-started#2-run-locally-outside-the-cluster

vpavlin, Dec 08 '20 18:12

This is coming from one of our users who doesn't have much DevOps experience. We probably don't have to disable all the watchers; we only want to disable the watcher that monitors the k8s resources: https://github.com/kubeflow/kfctl/blob/master/pkg/controller/kfdef/kfdef_controller.go#L119

Tomcli, Dec 08 '20 19:12

Also, this is an optional feature (users have to explicitly opt out of the watcher), so it shouldn't change the default behavior of the current operator deployment.

Tomcli, Dec 08 '20 19:12

> Why not just use kfctl then?
>
> Or even better, use the operator-sdk tooling for development - https://github.com/operator-framework/getting-started#2-run-locally-outside-the-cluster

For most of our users, kfctl is sufficient in this case. However, we have some users who are on Windows or have very little experience with the terminal, so being able to use the operator for development would be nice for them.

Tomcli, Dec 08 '20 19:12

Can you help me to understand the use case again - maybe with more details? It sounds like there is a very specific case which would get treatment in the operator where it should rather be treated by educating the user(s).

vpavlin, Dec 08 '20 20:12

Since the operator's default behavior is now to reapply the kfdef whenever there is a delete event from any kfctl resource, users who made changes to the Kubeflow deployment with kubectl edit instead of updating kubeflow/manifests will lose their configuration. I do agree that educating the users is the right approach, but I'm seeing some users who are afraid to use the operator when they see the steep learning curve for deployment.

I suggest using this flag only for users who are deploying Kubeflow themselves in a dev setup. That way, those who are interested in the Kubeflow project will be more committed to learning kustomize and kfdef so they can deploy Kubeflow with the operator the right way.

Tomcli, Dec 08 '20 22:12

@Tomcli I don't think it is a good idea to override the normal operator workflow to satisfy a small set of users. Another option, as @tumido pointed out, is to install the operator, install Kubeflow, and then scale the operator pod down to 0 replicas. This removes the operator pod, so nothing is watching the resources or running the reconcile function. I am absolutely not a fan of adding code that breaks the fundamental function of an operator.

nakfour, Dec 10 '20 19:12

Thanks @nakfour, scaling the operator pod down to 0 can be a good option. Then we probably want to add some instructions for:

  1. How to stop watching the Kubeflow deployment (using kubectl and the k8s/OCP UI, to cover different audiences)
  2. When to resume watching (e.g., before deleting Kubeflow or updating the kfdef)

Hopefully this way we will be able to help our users without changing the operator's behavior.

Tomcli, Dec 10 '20 20:12
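For the kubectl audience in item 1, the pause/resume steps can be sketched with `kubectl scale`. This is a minimal sketch, not official instructions: the function names and the `OPERATOR_DEPLOYMENT`/`OPERATOR_NAMESPACE` variables are hypothetical, and the defaults are simply the OpenShift deployment name and namespace used elsewhere in this thread. A kfctl operator on plain Kubernetes will likely live under a different name and namespace, so override them.

```shell
# Hypothetical defaults: override to match where your operator actually runs.
OPERATOR_DEPLOYMENT="${OPERATOR_DEPLOYMENT:-opendatahub-operator}"
OPERATOR_NAMESPACE="${OPERATOR_NAMESPACE:-openshift-operators}"

# Stop watching: with 0 replicas no operator pod runs, so nothing reconciles.
pause_operator() {
  kubectl scale deployment "$OPERATOR_DEPLOYMENT" \
    -n "$OPERATOR_NAMESPACE" --replicas=0
}

# Resume watching (e.g. before deleting Kubeflow or updating the kfdef),
# then block until the new operator pod has rolled out.
resume_operator() {
  kubectl scale deployment "$OPERATOR_DEPLOYMENT" \
    -n "$OPERATOR_NAMESPACE" --replicas=1 &&
  kubectl rollout status deployment/"$OPERATOR_DEPLOYMENT" \
    -n "$OPERATOR_NAMESPACE"
}
```

Typical flow: run `pause_operator`, make the manual `kubectl edit` changes, and call `resume_operator` when you are ready for the operator to reconcile again. Waiting on `kubectl rollout status` just confirms the operator is back before you move on.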

I was looking for this issue and couldn't find it.😁

Precisely as @nakfour says. In my experience with a dev setup, when adjusting ODH components, I've found that only manual kfctl or scaling down the operator after the initial deploy gives me the control I need.

If you need to test the operator's interaction with your kfdef, the best way is to let it operate. And if you need to manually modify the manifests after the initial deploy, you should pause the operator; scaling it down is by far the easiest option.

This way you also have control over the updated manifests from the repositories specified in the kfdef: since the operator holds the repository cache in its pods, when you scale it up again you have the freshest manifests available.

I think if you need to make manual adjustments, you need to turn the autopilot off first.

tumido, Dec 10 '20 20:12
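To enforce "turn the autopilot off first" in a dev script, a guard could refuse to continue unless the operator is scaled to 0. This is a minimal sketch; `require_operator_paused` and the environment variables are hypothetical, and the defaults mirror the OpenShift commands in this thread. It only inspects `.spec.replicas`, so the old operator pod may still be terminating for a few seconds after the check passes.

```shell
# Hypothetical defaults: override to match where your operator actually runs.
OPERATOR_DEPLOYMENT="${OPERATOR_DEPLOYMENT:-opendatahub-operator}"
OPERATOR_NAMESPACE="${OPERATOR_NAMESPACE:-openshift-operators}"

# Succeeds only if the operator deployment is scaled to 0 replicas;
# otherwise prints a hint to stderr and returns 1.
require_operator_paused() {
  replicas=$(kubectl get deployment "$OPERATOR_DEPLOYMENT" \
    -n "$OPERATOR_NAMESPACE" -o jsonpath='{.spec.replicas}') || return 1
  if [ "$replicas" != "0" ]; then
    echo "operator has $replicas replica(s); scale it to 0 before manual edits" >&2
    return 1
  fi
}
```

A script would call `require_operator_paused || exit 1` before running any `kubectl edit` or `kubectl apply` against the Kubeflow resources.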

btw, @Tomcli this way the whole "dev mode" toggle experience can be as simple as this:

Disable operator

```shell
oc patch deployment opendatahub-operator -n openshift-operators -p '{"spec":{"replicas":0}}'
```

Enable operator

```shell
oc patch deployment opendatahub-operator -n openshift-operators -p '{"spec":{"replicas":1}}'
```

You can also alias these in your bash config to something shorter, which makes them even more convenient to use. 🙂

tumido, Dec 11 '20 09:12
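The shortcut idea can be sketched as two shell functions that wrap the exact `oc patch` commands from the comment above. Functions are used instead of aliases because bash does not expand aliases in non-interactive scripts by default. The names `odh_dev_on` and `odh_dev_off` are hypothetical; here "dev mode on" means the operator is scaled to 0.

```shell
# Hypothetical helper names; add them to ~/.bashrc to keep them across sessions.

# Dev mode on: scale the operator to 0 so manual edits are not reconciled away.
odh_dev_on() {
  oc patch deployment opendatahub-operator -n openshift-operators \
    -p '{"spec":{"replicas":0}}'
}

# Dev mode off: restore 1 replica so the operator resumes watching and reconciling.
odh_dev_off() {
  oc patch deployment opendatahub-operator -n openshift-operators \
    -p '{"spec":{"replicas":1}}'
}
```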

Thanks @tumido, I can add these instructions to the kubeflow/website and close this issue.

Tomcli, Dec 11 '20 17:12