
Documentation notes for dask-operator, and service account permission problems

Open zulissi opened this issue 2 years ago • 1 comments

Thanks for the great work on the dask-operator!

Three quick notes from trying it on our cluster:

  • Worker scaling assumes that each worker pod runs only one dask process. With multiple processes, the worker names (in dask) look like ...-ef8274c-0, ...-ef8274c-1 to mark which process they belong to, while the pod name is ...-ef8274c, so daskworkergroup scale-up works but scale-down fails. A note in the documentation that one process per pod is required would be helpful.
  • If you use NodePort (as in the examples), the default service account created by the latest helm install doesn't have permission to list nodes, which can lead to errors. ClusterIP works great.
  • The kubeflow patch script (kubectl patch clusterrole kubeflow-kubernetes-edit --patch '{"rules": [{"apiGroups": ["kubernetes.dask.org"],"resources": ["*"],"verbs": ["*"]}]}') overwrites the existing kubeflow permissions instead of adding the dask permissions to them. I'm not sure what the right way to patch-by-adding is.
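
On the last point, one common way to append instead of replace is a JSON Patch (kubectl patch --type=json) whose "add" operation targets the path /rules/-, which appends an element to the existing rules array. The Python sketch below has not been tested against a Kubeflow cluster. It builds such a patch from the rule in the command above, simulates the append locally on a made-up stand-in role (existing_role and apply_append are hypothetical, for illustration only), and prints the corresponding kubectl command.

```python
import json
import shlex

# Rule taken from the patch command quoted in the issue.
dask_rule = {
    "apiGroups": ["kubernetes.dask.org"],
    "resources": ["*"],
    "verbs": ["*"],
}

# JSON Patch (RFC 6902): "add" with a path ending in "/-" appends to an array,
# so existing entries under "rules" are left in place.
json_patch = [{"op": "add", "path": "/rules/-", "value": dask_rule}]


def apply_append(role, patch):
    """Hypothetical local simulation of the append; no cluster is contacted."""
    patched = json.loads(json.dumps(role))  # deep copy via JSON round trip
    for op in patch:
        if op["op"] != "add" or op["path"] != "/rules/-":
            raise ValueError("this sketch only handles 'add' to /rules/-")
        patched["rules"].append(op["value"])
    return patched


# Assumption: a made-up stand-in for the existing kubeflow ClusterRole rules.
existing_role = {
    "rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]}]
}
patched_role = apply_append(existing_role, json_patch)

# The equivalent kubectl invocation, with the patch shell-quoted.
command = " ".join([
    "kubectl", "patch", "clusterrole", "kubeflow-kubernetes-edit",
    "--type=json", "--patch", shlex.quote(json.dumps(json_patch)),
])
print(command)
```

Running kubectl get clusterrole kubeflow-kubernetes-edit -o yaml before and after the patch should show whether the original rules are still present.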

zulissi, Jun 03 '22 11:06

Thanks for raising this! All useful feedback, I'll look into it.

jacobtomlinson, Jun 06 '22 16:06