
Compatibility with profile resource restrictions on Kubeflow

Open szymek116 opened this issue 3 years ago • 3 comments

When using Kale with a Kubeflow profile that has CPU or memory restrictions (https://www.kubeflow.org/docs/components/multi-tenancy/getting-started/#manual-profile-creation), the first step of the pipeline run fails with:

This step is in Error state with this message: pods "test-ewe2j-t22kv-1073397324" is forbidden: failed quota: kf-resource-quota: must specify cpu,memory

It seems Kale submits the container without resource limits, which KF then blocks. Is there any workaround for this? We are using Kale 0.6 and KF 1.1.

szymek116 avatar Jun 09 '21 13:06 szymek116

I found a somewhat hacky workaround. When you save the pipeline.yaml, you can add something like the lines below to titanic-ml.kale.py to set resource limits on the pod:

    _kale_step_limits = {'nvidia.com/gpu': '1'}
    for _kale_k, _kale_v in _kale_step_limits.items():
        _kale_loaddata_task.container.add_resource_limit(_kale_k, _kale_v)

This example is for the GPU, and the same approach can be adapted to cpu and memory as well.

brness avatar Aug 13 '21 12:08 brness

Generally, we can add those limits directly in the pipeline files, as per this PR: https://github.com/kubeflow/pipelines/pull/5695

But I guess the idea of Kale is that the user doesn't have to mess with code in the yaml or py files. If somebody can point me to the place in the code where I can modify the yaml for the executed pod, I can try to make a patch for it.

szymek116 avatar Sep 13 '21 10:09 szymek116

But as you can see, it supports GPU resources, so why can't the same be applied to cpu and memory? That just does not make any sense. Maybe it was not meant for multi-user scenarios. So we can only fix it by modifying the source code.

brness avatar Sep 24 '21 03:09 brness