
a trivial question about kale (or argo) resource allocation

Open · coldtomatojuice opened this issue 2 years ago · 1 comment

Hello, thank you for your great work

I ran into an issue when I deployed my own pipeline with Kale. It's a small GAN, trained on 10k images of 256x256. When I just executed the Jupyter notebook on Kubeflow, it worked fine. The notebook has 4 CPUs and 8Gi of memory allocated.

But when I started the pipeline with Kale, the pod that runs the train function was killed with OOM.

I found that only 128Mi of memory is allocated to the pod that runs the train function:

Limits: cpu: 1, memory: 2Gi
Requests: cpu: 100m, memory: 128Mi
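To put those figures side by side, here is a minimal Python sketch that converts the reported quantities to bytes and compares them with the notebook's 8Gi. `memory_to_bytes` is a hypothetical helper written for this illustration and only handles the binary suffixes (Ki, Mi, Gi, Ti). Note that Kubernetes enforces the memory limit on a container, while the request is mainly used for scheduling, so the 2Gi limit is probably the ceiling the train step hit. That is an inference from the numbers, not something confirmed in this thread.

```python
# Binary (power-of-two) suffixes used by Kubernetes memory quantities.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '128Mi' or '2Gi' to bytes.

    Hypothetical helper: only integer values with binary suffixes
    (or a plain byte count) are supported.
    """
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a byte count


# Values taken from the issue text.
notebook_memory = memory_to_bytes("8Gi")
step_request = memory_to_bytes("128Mi")
step_limit = memory_to_bytes("2Gi")

print(f"step limit   / notebook memory = {step_limit / notebook_memory:.4f}")
print(f"step request / notebook memory = {step_request / notebook_memory:.4f}")
```

So a training run that fit comfortably in the notebook can still exceed what the pipeline pod is allowed to use.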

Can I change the amount of memory allocated to the pod before the pipeline runs? Is there any way to set the resources allocated to each pipeline pod with Kale?

coldtomatojuice · Sep 07 '21 06:09

So I found the reason the resources are allocated as I described above: a Kubernetes LimitRange is set up that way.

Since I set default limit and request values for pods, all the Kale operations run with the same fixed resource allocation. Therefore it is still not possible to allocate a different amount of CPU and memory to each pipeline component.
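For readers unfamiliar with LimitRange defaults, the following Python sketch imitates, in a simplified way, how they are filled in. The `LIMIT_RANGE_ITEM` dict mirrors the values reported in this issue and is an assumed shape of the namespace's LimitRange, not a copy of the real manifest. `apply_defaults` is a hypothetical function, not a Kubernetes or Kale API. The point it illustrates is that defaults only apply to values a container does not set itself. Whether Kale lets you set those values per step is not answered in this thread.

```python
import copy

# Assumed LimitRange item, reconstructed from the values in this issue.
LIMIT_RANGE_ITEM = {
    "type": "Container",
    "default": {"cpu": "1", "memory": "2Gi"},               # default limits
    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},   # default requests
}


def apply_defaults(resources: dict, item: dict) -> dict:
    """Fill in only the limits/requests the container left unset.

    Simplified: real admission also enforces min/max and has extra
    rules (for example when only a limit is given).
    """
    result = copy.deepcopy(resources)
    limits = result.setdefault("limits", {})
    requests = result.setdefault("requests", {})
    for name, value in item.get("default", {}).items():
        limits.setdefault(name, value)
    for name, value in item.get("defaultRequest", {}).items():
        requests.setdefault(name, value)
    return result


# A step that declares nothing receives the namespace defaults.
default_step = apply_defaults({}, LIMIT_RANGE_ITEM)

# A step that declares its own memory keeps it; only CPU is defaulted.
explicit_step = apply_defaults(
    {"requests": {"memory": "4Gi"}, "limits": {"memory": "8Gi"}},
    LIMIT_RANGE_ITEM,
)

print(default_step)
print(explicit_step)
```

In other words, the fixed allocation described above is what you get when the generated pipeline pods carry no resource settings of their own.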

coldtomatojuice · Sep 14 '21 00:09