A trivial question about Kale (or Argo) resource allocation
Hello, thank you for your great work!
I ran into an issue when I deployed my own pipeline with Kale. It is a small GAN network, trained on 10k images of 256x256. When I executed the Jupyter notebook directly it worked fine; the notebook server on Kubeflow has 4 CPUs and 8Gi of memory allocated.
However, when I started the same pipeline with Kale, the pod running the train function was killed with OOM.
I found that only 128Mi of memory is requested for the pod in which the train function runs:

```
Limits:
  cpu:     1
  memory:  2Gi
Requests:
  cpu:     100m
  memory:  128Mi
```
Can I change the amount of memory allocated to the pod before the pipeline runs? Is there any way to control the resources allocated to each pipeline pod with Kale?
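For comparison, with the plain KFP v1 SDK I believe resources can be set per step on the task object (a minimal sketch; `train_op` and the base image are illustrative, and the `set_*_request`/`set_*_limit` calls are the ones I'm aware of in the v1 SDK), but I can't find an equivalent knob in Kale:

```python
from kfp import dsl
from kfp.components import func_to_container_op

def train():
    # placeholder for the actual GAN training step
    pass

# hypothetical component wrapping the train function
train_op = func_to_container_op(train, base_image="python:3.8")

@dsl.pipeline(name="gan-train", description="toy example")
def pipeline():
    task = train_op()
    # explicitly request/limit resources for this step's pod,
    # overriding any namespace defaults
    task.set_cpu_request("4")
    task.set_memory_request("8Gi")
    task.set_memory_limit("8Gi")
```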
Update: the reason the resources are allocated as shown above is that a Kubernetes LimitRange is set up to do so.
Since the LimitRange applies the same default limit and request values to every pod, all Kale steps run under a fixed resource quota. So it is still not possible to allocate a different amount of CPU and memory to each pipeline component.
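For reference, a LimitRange with container defaults like the following (the name and namespace are illustrative) would produce exactly the values I observed on the pipeline pods:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limit-range   # illustrative name
  namespace: kubeflow-user    # illustrative namespace
spec:
  limits:
  - type: Container
    default:            # applied as the limit when the pod sets none
      cpu: "1"
      memory: 2Gi
    defaultRequest:     # applied as the request when the pod sets none
      cpu: 100m
      memory: 128Mi
```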