
Distributed coach stalls if number of workers is greater than number of available vCPUs.

Open geranim0 opened this issue 5 years ago • 2 comments

To keep k8s from scheduling all pods on the same node, I added a CPU request to each pod with this code in kubernetes_orchestrator.py:

resources=k8sclient.V1ResourceRequirements(requests={'cpu': '1'}),

This works when num_workers < vCPUs. Otherwise the run stalls, because some of the worker pods stay pending and are never created. Is this by design with the worker locks? What's the recommended approach?
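To illustrate why the pods stay pending, here is a minimal sketch of the CPU-request arithmetic. It is a simplified model, not the real Kubernetes scheduler: the function `schedule`, the node names, and the millicore figures are all hypothetical, and it assumes every worker pod requests one full CPU (1000m), as in the snippet above.

```python
from typing import Dict, Tuple


def schedule(
    node_free_milli: Dict[str, int],
    num_workers: int,
    request_milli: int = 1000,
) -> Tuple[Dict[str, int], int]:
    """Greedily place worker pods; return (pods placed per node, pods left pending).

    Simplified model (an assumption, not the actual scheduler): each pod goes to
    the node with the most unrequested CPU, and a pod that fits nowhere is pending.
    """
    free = dict(node_free_milli)          # copy so the caller's dict is untouched
    placed = {name: 0 for name in free}
    pending = 0
    for _ in range(num_workers):
        target = max(free, key=free.get)  # node with the most free CPU
        if free[target] >= request_milli:
            free[target] -= request_milli
            placed[target] += 1
        else:
            pending += 1                  # no node can satisfy the request
    return placed, pending


# Hypothetical cluster: two nodes with 2 vCPUs (2000m) each.
nodes = {"node-a": 2000, "node-b": 2000}
placed_ok, pending_ok = schedule(nodes, num_workers=3)
placed_stall, pending_stall = schedule(nodes, num_workers=5)
```

In this model, 3 workers all fit, while 5 workers leave one pod pending because only 4000m can be requested in total. On a real cluster the allocatable CPU per node is usually somewhat below the vCPU count, since system components reserve and request CPU as well, which may explain why num_workers has to be strictly less than the vCPU count.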

geranim0 · Apr 30 '19 17:04