Distributed Coach stalls if the number of workers is greater than the number of available vCPUs.
To keep k8s from placing all pods on the same node, I added a CPU resource request to the pods with this code in kubernetes_orchestrator.py:
resources=k8sclient.V1ResourceRequirements( requests={'cpu':'1'} ),
It works if I set num_workers < vCPUs, but stalls otherwise because some pods stay pending and are never created. Is this by design with the worker locks? What's the recommended approach?
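For reference, here is a rough sketch of the arithmetic I think is behind the pending pods. It is not Coach or Kubernetes code: the helper names (parse_cpu_millicores, schedulable_pods, pending_pods) are made up for illustration, and it assumes the only constraint is the CPU request against each node's allocatable CPU, ignoring memory and any other pods already running on the nodes.

```python
def parse_cpu_millicores(quantity):
    """Convert a Kubernetes CPU quantity such as '1', '0.5' or '500m' to millicores."""
    quantity = quantity.strip()
    if quantity.endswith('m'):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def schedulable_pods(node_allocatable_cpus, cpu_request='1'):
    """Count how many pods with the given CPU request fit across the nodes.

    node_allocatable_cpus is a list of per-node allocatable CPU quantities
    (hypothetical input; in a real cluster it would come from the node status).
    """
    request = parse_cpu_millicores(cpu_request)
    # A pod cannot be split across nodes, so floor the count per node.
    return sum(parse_cpu_millicores(cpu) // request for cpu in node_allocatable_cpus)


def pending_pods(num_workers, node_allocatable_cpus, cpu_request='1'):
    """Number of worker pods that would be left Pending under this simple model."""
    fit = schedulable_pods(node_allocatable_cpus, cpu_request)
    return max(0, num_workers - fit)


if __name__ == '__main__':
    nodes = ['4', '4']  # two nodes with 4 allocatable CPUs each (assumed values)
    print(pending_pods(6, nodes))   # fewer workers than CPUs
    print(pending_pods(10, nodes))  # more workers than CPUs
```

Under this model, any worker beyond the total that fits is never scheduled. Allocatable CPU on a node is usually a bit lower than its vCPU count because of system reservations, which might be why only num_workers strictly below the vCPU count works for me.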