Bubble up more information about Pod status in Model/Notebook/Server APIs
A common case will be that a Pod requires a GPU but the scheduler/autoscaler is unable to place it on a Node that has one. We shouldn't require users to dig through Pod statuses and events to find this out. I think it should be shown by kubectl get models (and get notebooks, get modelservers) under the CONDITION column. For reference:
k get models
NAME                READY   CONDITION
facebook-opt-125m   True    BuiltAndPushed
my-model            True    BuiltAndPushed
Whether something is pending could probably be read from the Job API. For further drill-down (i.e. events and Pod statuses), a kubectl plugin might be better suited, since keeping that detail out of the resource status avoids overactive reconcile loops and extra API updates.
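As a rough sketch of the kind of mapping proposed here, the snippet below reduces a Pod's status to a single reason and message that could be surfaced under CONDITION. It relies on the standard Kubernetes behavior of setting the PodScheduled condition to False with reason Unschedulable when no Node fits. The reason strings PodUnschedulable and PodPending, the function name summarize_pod, and the sample message are all hypothetical, made up for illustration; a real controller would do this in its reconcile logic rather than in Python.

```python
from typing import Optional, Tuple

# Hypothetical reason strings for the CONDITION column.
# These are assumptions for illustration, not part of any existing API.
REASON_UNSCHEDULABLE = "PodUnschedulable"
REASON_PENDING = "PodPending"


def summarize_pod(pod_status: dict) -> Optional[Tuple[str, str]]:
    """Return (reason, message) if the Pod is stuck in Pending, else None.

    pod_status is the `.status` object of a Pod, as a plain dict.
    """
    if pod_status.get("phase") != "Pending":
        return None
    for cond in pod_status.get("conditions", []):
        # The scheduler reports placement failures on the PodScheduled condition.
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            return REASON_UNSCHEDULABLE, cond.get("message", "")
    # Pending for some other reason (e.g. image pull, volume attach).
    return REASON_PENDING, ""


# Illustrative status for a Pod waiting on a GPU Node (sample data, not real output).
example_status = {
    "phase": "Pending",
    "conditions": [
        {
            "type": "PodScheduled",
            "status": "False",
            "reason": "Unschedulable",
            "message": "0/3 nodes are available: 3 Insufficient nvidia.com/gpu.",
        }
    ],
}
```

Calling summarize_pod(example_status) yields the PodUnschedulable reason along with the scheduler's message, which is about the level of detail that seems useful in the CONDITION column; anything deeper would stay in the drill-down tooling.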