[Serve][UX] Fine-grained reason for a replica failure

Open cblmemo opened this issue 2 years ago • 1 comments

Currently, our recommendation to users is to check the logs if a replica fails, but failures might be easier to understand and debug if we clearly state the reason in the sky status output.

Potential solutions:

  • More replica status, e.g. RUN_FAILED, SETUP_FAILED;
  • Add a column to indicate failure reason in a plain text string.
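To make the two options concrete, here is a minimal sketch of how they could fit together. It is not SkyPilot's actual implementation: only `RUN_FAILED` and `SETUP_FAILED` come from the proposal above, while `ReplicaStatus`, `ReplicaRow`, `format_status_table`, and the `READY` status are hypothetical names used for illustration.

```python
import enum
from dataclasses import dataclass
from typing import List, Optional


class ReplicaStatus(enum.Enum):
    """Hypothetical replica statuses (not SkyPilot's real enum)."""
    READY = 'READY'                # assumed healthy state, for contrast
    SETUP_FAILED = 'SETUP_FAILED'  # the setup step failed
    RUN_FAILED = 'RUN_FAILED'      # the run step failed

    def is_failed(self) -> bool:
        return self in (ReplicaStatus.SETUP_FAILED, ReplicaStatus.RUN_FAILED)


@dataclass
class ReplicaRow:
    """One row of a status table, with an optional plain-text reason."""
    replica_id: int
    status: ReplicaStatus
    failure_reason: Optional[str] = None


def format_status_table(rows: List[ReplicaRow]) -> str:
    """Render rows as fixed-width text with a trailing REASON column."""
    lines = [f"{'ID':<4}{'STATUS':<14}REASON"]
    for row in rows:
        # Only failed replicas show a reason; everything else shows '-'.
        if row.status.is_failed() and row.failure_reason:
            reason = row.failure_reason
        else:
            reason = '-'
        lines.append(f"{row.replica_id:<4}{row.status.value:<14}{reason}")
    return '\n'.join(lines)


rows = [
    ReplicaRow(1, ReplicaStatus.READY),
    ReplicaRow(2, ReplicaStatus.SETUP_FAILED, 'setup exited with code 1'),
]
print(format_status_table(rows))
```

The finer-grained status tells the user which phase failed at a glance, and the free-text column carries detail that would otherwise require opening the logs.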

cblmemo, Feb 18 '24 07:02

You can run sky serve logs <service_name> <replica_id> to see the logs of the failing replica, but I agree that SkyPilot should have a status for a failed build, or stop creating new replicas once a maximum number of attempts has been made.
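The "maximum number of attempts" idea could look roughly like the sketch below. `LaunchBudget` and its fields are hypothetical names chosen for illustration, not part of SkyPilot, and how a real controller would count or reset attempts is left as an assumption.

```python
from dataclasses import dataclass


@dataclass
class LaunchBudget:
    """Hypothetical cap on consecutive failed replica launches."""
    max_attempts: int
    failed_attempts: int = 0

    def record_failure(self) -> None:
        # Call once for each replica launch that ends in a failed state.
        self.failed_attempts += 1

    def record_success(self) -> None:
        # Assumption: a successful launch resets the counter.
        self.failed_attempts = 0

    def can_retry(self) -> bool:
        # Stop creating new replicas once the cap is reached.
        return self.failed_attempts < self.max_attempts


budget = LaunchBudget(max_attempts=2)
budget.record_failure()
budget.record_failure()
print(budget.can_retry())
```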

brian316, Apr 17 '24 23:04

Already resolved by #3411. Closing now.

cblmemo, Jun 21 '24 12:06