[Serve] Change back from autodown to autostop

Open cblmemo opened this issue 2 months ago • 2 comments

#3377 changed the skyserve controller from autostop to autodown. With autodown, the controller is torn down when the sky serve controller job exits unexpectedly, which removes all related replica information and logs. This PR changes it back to autostop to preserve that information.

Tested (run the relevant ones):

  • [ ] Code formatting: bash format.sh
  • [ ] Any manual or new tests for this PR (please specify below)
  • [ ] All smoke tests: pytest tests/test_smoke.py
  • [ ] Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • [ ] Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

cblmemo avatar May 10 '24 06:05 cblmemo

For discussion: it seems fine to stop the serve controller, since in #3521 we made the controller on Kubernetes skip the autostop configuration. Wdyt @romilbhardwaj?

@cblmemo please check #3521, #3524, and #3525. We need to make sure the serve controller skips the autostop setting so that it can run on Kubernetes.

Michaelvll avatar May 10 '24 16:05 Michaelvll

Yes, should be okay if we skip autostop similar to #3521. So IIUC, the behavior is going to be:

Serve and jobs controller:

  • On k8s - run indefinitely
  • On other clouds - autostop after configured time

Let's go with this for now and reevaluate the "run indefinitely" based on user feedback.
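
To make the behavior above concrete, here is a minimal sketch of the policy, not the actual SkyPilot implementation. `AutostopConfig`, `controller_autostop`, and the 10-minute default are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AutostopConfig:
    """Hypothetical container for a controller's idle-shutdown settings."""
    idle_minutes: int
    # down=True would mean autodown (terminate, losing replica info/logs);
    # down=False means autostop (stop the controller but keep its state).
    down: bool


def controller_autostop(cloud: str,
                        idle_minutes: int = 10) -> Optional[AutostopConfig]:
    """Return the idle-shutdown policy for a serve or jobs controller.

    Assumption: the cloud is identified by a plain string, and the default
    idle time of 10 minutes is an illustrative value only.
    """
    if cloud.lower() in ("kubernetes", "k8s"):
        # On k8s: skip autostop entirely, so the controller runs indefinitely.
        return None
    # On other clouds: autostop (not autodown) after the configured time.
    return AutostopConfig(idle_minutes=idle_minutes, down=False)
```

With this sketch, `controller_autostop("kubernetes")` returns `None`, while any other cloud gets an autostop configuration with `down=False`.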

romilbhardwaj avatar May 10 '24 17:05 romilbhardwaj

Bumping this, @Michaelvll @romilbhardwaj: are there any changes I need to make for this PR? IIUC it will automatically skip autostop for the k8s controller?

cblmemo avatar May 14 '24 06:05 cblmemo