sky cancel not reliable for programs using multiprocessing and Ray

Open Michaelvll opened this issue 2 years ago • 5 comments

We have a script that uses Ray to schedule multiple workers on the VM, and each worker can run an ILP solver that uses multiprocessing. When I run sky cancel on the job, it does not reliably kill the processes: I have to log into the cluster and kill them manually, and sometimes I even have to run ray stop and then sky launch again.
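A common way to make a cancel reach every descendant, including solver processes forked by a worker, is to start the job as the leader of its own process group and signal the whole group rather than only the top-level PID. The Python sketch below illustrates that generic POSIX technique. It is not SkyPilot's cancellation code: JOB, launch_job and cancel_job are hypothetical names, and two sleeping fork-started workers stand in for the ILP solvers.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the job in this issue: a parent process that starts
# worker processes (playing the role of the multiprocessing ILP solvers) and
# blocks until they finish. The parent prints the worker PIDs on one line.
JOB = r"""
import multiprocessing, time

def solve():
    time.sleep(60)  # placeholder for a long-running solver

if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")  # POSIX-only, like killpg below
    workers = [ctx.Process(target=solve) for _ in range(2)]
    for w in workers:
        w.start()
    print(" ".join(str(w.pid) for w in workers), flush=True)
    for w in workers:
        w.join()
"""


def launch_job():
    """Start the job as the leader of a new session and process group.

    Forked descendants inherit the process group id unless they call setsid()
    or setpgid() themselves, so a single killpg() reaches the whole tree.
    """
    return subprocess.Popen(
        [sys.executable, "-c", JOB],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,
    )


def cancel_job(proc, grace=5.0):
    """SIGTERM the job's whole process group, then SIGKILL whatever is left."""
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass  # the parent ignored SIGTERM; the SIGKILL sweep below handles it
    try:
        # Sweep up stragglers, including workers that outlived the parent.
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the group is already empty
    proc.wait()
    return pgid
```

For example, calling proc = launch_job() and later cancel_job(proc) terminates the parent and both workers. One caveat, stated as an assumption: Ray task processes are normally started by Ray's own worker pool rather than by the submitting script, so signalling the submitter's process group alone may not reach them or the processes they fork.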

Michaelvll, Apr 25 '23

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot], Aug 24 '23

This is still relevant; a related issue is #2340.

Michaelvll, Aug 24 '23

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot], Dec 23 '23

This issue was closed because it has been stalled for 10 days with no activity.

github-actions[bot], Jan 02 '24

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot], May 02 '24

This issue was closed because it has been stalled for 10 days with no activity.

github-actions[bot], May 12 '24