Zhanghao Wu
The problem may be caused by clock misalignment between the local server and the spot controller.
Although it uses little CPU and memory, the `sky-spot-controller` can still fail to accept new `ray job` commands because many `ray job` commands are running and the...
Fixes #1073. This PR makes sure that a normal task YAML can be run with `sky spot launch` without any modification. It is currently blocked by #1069. Tested: - [...
Our sky storage does not support uploading a single file, as in the following YAML:
```
file_mounts:
  /test-path:
    name: sky-test-tmp
    source: ~/file.txt
    store: gcs
```
That is...
The following YAML uploads `~/tmp` to an S3 bucket instead of choosing the store based on the cloud specified in `resources`:
```
resources:
  cloud: gcp
file_mounts:
  /test-path:
    name: sky-storage-test-tmp
    source: ~/tmp...
```
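A minimal sketch of the expected behavior, assuming that a storage mount without an explicit `store` should follow `resources.cloud`. The function `default_store_for_cloud`, its mapping, and the `s3` fallback when no cloud is set are all hypothetical, not SkyPilot's actual implementation.

```python
from typing import Optional

# Hypothetical mapping from the cloud named in `resources.cloud`
# to the object store that should back a storage mount by default.
_CLOUD_TO_STORE = {
    "aws": "s3",
    "gcp": "gcs",
}


def default_store_for_cloud(cloud: Optional[str], fallback: str = "s3") -> str:
    """Pick a store for a storage mount that does not set `store` itself.

    Assumption: when no cloud is specified, use `fallback`.
    """
    if cloud is None:
        return fallback
    store = _CLOUD_TO_STORE.get(cloud.lower())
    if store is None:
        raise ValueError(f"No default store known for cloud {cloud!r}")
    return store


# The YAML above, as a parsed dict.
task_config = {
    "resources": {"cloud": "gcp"},
    "file_mounts": {
        "/test-path": {"name": "sky-storage-test-tmp", "source": "~/tmp"},
    },
}
mount = task_config["file_mounts"]["/test-path"]
# An explicit `store` wins; otherwise derive it from the task's cloud.
store = mount.get("store") or default_store_for_cloud(
    task_config["resources"].get("cloud"))
print(store)  # gcs
```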
Currently, we have both `Task.set_storage_mounts` and `Task.set_file_mounts`. It is a burden for users of the programmatic API to distinguish between the two kinds of mounts manually. Instead, we can align it with our...
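One way to sketch such an alignment, assuming the YAML convention in which a `file_mounts` entry whose value is a string is a plain file mount and an entry whose value is a mapping (`name`/`source`/`store`) is a storage mount. The helper `split_file_mounts` is hypothetical; it only illustrates how a single call could route entries to the two existing setters.

```python
from typing import Any, Dict, Tuple


def split_file_mounts(
    file_mounts: Dict[str, Any],
) -> Tuple[Dict[str, str], Dict[str, Dict[str, Any]]]:
    """Split a unified `file_mounts` mapping into (plain mounts, storage mounts).

    Hypothetical helper: string values are treated as plain file mounts,
    dict values as storage specs, mirroring the YAML layout.
    """
    plain: Dict[str, str] = {}
    storage: Dict[str, Dict[str, Any]] = {}
    for dst, spec in file_mounts.items():
        if isinstance(spec, str):
            plain[dst] = spec
        elif isinstance(spec, dict):
            storage[dst] = spec
        else:
            raise TypeError(f"Unsupported mount spec for {dst!r}: {spec!r}")
    return plain, storage


plain, storage = split_file_mounts({
    "/remote/data": "~/data",
    "/test-path": {"name": "sky-test-tmp", "source": "~/tmp", "store": "gcs"},
})
print(plain)          # {'/remote/data': '~/data'}
print(list(storage))  # ['/test-path']
```

A unified `Task.set_file_mounts` could call a helper like this internally and hand the storage part to the existing storage-mount logic, so callers never choose between the two setters.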
The current `sky.Storage` API, shown below, is misaligned with our YAML spec, which makes it hard to learn.
```
Storage(self,
        name: Optional[str] = None,
        source: Optional[Path] = None,
        stores: Optional[Dict[StoreType, AbstractStore]]...
```
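For contrast, a sketch of a constructor whose fields mirror the YAML keys one-to-one. `StorageSpec` and its `from_yaml_config` classmethod are hypothetical and model only the `name`, `source`, and `store` keys used in the `file_mounts` examples; this is not the real `sky.Storage` class.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class StorageSpec:
    """Hypothetical storage spec whose fields match the YAML keys."""
    name: Optional[str] = None
    source: Optional[str] = None
    store: Optional[str] = None  # e.g. "gcs" or "s3", as written in YAML

    @classmethod
    def from_yaml_config(cls, config: Dict[str, Any]) -> "StorageSpec":
        # Reject keys the YAML spec would not accept, so typos surface early.
        unknown = set(config) - {"name", "source", "store"}
        if unknown:
            raise ValueError(f"Unknown storage fields: {sorted(unknown)}")
        return cls(**config)


spec = StorageSpec.from_yaml_config(
    {"name": "sky-test-tmp", "source": "~/file.txt", "store": "gcs"})
print(spec.store)  # gcs
```

With the Python fields and the YAML keys sharing names, a user who knows one form can read the other directly.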
Currently, we launch 16 spot jobs in parallel and leave the remaining jobs pending in a queue. This becomes a problem when the user's quota does not...
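The "launch a fixed number in parallel, queue the rest" behavior can be sketched with a bounded thread pool. `run_with_cap`, `launch_job`, and the `max_parallel` parameter are hypothetical names; exposing the cap as a parameter instead of a hard-coded 16 is one possible direction, not a confirmed design.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_with_cap(job_ids: List[int], max_parallel: int = 16) -> int:
    """Run hypothetical launches with at most `max_parallel` in flight.

    Jobs beyond the cap wait in the executor's internal queue. Returns the
    peak number of concurrently running launches, for illustration.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def launch_job(job_id: int) -> int:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for provisioning a spot cluster
        with lock:
            running -= 1
        return job_id

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        list(pool.map(launch_job, job_ids))
    return peak


peak = run_with_cap(list(range(40)), max_parallel=4)
print(peak <= 4)  # True
```

Making the cap configurable would let a user size it to their quota rather than relying on a fixed default.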
We have observed friction multiple times when converting an existing, working sky YAML that sets `workdir` and local file mounts into a managed spot YAML. We may want to...