skypilot issues

Results 530 skypilot issues


[AWS] Use single zone for each provisioning

With the new provisioner, we can fail over across availability zones quickly, so we do not need to assign multiple zones for AWS during each provisioning attempt....

suquark

Stale

[Spot pipeline] Feature requests for spot pipeline

- [x] Docs for the spot pipeline
- [ ] Story for how to debug the pipeline. Currently, if the 4th task failed, the user has to restart the entire...

Michaelvll

enhancement

Stale

I get permitrootlogin error on FluidStack.

RuntimeError: Failed to SSH to 38.80.122.92 after timeout 600s, with Error: /etc/ssh/ssh_config: line 26: Bad configuration option: permitrootlogin _Version & Commit info:_ * `sky -v`: skypilot, version 0.5.0 * `sky...

qashzar

[Azure] SkyPilot provisioner for Azure

~~Blocked by #3696, #3700~~ ## Single-node **master** ([05ce5e9](https://github.com/skypilot-org/skypilot/commit/05ce5e999a5c4218d267481ebddac7967dce1897)) ``` multitime -n 5 sky launch --cloud azure -y --cpus 2 --down Mean Std.Dev. Min Median Max real 220.920 6.553 213.297 219.030...

Michaelvll

Feature Request OVH public cloud

Please consider implementing this for compute instances provided by OVH public cloud. Although they do not provide spot instances, the limited edition instances by OVH can be used as...

k-e-r-n-e-l-p-a-n-i-c

clouds

[100-jobs/Spot] More efficient batch spot jobs submission

Currently, we take a lock for each spot job submission; we should make this more efficient. One way to test this is to submit more than 100 spot...

Michaelvll

friction-log

spot

[k8s] Fail to install package in the base conda env on k8s default image

To reproduce: `sky launch -c test-k8s --cloud kubernetes "conda install -c conda-forge google-cloud-sdk" -y`

Michaelvll

bug

[API] Better way to get IP for clusters and endpoints for service

Currently, getting the IP of a cluster in the Python API is rather complicated: ```python ip = sky.status('cluster-name')[0]['handle'].external_ip() ``` Similarly for the endpoint of a service: ```python service_records = sky.serve.status('code-llama')...

Michaelvll

feature-request
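
The kind of helper the [API] issue above asks for could look roughly like the sketch below. It only assumes the record shape visible in the issue's snippet (a list of dicts whose 'handle' entry has an `external_ip()` method); the function name `cluster_external_ip` is hypothetical and is not part of the SkyPilot API.

```python
from typing import Any, Dict, List, Optional


def cluster_external_ip(records: List[Dict[str, Any]]) -> Optional[str]:
    """Return the external IP of the first cluster record, or None.

    `records` is assumed to look like the return value used in the issue:
    a list of dicts, each with a 'handle' object exposing external_ip().
    """
    if not records:
        # No cluster matched the query.
        return None
    handle = records[0].get('handle')
    if handle is None:
        # The record exists but has no handle to query.
        return None
    return handle.external_ip()
```

With such a wrapper, the one-liner from the issue would become `cluster_external_ip(sky.status('cluster-name'))`, and the missing-cluster case would return None instead of raising an IndexError.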

[GPU] Add support for AMD GPUs

We should consider adding support for AMD GPUs, which have been tested to be efficient for ML workloads. References:
https://www.amd.com/en/technologies/deep-machine-learning
https://www.lamini.ai/blog/lamini-amd-paving-the-road-to-gpu-rich-enterprise-llms
https://blog.mlc.ai/2023/08/09/Making-AMD-GPUs-competitive-for-LLM-inference
https://www.mosaicml.com/blog/amd-mi250

Michaelvll

enhancement

[Storage] Precheck the existence of file mounts.

The following command goes ahead and launches the cluster, but fails after the cluster is launched. We should check that the source in the file mounts exists before launching the...

Michaelvll

friction-log
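
The precheck requested in the [Storage] issue above could be sketched as follows. This is an illustration under assumptions, not SkyPilot's implementation: the function name `missing_local_sources`, the `file_mounts` mapping of remote path to source, and the list of cloud URI prefixes are all hypothetical.

```python
import os
from typing import Dict, List

# Assumption: sources with these prefixes live in cloud storage and cannot
# be checked on the local filesystem, so they are skipped.
CLOUD_PREFIXES = ('s3://', 'gs://', 'r2://', 'https://')


def missing_local_sources(file_mounts: Dict[str, str]) -> List[str]:
    """Return the local sources in `file_mounts` that do not exist.

    `file_mounts` maps a remote destination path to its source.
    """
    missing = []
    for _remote_path, source in file_mounts.items():
        if source.startswith(CLOUD_PREFIXES):
            continue
        # Expand '~' so home-relative sources are checked correctly.
        if not os.path.exists(os.path.expanduser(source)):
            missing.append(source)
    return missing
```

Running a check like this before provisioning would surface a bad source path immediately, rather than after the cluster is already up.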
