
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

Results 453 skypilot issues

Bug report from Daniel: `nvidia-smi` doesn't work on `sky gpunode --cloud gcp`. ``` NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA...

This PR enables TPU Pod usage. To switch from a single TPU to a TPU Pod, the user only needs to change `accelerators: tpu-v2-8` to `accelerators: tpu-v2-32`. `sky launch` and `sky exec` will...

This PR adds the `vCPUs` column in the optimizer message (which was requested by Justin a while ago). ``` $ sky gpunode --gpus V100 I 08-14 03:51:35 optimizer.py:605] == Optimizer...

![1688621660755091_ pic_hd](https://user-images.githubusercontent.com/6753189/185202179-61866f38-5fb7-4d83-911d-8ca3c14dd14b.jpg) The problem may be caused by clock misalignment between the local server and the spot controller.

bug

Although not much CPU and memory are used, the `sky-spot-controller` can still fail to take new `ray job` commands, due to a lot of `ray job` commands running and the...

bug

Closes #977. `GetPublicAccessBlock` is no longer a reliable method to check if an S3 bucket is public. Buckets can be publicly readable yet not allow access to `GetPublicAccessBlock`. I'm not...

Fixes #1073. This PR makes sure that a normal task YAML can be run with `sky spot launch` without any modification. It is currently blocked by #1069. Tested: - [...

A TPU user mentioned that even an on-demand TPU can be killed at any time within a 2-day window, and no logs can be found for the...

After some discussion, the conclusion from last week on aligning the user's and the admin's Python (for submitting Ray jobs, since the Ray cluster is launched via the admin's Python) is to add...