Zhanghao Wu
Our autostop only considers the run section, but the setup section can take a very long time, so the cluster can be stopped while the setup is still running.
Currently, we cache the spot job status locally whenever the user calls `sky spot status`, which returns a somewhat confusing job table, as some of the jobs may still...
Although `aws s3 ls s3://imagenet-bucket` can list the files in the bucket (not owned by me), mounting the bucket with the following YAML raises an AccessDenied error. ```yaml resources:...
I am testing transferring a tar file from a bucket to EBS. When using `aws s3 cp`, the throughput stays around 200 MB/s. When I mount the bucket with our...
We may want to expose our Python APIs, make them easier to use, and add them to our documentation. This could be related to #871.
Some inference tasks can run on several different resources, e.g. V100:8, V100:4, or V100:1. Based on availability, the user wants sky to fail over from 8 to 1....
To test out how well goofys performs for a real deep learning workload, I made this benchmark. ## ImageNet Dataset Information ### Stats * 1M training images, and 50K val...
A GCP user without the IAM setting permission (the permission in the following figure) will not be able to get setIAMPolicy for the ray-autoscaler service IAM, causing the following...
Our current optimizer does not consider regions. As a result, data movement between different regions is not taken into account, especially when we are trying to have chain DAGs or recovery...
It would be great to have a tutorial in our documentation for using TPU nodes and TPU VMs. Currently, we only have a single line in the...