Zhanghao Wu
It seems the problem is caused by the following lines. Based on the comment in the ray code ` # In some managed environments, an admin needs to grant the...
After the admin manually added the following permission to the ray-autoscaler IAM account, sky can now create new instances. We may want to add this to our FAQ in...
Good catch! It seems the problem is caused by the dash in `"-m sky.skylet.skylet"`. Maybe we can add an escape backslash for it, e.g. `"\-m sky.skylet.skylet"`? https://stackoverflow.com/a/70897535
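A quick sanity check of the escaping idea, assuming the pattern is handed to a tool such as `grep` or `pgrep -f` whose regex flavor treats `\-` as a literal dash (an assumption; this only checks regex semantics with Python's `re`, not how the shell or the tool parses its arguments). The process command line below is made up for illustration.

```python
import re

# Made-up command line of the skylet process, for illustration only.
CMDLINE = "python3 -m sky.skylet.skylet"

# Original pattern: it starts with "-", so a CLI tool may parse it as an option.
plain = "-m sky.skylet.skylet"
# Escaped pattern: the leading backslash means the argument no longer starts with "-".
escaped = r"\-m sky.skylet.skylet"


def matches(pattern: str, text: str) -> bool:
    """Return True if the regex `pattern` is found anywhere in `text`."""
    return re.search(pattern, text) is not None


for name, pattern in [("plain", plain), ("escaped", escaped)]:
    print(f"{name}: starts_with_dash={pattern.startswith('-')} "
          f"matches={matches(pattern, CMDLINE)}")
```

Both patterns match the same command line, but only the escaped one avoids a leading dash. Many getopt-based tools also accept `--` to end option parsing, which is another way to pass a pattern that begins with `-`.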
I do not think it is very urgent, and I believe it requires quite a few code changes in the backend logic as well, but feel free to take it. ;...
Great finding, @romilbhardwaj! I just tried it with an infinite `_NODES_LAUNCHING_PROGRESS_TIMEOUT`, and it seems that `ray status` takes 6 minutes to detect that the first 10 workers have been...
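To make the experiment above easier to reason about, here is a minimal sketch of a progress-based launch timeout, assuming `_NODES_LAUNCHING_PROGRESS_TIMEOUT` behaves like a deadline that resets whenever more nodes become ready. `wait_for_nodes`, `fake_ready_nodes`, and the fake clock are hypothetical names, not SkyPilot's actual implementation, and the node timeline is made up to mimic a status report that shows nothing for 6 minutes.

```python
import math
import time
from typing import Callable


def wait_for_nodes(
    target: int,
    count_ready_nodes: Callable[[], int],
    progress_timeout: float,
    poll_interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until `target` nodes are ready (hypothetical helper).

    Raises TimeoutError only if no new node becomes ready within
    `progress_timeout` seconds; pass math.inf to wait indefinitely.
    """
    last_count = count_ready_nodes()
    last_progress = clock()
    while last_count < target:
        if clock() - last_progress > progress_timeout:
            raise TimeoutError(
                f"No progress for {progress_timeout}s; {last_count}/{target} nodes ready")
        sleep(poll_interval)
        count = count_ready_nodes()
        if count > last_count:
            # Progress was made: remember it and restart the timeout window.
            last_count, last_progress = count, clock()
    return last_count


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()


def fake_ready_nodes() -> int:
    # Made-up timeline: nothing visible for 6 minutes, then 10 workers,
    # then all 50 nodes after 10 minutes.
    if clock.now < 360:
        return 0
    if clock.now < 600:
        return 10
    return 50


# With an infinite progress timeout, the wait simply outlasts the slow status report.
print(wait_for_nodes(50, fake_ready_nodes, math.inf, clock=clock, sleep=clock.sleep))

# With a 300 s progress timeout, it gives up before the first workers show up.
clock.now = 0.0
try:
    wait_for_nodes(50, fake_ready_nodes, 300, clock=clock, sleep=clock.sleep)
except TimeoutError as e:
    print("timed out:", e)
```

Under this assumed semantics, any progress timeout shorter than the delay before the status report first shows new workers will abort a launch that would otherwise have succeeded.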
I found another problem while trying out this issue: `sky down cluster_name` does not work well for the 50-node cluster and produces the following error:

```
E 10-16 22:01:45 cloud_vm_ray_backend.py:2545]...
```
This problem should have been mitigated by #1150. I am closing it for now, but feel free to re-open this issue if the problem occurs again.
Yes, the current `sky start` will not mount the S3 bucket again. Please try running ~~`sky launch --no-setup -y`~~ `sky launch cluster.yml -c g4 --no-setup`.
This problem was recently brought up by @lhqing as well, in #1157. TODO: Document `sky launch --no-setup` in the storage mount docs.
> Thanks! Just to make sure, I should do `sky launch cluster.yml -c g4 --no-setup` in this case?

Ahh, yes. Sorry for the incomplete command.