skypilot
UX/backend: Lower-case GPU name succeeds in launching but blocks forever in exec
Adapted from a programmatic use case from Erick. Here's a CLI repro:
```bash
# Succeeds, because lower-case GPU names are allowed at launch.
sky launch -c myclus --gpus v100 ''
# This blocks forever. `sky queue` will show PENDING.
# This is because [v100:1] does not fit [V100:1].
sky exec myclus --gpus v100 -- echo hi
# Works.
sky exec myclus --gpus V100 -- echo hi
```
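Here is a minimal sketch of why the lower-case request stays PENDING. It assumes the scheduler compares accelerator names as exact, case-sensitive dictionary keys. The `fits` helper is hypothetical and is not SkyPilot's actual scheduling code.

```python
from typing import Dict


def fits(requested: Dict[str, int], available: Dict[str, int]) -> bool:
    """Hypothetical exact-match fit check (not SkyPilot's real code).

    Every requested accelerator name must appear verbatim, i.e. with the
    same case, in `available` with at least the requested count.
    """
    return all(
        available.get(name, 0) >= count for name, count in requested.items()
    )


# The cluster records the canonical name it was launched with.
cluster_resources = {"V100": 1}

print(fits({"V100": 1}, cluster_resources))  # True
print(fits({"v100": 1}, cluster_resources))  # False: 'v100' != 'V100'
```

Under this assumption, the job never fails outright. It simply never fits, so it waits in the queue indefinitely.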
An immediate fix is to canonicalize the GPU string during exec.
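One possible shape for that canonicalization is a case-insensitive lookup against a list of known accelerator names. This is only a sketch. `KNOWN_ACCELERATORS`, `canonicalize_accelerator` and `canonicalize_request` are hypothetical names, and a real implementation would presumably take the canonical names from the cloud catalogs rather than from a hard-coded list.

```python
from typing import Dict, Iterable

# Hypothetical hard-coded list of canonical accelerator names.
KNOWN_ACCELERATORS = ["V100", "A100", "T4", "K80"]


def canonicalize_accelerator(
    name: str, known: Iterable[str] = KNOWN_ACCELERATORS
) -> str:
    """Return the canonical spelling of `name` via a case-insensitive lookup.

    Unknown names are returned unchanged so that a later check can reject
    them explicitly.
    """
    by_lower = {k.lower(): k for k in known}
    return by_lower.get(name.lower(), name)


def canonicalize_request(requested: Dict[str, int]) -> Dict[str, int]:
    """Canonicalize every key of an {accelerator: count} request."""
    return {canonicalize_accelerator(k): v for k, v in requested.items()}


print(canonicalize_request({"v100": 1}))  # {'V100': 1}
```

Applying this on the exec path turns `--gpus v100` into `V100` before the request is compared with the cluster's resources. A name that is not in the list passes through untouched and is left for a separate feasibility check to reject.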
A bigger item is probably to programmatically check whether a task requirement can never be satisfied (e.g., `exec --gpus some_other_gpu`) and, if so, fail immediately.
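A fail-fast check of this kind could compare the request against the cluster's total capacity rather than its currently free resources. If the request exceeds the total, waiting can never help, so the job is rejected instead of left PENDING. This is a sketch under that assumption. `check_feasible` and `InfeasibleRequestError` are hypothetical names, not SkyPilot APIs.

```python
from typing import Dict


class InfeasibleRequestError(ValueError):
    """Hypothetical error for a task that can never run on the cluster."""


def check_feasible(
    requested: Dict[str, float], cluster_total: Dict[str, float]
) -> None:
    """Raise immediately if a request can never be satisfied.

    Compares against the cluster's *total* capacity, not the resources that
    are free right now, so a request that passes may still have to wait in
    the queue, but one that fails can never be scheduled.
    """
    for name, count in requested.items():
        total = cluster_total.get(name, 0)
        if count > total:
            raise InfeasibleRequestError(
                f"requested {name}:{count}, but the cluster has {name}:{total}"
            )


cluster_total = {"V100": 1}
check_feasible({"V100": 1}, cluster_total)  # passes: may queue, but can run
try:
    check_feasible({"some_other_gpu": 1}, cluster_total)
except InfeasibleRequestError as e:
    print("rejected:", e)
```

Because the loop works on any resource key, the same check covers requirements other than GPUs if they are expressed in the same dictionary form. Without canonicalization, an exact-match version like this would also reject `v100` against a `V100` cluster. That is still an improvement over blocking forever.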
#1075 will fix this bug.
> A bigger item is probably to programmatically check if some task requirement is not ever going to be satisfied, e.g., `exec --gpus some_other_gpu`, and immediately fail.
Actually, `sky exec --gpus some_other_gpu` already fails immediately because of the `less_demanding_than` test. However, yes, we may need to investigate in more detail whether anything other than the `--gpus` argument can make a task requirement unsatisfiable.