[batch] Batch charges for private instance creation that fails with exhausted resource errors.

Open cseed opened this issue 9 months ago • 0 comments

What happened?

Due to limited GPU availability, it is common for GPU private jobs (esp. preemptible) to fail multiple times with exhausted resource errors before obtaining a VM. When this happens, Batch still charges for the attempt. An example is batch 8166586, job 1, attempt ZMkGaS, instance ID batch-worker-default-job-private-u4fxc, which failed with ZONE_RESOURCE_POOL_EXHAUSTED.
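
To make the expected behavior concrete, here is a minimal sketch of a billing filter that skips attempts whose instance never came up because of an exhausted resource error. This is not Batch's actual billing code: the `Attempt` type, the `billable_cost` function, and the cost values are hypothetical, and only the `ZONE_RESOURCE_POOL_EXHAUSTED` code and the attempt ID come from the report above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Error codes treated as "no VM was ever obtained".
# Only ZONE_RESOURCE_POOL_EXHAUSTED appears in the report; the set is an
# assumption and may need more entries.
EXHAUSTED_RESOURCE_ERRORS = {"ZONE_RESOURCE_POOL_EXHAUSTED"}


@dataclass
class Attempt:
    """Hypothetical record of one job attempt (not Batch's real schema)."""
    attempt_id: str
    instance_error: Optional[str]  # None if the instance was created
    cost: float                    # cost currently attributed to the attempt


def billable_cost(attempts: Iterable[Attempt]) -> float:
    """Sum the cost of attempts, skipping those that failed to get a VM."""
    return sum(
        a.cost
        for a in attempts
        if a.instance_error not in EXHAUSTED_RESOURCE_ERRORS
    )


# Illustrative values only: the first attempt mirrors the failed one in the
# report, the second stands in for a later attempt that obtained a VM.
attempts = [
    Attempt("ZMkGaS", "ZONE_RESOURCE_POOL_EXHAUSTED", 0.25),
    Attempt("later-attempt", None, 1.5),
]
total = billable_cost(attempts)
```

Under these assumptions, only the attempt that actually obtained a VM contributes to the total.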

Version

SaaS

Relevant log output

No response

cseed, Apr 26 '24 16:04