
Add an argument to specify number of GPUs to use for a `toil-wdl-runner` task

Open · stxue1 opened this issue 9 months ago · 1 comment

WDL 1.1 lets a task declare in its `runtime` section that it needs GPUs:

task gpu_test {
  #.....
  runtime {
    gpu: true
  }
}

The field is a Boolean, so it cannot say how many GPUs the task needs. According to the spec, that has to come from something engine-specific, so we should provide an argument to specify the number of GPUs:

> This attribute cannot request any specific quantity or types of GPUs to make available to the task. Any such information should be provided using an execution engine-specific attribute.

The closest thing I think we have is `--defaultAccelerators`, but it doesn't respect the selected batch system: the request is still checked against the single-machine batch system. For example, with `--batchSystem=slurm --defaultAccelerators=1`:

[2024-05-22T18:39:35-0700] [MainThread] [C] [toil.wdl.wdltoil] Could not run workflow because:

🚨🚨🚨
The job 'WDLRootJob' kind-WDLRootJob/instance-2a0q47k9 v1 is requesting [{'count': 1, 'kind': 'gpu'}] accelerators, more than the maximum of [] accelerators that SingleMachineBatchSystem was configured with. The accelerator {'count': 1, 'kind': 'gpu'} could not be provided. Scale is set to 1.
🚨🚨🚨
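
As a rough sketch of the behavior I have in mind (the helper `accelerators_for_task` and the `gpu_count` parameter are hypothetical, not existing Toil API): the Boolean `gpu` field decides whether anything is requested at all, and a separate engine-specific count decides how many. The output shape just mirrors the `{'count': 1, 'kind': 'gpu'}` entries in the error above.

```python
from typing import Any, Dict, List


def accelerators_for_task(runtime: Dict[str, Any], gpu_count: int = 1) -> List[Dict[str, Any]]:
    """Turn an evaluated WDL runtime section into an accelerator request list.

    Hypothetical helper for illustration only. `gpu_count` stands in for
    whatever engine-specific argument ends up carrying the number of GPUs.
    """
    # WDL 1.1 `gpu` is a Boolean: it only says whether GPUs are needed.
    if not runtime.get("gpu", False):
        return []
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1 when the task sets gpu: true")
    # Same dict shape as the accelerator requirement shown in the log above.
    return [{"count": gpu_count, "kind": "gpu"}]


if __name__ == "__main__":
    print(accelerators_for_task({"gpu": True}, gpu_count=2))
    print(accelerators_for_task({"gpu": False}, gpu_count=2))
```

The point of the split is that tasks without `gpu: true` never request accelerators, regardless of what count was passed on the command line.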

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-1576

stxue1 · May 23 '24 01:05