feat: Runner to GPUs mapping

Open ssheng opened this issue 3 years ago • 0 comments

Feature request

The default scheduling strategy starts, for each runner that supports nvidia.com/gpu, as many instances as there are available GPUs. If a service contains multiple runner types, each type is scheduled with that same number of instances. Ideally, we should enable device isolation and let users configure GPU device affinity per runner type.
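To make the request concrete, here is a minimal sketch of the difference between the current behavior and the proposed mapping. It is not BentoML code: `plan_workers`, the `affinity` argument, and the runner names are hypothetical, and it assumes the default places one worker of each runner on each GPU and that isolation is done by giving each worker its own `CUDA_VISIBLE_DEVICES` value.

```python
from typing import Dict, List, Optional


def plan_workers(
    runners: List[str],
    gpus: List[int],
    affinity: Optional[Dict[str, List[int]]] = None,
) -> Dict[str, List[str]]:
    """Return, per runner, the CUDA_VISIBLE_DEVICES value for each worker.

    Hypothetical helper for illustration only. Without an affinity entry,
    a runner gets one worker per available GPU (the assumed default).
    With an entry, it gets one worker per GPU listed for that runner.
    """
    plan: Dict[str, List[str]] = {}
    for name in runners:
        if affinity is not None and name in affinity:
            devices = affinity[name]
        else:
            devices = gpus
        unknown = set(devices) - set(gpus)
        if unknown:
            raise ValueError(f"{name}: unknown GPU ids {sorted(unknown)}")
        # One worker per device; each worker sees only its own GPU.
        plan[name] = [str(d) for d in devices]
    return plan


gpus = [0, 1, 2, 3]
runners = ["detector", "classifier"]  # hypothetical runner names

# Assumed default: every runner type gets a worker on every GPU (8 workers).
default_plan = plan_workers(runners, gpus)

# Proposed: users pin each runner type to a subset of GPUs (4 workers).
mapped_plan = plan_workers(
    runners, gpus, affinity={"detector": [0, 1], "classifier": [2, 3]}
)

default_total = sum(len(workers) for workers in default_plan.values())
mapped_total = sum(len(workers) for workers in mapped_plan.values())
```

In this sketch the default plan starts eight workers that share four GPUs, while the mapped plan starts four workers with no GPU shared between runner types.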

Motivation

Mapping runners to specific GPUs can reduce the number of runner instances required while maximizing GPU utilization.

Other

No response

ssheng • Jul 19 '22 09:07