BentoML
                                
feat: Runner to GPUs mapping
Feature request
The default scheduling strategy creates as many instances of each GPU-capable runner (one that supports nvidia.com/gpu) as there are available GPUs. If a service contains multiple runner types, every type is scheduled with that same number of instances, so the total instance count grows with the number of runner types. Ideally, we should enable device isolation and let users configure GPU device affinity for each runner type.
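To make the request concrete, here is a minimal sketch of what a runner-to-GPU mapping could compute. It is not BentoML's API: `plan_runner_workers`, the `gpu_affinity` argument, and the runner names are all hypothetical, and the only real mechanism it relies on is the `CUDA_VISIBLE_DEVICES` environment variable, which restricts the devices a CUDA process can see.

```python
from typing import Dict, List, Optional


def plan_runner_workers(
    runner_names: List[str],
    available_gpus: List[int],
    gpu_affinity: Optional[Dict[str, List[int]]] = None,
) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical helper: return one environment dict per worker, per runner.

    Runners without an entry in gpu_affinity fall back to the behavior
    described in this issue: one instance for every available GPU.
    """
    gpu_affinity = gpu_affinity or {}
    plan: Dict[str, List[Dict[str, str]]] = {}
    for name in runner_names:
        devices = gpu_affinity.get(name, available_gpus)
        unknown = sorted(set(devices) - set(available_gpus))
        if unknown:
            raise ValueError(f"runner {name!r} mapped to unknown GPUs: {unknown}")
        # Device isolation: each worker sees exactly one GPU.
        plan[name] = [{"CUDA_VISIBLE_DEVICES": str(d)} for d in devices]
    return plan


gpus = [0, 1, 2, 3]
runners = ["detector", "classifier"]  # hypothetical runner names

default_plan = plan_runner_workers(runners, gpus)
mapped_plan = plan_runner_workers(
    runners, gpus, gpu_affinity={"detector": [0, 1], "classifier": [2, 3]}
)

default_total = sum(len(workers) for workers in default_plan.values())
mapped_total = sum(len(workers) for workers in mapped_plan.values())
print(default_total, mapped_total)
```

With four GPUs and two runner types, the fallback path yields eight workers, while the explicit mapping yields four, each pinned to a single device. How such a mapping would be exposed in configuration is left open here.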
Motivation
Mapping runners to specific GPUs can reduce the number of runner instances required while maximizing GPU resource utilization.
Other
No response