BentoML
Doubt regarding GPU utilization and batch inference
- Can BentoML automatically allocate GPU resources based on incoming requests in production? Any code references?
- Also, how can I do batch inference for a sequence of images? For example, I want to send 8 images at once; any code references?

Any suggestions would be helpful.

Thanks and regards.
- For autoscaling, please take a look at BentoCloud to see if it fits your needs.
- BentoML currently does not support a sequence of images as input. We are looking to add an image-sequence IO descriptor in the future. For now, could you convert the images to a NumPy array and use the ndarray IO descriptor? If you're using PyTorch, you can also convert the images to `torch.Tensor`, and BentoML supports batching those.
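As a minimal sketch of the suggested workaround, the 8 images can be stacked into one NumPy array on the client side before sending. The image size and dtype below are illustrative assumptions; in practice you would decode real image files (e.g. with Pillow) and `np.asarray()` each one before stacking:

```python
import numpy as np

# Illustrative stand-ins for 8 decoded 224x224 RGB images.
images = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(8)]

# Stack along a new leading axis to form one batch of shape (8, 224, 224, 3).
batch = np.stack(images, axis=0)
print(batch.shape)  # (8, 224, 224, 3)
```

The resulting ndarray can then be posted to an endpoint that declares an ndarray IO descriptor, with the model treating axis 0 as the batch dimension.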