
Doubt regarding GPU utilization and batch inference

Open BakingBrains opened this issue 1 year ago • 2 comments

  1. Can BentoML automatically allocate GPU resources based on incoming requests in production? Any code references?
  2. Also, how can I do batch inference on a sequence of images? For example, I want to send 8 images at once. Any code references?

Any suggestions would be helpful.

Thanks and Regards.

BakingBrains avatar Jul 10 '23 13:07 BakingBrains

  1. For autoscaling, please take a look at BentoCloud to see if it meets your needs.
  2. BentoML currently does not support a sequence of images as input. We are looking to add an image-sequence IO descriptor in the future. For now, could you convert the images to NumPy arrays and use the ndarray IO descriptor?
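The suggestion above can be sketched with plain NumPy: stack the individual images into a single ndarray whose leading axis is the batch dimension, then send that array to an endpoint using the ndarray IO descriptor. This is a minimal illustration of the client-side conversion only; the helper name and image shape are assumptions, not BentoML API.

```python
import numpy as np

def images_to_batch(images):
    """Stack a list of H x W x C images into one (N, H, W, C) ndarray.

    The stacked array can then be sent as a single payload to a
    BentoML endpoint that accepts an ndarray input.
    """
    # np.stack adds a new leading batch axis; all images must share a shape.
    return np.stack(images, axis=0)

# Hypothetical example: 8 blank 224x224 RGB images.
images = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(8)]
batch = images_to_batch(images)
print(batch.shape)  # (8, 224, 224, 3)
```

On the service side, the endpoint would receive one `(8, 224, 224, 3)` array instead of eight separate requests.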

ssheng avatar Jul 12 '23 09:07 ssheng

If you're using PyTorch, you can also convert the images to a `torch.Tensor`; BentoML supports batching those.
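As a minimal sketch of that conversion (assuming PyTorch is installed and images are already in channel-first layout), the per-image tensors can be stacked into one batched tensor before being passed to the model or endpoint:

```python
import torch

def tensors_to_batch(image_tensors):
    """Stack a list of C x H x W tensors into one (N, C, H, W) batch.

    torch.stack adds a leading batch dimension; all tensors must
    share the same shape.
    """
    return torch.stack(image_tensors, dim=0)

# Hypothetical example: 8 blank 3-channel 224x224 image tensors.
image_tensors = [torch.zeros(3, 224, 224) for _ in range(8)]
batch = tensors_to_batch(image_tensors)
print(batch.shape)  # torch.Size([8, 3, 224, 224])
```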

visaals avatar Aug 18 '23 19:08 visaals