BentoML
Doubt regarding GPU utilization and batch inference
- Can BentoML automatically allocate GPU resources based on incoming requests in production? Any code references?
- Also, how can I do batch inference for a sequence of images? For example, I want to send 8 images at once; any code references?

Any suggestions would be helpful.

Thanks and regards.
- For autoscaling, please take a look at BentoCloud to see if it fits your needs.
- BentoML currently does not support a sequence of images as input. We are looking to add an image-sequence IO descriptor in the future. For now, could you convert the images to a NumPy array and use the ndarray IO descriptor? If you're using PyTorch, you can also convert the images to `torch.Tensor`, and BentoML supports batching those.
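As a minimal sketch of the suggested workaround, the 8 images can be stacked into one NumPy array on the client side before sending. The image size and dtype below are illustrative assumptions; in practice you would decode real image files (e.g. with Pillow) and `np.asarray()` each one before stacking:

```python
import numpy as np

# Illustrative stand-ins for 8 decoded 224x224 RGB images.
images = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(8)]

# Stack along a new leading axis to form one batch of shape (8, 224, 224, 3).
batch = np.stack(images, axis=0)
print(batch.shape)  # (8, 224, 224, 3)
```

The resulting ndarray can then be posted to an endpoint that declares an ndarray IO descriptor, with the model treating axis 0 as the batch dimension.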