
feature: Make Adaptive Batching algorithm customizable


Feature request

It would be very nice to be able to implement one's own batching logic.

Motivation

AFAIK, the adaptive batching algorithm functions as a black box. The parameters (max batch size and max latency) offer very limited control over it. For instance, I have a use case where input lengths vary greatly, and batching such inputs together does not make sense; in some cases it is even slower than processing them sequentially.

I would love to be able to batch together only inputs that are close in length, by writing my own logic.
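To illustrate, here is a rough sketch of the kind of logic I have in mind (purely hypothetical; `bucket_by_length` is not a BentoML API, and the tolerance parameter is just an example):

```python
from typing import List, Sequence


def bucket_by_length(
    inputs: Sequence[Sequence], max_batch_size: int, tolerance: int = 16
) -> List[List[Sequence]]:
    """Group inputs whose lengths are within `tolerance` of each other,
    capping each bucket at `max_batch_size` items."""
    batches: List[List[Sequence]] = []
    # Sort by length so that similar-length inputs end up adjacent.
    for item in sorted(inputs, key=len):
        if (
            batches
            and len(batches[-1]) < max_batch_size
            # Compare against the shortest item in the current bucket.
            and len(item) - len(batches[-1][0]) <= tolerance
        ):
            batches[-1].append(item)
        else:
            batches.append([item])
    return batches
```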

Other

No response

bruno-hays • May 23 '24 16:05

Hi, if you are still interested in this, we have recently made some changes to batching.

Now the batch might be split into smaller pieces to fit the max batch size.

For example, if max_batch_size is 10 and we send 3 requests with sizes [7, 7, 6] in receiving order, the batch engine will execute batches of size 7+3 and 4+6 sequentially; the second request is split into two parts of sizes (3, 4).
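Roughly, the splitting can be thought of like this (a simplified sketch of the behavior described above, not the actual batch engine code):

```python
from typing import List, Tuple


def plan_batches(
    request_sizes: List[int], max_batch_size: int
) -> List[List[Tuple[int, int]]]:
    """Greedily pack requests, in arrival order, into batches of at most
    `max_batch_size` items, splitting a request across batches when it
    does not fully fit. Returns (request_index, item_count) pairs per batch."""
    batches: List[List[Tuple[int, int]]] = []
    current: List[Tuple[int, int]] = []
    room = max_batch_size
    for i, size in enumerate(request_sizes):
        while size > 0:
            take = min(size, room)  # take as much of this request as fits
            current.append((i, take))
            size -= take
            room -= take
            if room == 0:  # batch is full, start a new one
                batches.append(current)
                current, room = [], max_batch_size
    if current:
        batches.append(current)
    return batches


# With max_batch_size=10 and requests [7, 7, 6], this yields
# [(0, 7), (1, 3)] and [(1, 4), (2, 6)]: the 7+3 and 4+6 batches above.
print(plan_batches([7, 7, 6], 10))
```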

frostming • Jun 19 '24 05:06

@frostming
Thanks for keeping me updated :) If I understand correctly, your comment applies to the case where the client sends data in batches; in that case, the batches are now rebuilt to better fit the max batch size argument.

This is not the feature I wish existed. I would really like more options to customize the way batches are formed, for instance only batching together inputs that share the same metadata field. This is key to getting the best performance in some cases, like the example in my original post.
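For instance, something along these lines (again hypothetical; `group_by_metadata` does not exist in BentoML, and the request/metadata shape is just an assumption for the sketch):

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List


def group_by_metadata(
    requests: List[Dict[str, Any]], key: str, max_batch_size: int
) -> List[List[Dict[str, Any]]]:
    """Only batch together requests that share the same value for the
    metadata field `key`, capping each batch at `max_batch_size`."""
    groups: Dict[Hashable, List[Dict[str, Any]]] = defaultdict(list)
    for req in requests:
        groups[req[key]].append(req)
    batches: List[List[Dict[str, Any]]] = []
    for items in groups.values():
        # Split each metadata group into chunks of at most max_batch_size.
        for start in range(0, len(items), max_batch_size):
            batches.append(items[start : start + max_batch_size])
    return batches
```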

Maybe it is already possible to do something by subclassing the class that handles batching? It is not mentioned in the documentation.

bruno-hays • Jun 25 '24 13:06