
If micro_batch_size is set to 1, is model inference still batched?

Open pengxin233 opened this issue 9 months ago • 1 comment

📚 The doc issue

I set the batchSize of the registered model to 10 and the micro_batch_size to 1. For model inference, will it wait until all 10 requests have completed preprocessing (in parallel) before aggregating them for inference?

Suggest a potential alternative/fix

No response

pengxin233 avatar Apr 29 '24 02:04 pengxin233

Hi @pengxin233 yes, it will still aggregate 10 requests (or wait until the max batch delay) before performing inference. The inference method of the handler will only see a single request at a time, but the pre- and post-processing will run in parallel with the inference. So whether this configuration is performant depends on your use case (is your handler dominated by pre/post-processing?).
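
To make the pipelining idea concrete, here is a rough, self-contained sketch in plain Python (not TorchServe's actual micro-batching code). The BATCH_SIZE/MICRO_BATCH_SIZE constants and the preprocess/inference/postprocess stand-ins are illustrative assumptions: the aggregated batch of 10 is split into micro-batches, inference runs on one micro-batch at a time, and the pre/post-processing of the other micro-batches overlaps with it in worker threads.

```python
# Illustrative sketch of micro-batch pipelining, NOT the TorchServe implementation.
from concurrent.futures import ThreadPoolExecutor
import time

BATCH_SIZE = 10        # batchSize used when registering the model (as in the question)
MICRO_BATCH_SIZE = 1   # micro_batch_size (as in the question)

def preprocess(item):
    time.sleep(0.05)   # stand-in for CPU-bound preprocessing
    return f"pre({item})"

def inference(micro_batch):
    time.sleep(0.02)   # stand-in for the model forward pass
    return [f"inf({x})" for x in micro_batch]

def postprocess(result):
    time.sleep(0.05)   # stand-in for CPU-bound postprocessing
    return f"post({result})"

def handle(batch):
    # Split the aggregated batch into micro-batches of MICRO_BATCH_SIZE.
    micro_batches = [batch[i:i + MICRO_BATCH_SIZE]
                     for i in range(0, len(batch), MICRO_BATCH_SIZE)]
    results = []
    with ThreadPoolExecutor() as pool:
        # Submit preprocessing for all micro-batches so it runs in parallel.
        pre_futures = [pool.submit(lambda mb: [preprocess(x) for x in mb], mb)
                       for mb in micro_batches]
        post_futures = []
        for fut in pre_futures:
            # Inference sees one micro-batch at a time, while the remaining
            # preprocessing and any submitted postprocessing keep running.
            out = inference(fut.result())
            post_futures.append(pool.submit(lambda o: [postprocess(r) for r in o], out))
        for fut in post_futures:
            results.extend(fut.result())
    return results

if __name__ == "__main__":
    batch = list(range(BATCH_SIZE))  # 10 aggregated requests
    start = time.time()
    print(handle(batch))
    print(f"elapsed: {time.time() - start:.2f}s")
```

With stand-in timings like these, the elapsed time is closer to the sum of the sequential inference calls than to the sum of all pre/post-processing, which is the kind of win this configuration targets when the handler is dominated by pre/post-processing.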

mreso avatar Apr 29 '24 18:04 mreso