If micro_batch_size is set to 1, is model inference still batched?
📚 The doc issue
I set the batchSize of the registered model to 10 and the micro_batch_size to 1. For model inference, will the server wait for 10 requests to finish preprocessing in parallel before aggregating them for inference?
Suggest a potential alternative/fix
No response
Hi @pengxin233, yes, it will still aggregate 10 requests (or wait until the max batch delay expires) before performing inference. The inference method of the handler will only see a single request at a time, but pre- and post-processing will run in parallel with inference. So whether this configuration is performant depends on your use case (is your handler dominated by pre/post-processing?).
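To make the pipelining concrete, here is a rough, purely illustrative sketch in plain Python (not the actual TorchServe micro-batching code; `preprocess`, `infer`, and `postprocess` are stand-ins for the handler methods) of how 10 aggregated requests flow through with micro_batch_size = 1:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(req):
    return f"pre({req})"                   # stand-in for the handler's preprocess step

def infer(items):
    return [f"model({x})" for x in items]  # stand-in for the model forward pass

def postprocess(out):
    return [f"post({y})" for y in out]     # stand-in for the handler's postprocess step

def handle_batch(requests, micro_batch_size=1):
    # The frontend has already aggregated `requests` (e.g. 10 of them, or fewer
    # if the max batch delay expired). Split them into micro-batches.
    micro_batches = [requests[i:i + micro_batch_size]
                     for i in range(0, len(requests), micro_batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Preprocessing of every micro-batch is started up front, so it can
        # overlap with inference on earlier micro-batches.
        pre = [pool.submit(lambda mb: [preprocess(r) for r in mb], mb)
               for mb in micro_batches]
        post = []
        for fut in pre:
            # Inference still sees only one micro-batch (one request here) at a time.
            out = infer(fut.result())
            # Postprocessing is handed back to the pool so it, too, can overlap
            # with inference on the next micro-batch.
            post.append(pool.submit(postprocess, out))
        for fut in post:
            results.extend(fut.result())
    return results

print(handle_batch([f"req{i}" for i in range(10)]))
```

The overlap only pays off when pre/post-processing takes a meaningful share of the handler's time; if the model forward pass dominates, a micro-batch size of 1 mainly gives up the benefit of batched inference.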