If micro_batch_size is set to 1, is model inference still batched?
📚 The doc issue
I set the batchSize of the registered model to 10 and the micro_batch_size to 1. For model inference, will the server wait for 10 requests to finish preprocessing in parallel before aggregating them for inference?
Suggest a potential alternative/fix
No response
Hi @pengxin233, yes, it will still aggregate 10 requests (or wait until the max batch delay expires) before performing inference. The inference method of the handler will only see a single request at a time, but pre- and post-processing will run in parallel with inference. So whether this configuration is performant depends on your use case (is your handler dominated by pre/post-processing?).
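To make the pipelining concrete, here is a rough, purely illustrative sketch in plain Python (not the actual TorchServe micro-batching code; `preprocess`, `infer`, and `postprocess` are stand-ins for the handler methods) of how 10 aggregated requests flow through with micro_batch_size = 1:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(req):
    return f"pre({req})"                   # stand-in for the handler's preprocess step

def infer(items):
    return [f"model({x})" for x in items]  # stand-in for the model forward pass

def postprocess(out):
    return [f"post({y})" for y in out]     # stand-in for the handler's postprocess step

def handle_batch(requests, micro_batch_size=1):
    # The frontend has already aggregated `requests` (e.g. 10 of them, or fewer
    # if the max batch delay expired). Split them into micro-batches.
    micro_batches = [requests[i:i + micro_batch_size]
                     for i in range(0, len(requests), micro_batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Preprocessing of every micro-batch is started up front, so it can
        # overlap with inference on earlier micro-batches.
        pre = [pool.submit(lambda mb: [preprocess(r) for r in mb], mb)
               for mb in micro_batches]
        post = []
        for fut in pre:
            # Inference still sees only one micro-batch (one request here) at a time.
            out = infer(fut.result())
            # Postprocessing is handed back to the pool so it, too, can overlap
            # with inference on the next micro-batch.
            post.append(pool.submit(postprocess, out))
        for fut in post:
            results.extend(fut.result())
    return results

print(handle_batch([f"req{i}" for i in range(10)]))
```

The overlap only pays off when pre/post-processing takes a meaningful share of the handler's time; if the model forward pass dominates, a micro-batch size of 1 mainly gives up the benefit of batched inference.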