onnxruntime_backend

default-max-batch-size doesn't cooperate well with preferred_batch_size

OvervCW opened this issue 3 years ago · 0 comments

Description

We're using --backend-config=onnxruntime,default-max-batch-size=128 to enable large client-side batches for all of our models. However, for some of our models we want to cap dynamic batches at a much lower size to get more predictable inference latency. To that end, we've added the following to those model configs:

dynamic_batching { preferred_batch_size: [2] }
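For context, a complete model config along these lines reproduces the situation. This is an illustrative sketch, not one of our actual configs: the model name and platform line are made up, and the key point is that max_batch_size is deliberately omitted so that the backend-config default is relied upon.

name: "example_model"
platform: "onnxruntime_onnx"
# max_batch_size intentionally not set; we rely on
# --backend-config=onnxruntime,default-max-batch-size=128
dynamic_batching {
  preferred_batch_size: [2]
}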

However, Triton shows the following error during startup:

dynamic batching preferred size must be <= max batch size

(https://github.com/triton-inference-server/core/blob/c9cd6630ecb04bb26e2110cd65a37f23aec8153b/src/model_config_utils.cc#L1195-L1200)

If we explicitly add max_batch_size: 128 to every model config, the error disappears.
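For reference, the workaround in each affected config.pbtxt looks like the sketch below (same illustrative model as above, with only the explicit max_batch_size line added by hand):

name: "example_model"
platform: "onnxruntime_onnx"
max_batch_size: 128  # manually duplicated from default-max-batch-size
dynamic_batching {
  preferred_batch_size: [2]
}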

I think the problem is that the relevant autocomplete code doesn't write the default max batch size back to the model config:

https://github.com/triton-inference-server/onnxruntime_backend/blob/1e5dd03fd18992446fd169ead8bf208e8fc53686/src/onnxruntime.cc#L769-L785

I realize that this is an uncommon use case (and easy to work around), but the error message is confusing and unexpected. Is this intended behavior? Could it be fixed by writing the default max batch size back to the model config during autocompletion, as if it had been specified in the first place?

Triton Information

What version of Triton are you using? r22.08

Are you using the Triton container or did you build it yourself? Built it myself.

To Reproduce

See above.

Expected behavior

The default max batch size from --backend-config=onnxruntime,default-max-batch-size=128 should be applied during autocompletion, so that a dynamic_batching preferred_batch_size below that default is accepted without having to set max_batch_size explicitly in every model config.

OvervCW · Sep 28 '22 13:09