Queue timeouts not working as expected

Open sboudouk opened this issue 1 year ago • 0 comments

Description A clear and concise description of what the bug is.

The queue timeout defined in config.pbtxt does not fire when the configured value elapses; it only fires after the model has finished its current inference.

default_timeout_microseconds set to 10000000 (10s) -> Request A arrives -> Request A is being inferred by the model and takes 90s -> Request B arrives -> Request B hangs in the queue for 90s until Request A is done before getting timed out.

Triton Information What version of Triton are you using?

24.05

Are you using the Triton container or did you build it yourself?

Triton Container

To Reproduce Steps to reproduce the behavior.

default_timeout_microseconds set to 10000000 (10s) -> Request A arrives -> Request A is being inferred by the model and takes 90s -> Request B arrives -> Request B hangs in the queue for 90s until Request A is done before getting timed out.

Any model will do: put a sleep longer than default_timeout_microseconds in the execute method, and you will see that no request timeout is triggered while the model is busy.
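For reference, a minimal sketch of such a model.py for the Python backend. The 90 s sleep matches the scenario above, the `SLEEP_SECONDS` name is only illustrative, and the responses carry no output tensors because the real inputs and outputs are not relevant here; adapt them to your own config. The guarded import is an assumption added so the file can also be loaded outside Triton.

```python
import time

try:
    # Provided by the Triton Python backend at runtime.
    import triton_python_backend_utils as pb_utils
except ImportError:
    pb_utils = None  # allows importing this file outside Triton for inspection

# Assumption: 90 s as in the scenario above; any value above the 10 s queue timeout works.
SLEEP_SECONDS = 90


class TritonPythonModel:
    def execute(self, requests):
        # Keep the model instance busy for longer than default_timeout_microseconds.
        time.sleep(SLEEP_SECONDS)
        # One response per request; output tensors are omitted in this sketch.
        return [pb_utils.InferenceResponse(output_tensors=[]) for _ in requests]
```

Send one request, then a second one while the first is still sleeping, and note when the second one gets rejected.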

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Python backend. The inputs and outputs don't matter here, but the configuration looks like this:

  name: "my_model"
  backend: "python"
  max_batch_size: 1
  dynamic_batching: {
      default_queue_policy: {
          timeout_action: REJECT
          default_timeout_microseconds: 10000000
      }
  }

Expected behavior A clear and concise description of what you expected to happen.

A request in the queue should time out according to default_timeout_microseconds, even if the model is busy with another request.
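To make the gap concrete, here is a small sketch that works out when Request B is rejected in each case. It is only arithmetic on the numbers from this report, not Triton code: the function name is hypothetical, Request A is assumed to start at t = 0, and Request B's arrival time is a free parameter since the report does not pin it down.

```python
def rejection_times(b_arrival_s, timeout_s=10.0, a_duration_s=90.0):
    """Return (expected, observed) rejection times for Request B, in seconds.

    Assumes Request A starts at t = 0 and occupies the model for a_duration_s.
    Returns None when B's deadline falls after A finishes, since the scenario
    in this report does not cover that case.
    """
    deadline = b_arrival_s + timeout_s
    if deadline >= a_duration_s:
        return None
    # Expected: rejected at the deadline. Observed: rejected only once A is done.
    return deadline, a_duration_s


# Request B arriving 5 s after Request A started:
print(rejection_times(5.0))  # (15.0, 90.0)
```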

Aug 09 '24 09:08 sboudouk