
Fix double slot release after request cancellation

Open HennerM opened this issue 1 year ago • 1 comments

It looks like a regression was introduced by https://github.com/triton-inference-server/core/pull/273.

Cancelled requests release a slot in the sequence batcher, but the very same request is still reaped after the idle timeout, which causes the same slot (now potentially already taken by another sequence) to be released a second time. This can have severe consequences when using implicit state management, because the state is associated with the slot.

This is the root cause for the problem described in https://github.com/triton-inference-server/server/issues/7117.

I have tested this change against a very simple example using the gRPC Python client and a simple sequence model with max_candidate_sequences set to 2. I sent two requests: one currently executing and another queued in the sequence batcher. I then cancelled request two and waited for max_sequence_idle_microseconds. When I subsequently sent two new sequences, the logs showed that both were assigned slot 0.
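To illustrate the failure mode, here is a minimal schematic model of the double release (this is standalone Python, not Triton code; the `SlotPool` class and its methods are hypothetical stand-ins for the sequence batcher's slot bookkeeping):

```python
# Schematic model of the bug: a cancelled request's slot is released once
# on cancellation and again when the idle reaper later reaps the same
# request, so the pool ends up with a duplicate free entry.
class SlotPool:
    def __init__(self, num_slots):
        self.free = list(range(num_slots))

    def acquire(self):
        return self.free.pop(0)

    def release(self, slot):
        self.free.insert(0, slot)  # most recently freed slot is reused first

pool = SlotPool(2)
slot_a = pool.acquire()  # running sequence -> slot 0
slot_b = pool.acquire()  # queued sequence  -> slot 1

pool.release(slot_b)     # cancellation releases the slot
pool.release(slot_b)     # idle reaper releases the SAME slot again (the bug)

# Two new sequences now both receive the duplicated slot:
new_a = pool.acquire()
new_b = pool.acquire()
print(new_a, new_b)      # both sequences share one slot
```

With implicit state management, both new sequences would then read and write the state buffer keyed to that one slot, corrupting each other's state.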

HennerM avatar Apr 18 '24 00:04 HennerM

Thanks for the submission @HennerM! We're looking into this PR and the underlying root causes and edge cases throughout the Sequence Batch Scheduler.

rmccorm4 avatar Apr 22 '24 21:04 rmccorm4