Enhance OTEL testing to capture and verify Cancellation Requests and Non-Decoupled model inference.

Open indrajit96 opened this issue 1 year ago • 5 comments

Added tests for

  1. Request cancellations: make sure spans are ended properly in various scenarios
  2. Tracing behavior after this PR: https://github.com/triton-inference-server/server/pull/6017

Tests added

  1. test_non_decoupled(): Verify tracing for a non-decoupled inference
  2. test_grpc_trace_all_input_required_model_cancel(): Verify the trace after an inference request is cancelled in the COMPUTE phase
  3. test_grpc_trace_model_cancel_in_queue(): Verify the trace after an inference request is cancelled while in the QUEUE, before the COMPUTE phase
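
As a rough illustration of what "spans are ended properly" could look like as a check, the sketch below validates spans given in the OTLP JSON shape (string-encoded `startTimeUnixNano` and `endTimeUnixNano`). The helper name `unended_spans` and the sample span names are hypothetical and are not taken from the tests in this PR.

```python
def unended_spans(spans):
    """Return names of spans whose end timestamp is missing or precedes the start."""
    bad = []
    for span in spans:
        # OTLP JSON encodes 64-bit timestamps as strings; a missing value is treated as 0.
        start = int(span.get("startTimeUnixNano", 0))
        end = int(span.get("endTimeUnixNano", 0))
        if end == 0 or end < start:
            bad.append(span.get("name", "<unnamed>"))
    return bad


# Hypothetical spans for a cancelled request (names are made up for illustration).
sample = [
    {"name": "InferRequest", "startTimeUnixNano": "100", "endTimeUnixNano": "500"},
    {"name": "compute", "startTimeUnixNano": "200"},  # never ended
]
print(unended_spans(sample))
```

A cancellation test could assert that this list is empty for every trace collected after the request is cancelled.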

indrajit96 avatar Apr 18 '24 05:04 indrajit96

Could you please add a description, clarifying what tests were added?

oandreeva-nv avatar Apr 19 '24 18:04 oandreeva-nv

Could you please attach a picture of a trace, as displayed in Jaeger, for a cancelled request?

Another question: have you considered the cases where the request is cancelled while it is in the queue and where it is already in the compute stage?

oandreeva-nv avatar Apr 19 '24 18:04 oandreeva-nv

Could you please attach a picture of a trace, as displayed in Jaeger, for a cancelled request?

Another question: have you considered the cases where the request is cancelled while it is in the queue and where it is already in the compute stage?

Screenshot 2024-05-02 at 10 51 00 AM

Fixed this: added a model that waits for a delay before executing, so the cancellation request is received before execution starts.
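
The idea of delaying execution so that a cancellation arrives first can be sketched with plain `asyncio`. This is only an analogy, not the model added in this PR: `delayed_execute`, `cancel_before_compute`, and the phase labels are hypothetical names that mirror the QUEUE and COMPUTE phases mentioned in this thread.

```python
import asyncio


async def delayed_execute(delay_s, phases):
    """Hypothetical stand-in for a model that waits before computing."""
    phases.append("QUEUE")
    await asyncio.sleep(delay_s)  # a cancellation can land during this wait
    phases.append("COMPUTE")
    return "result"


async def cancel_before_compute(delay_s=0.2, cancel_after_s=0.01):
    phases = []
    task = asyncio.create_task(delayed_execute(delay_s, phases))
    await asyncio.sleep(cancel_after_s)  # let the task start and begin waiting
    task.cancel()                        # request cancellation before the delay expires
    try:
        await task
    except asyncio.CancelledError:
        phases.append("CANCELLED")
    return phases


phases = asyncio.run(cancel_before_compute())
print(phases)  # COMPUTE is never recorded because the delay outlasts the cancel
```

Because the delay is much longer than the time before the cancel is issued, the compute step is never reached, which is the condition the queue-cancellation test relies on.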

indrajit96 avatar May 06 '24 17:05 indrajit96

Can you update the PR title to be more descriptive? (cancellation, decoupled responses, etc. rather than JIRA ticket number)

rmccorm4 avatar May 08 '24 21:05 rmccorm4

I think it's in a good state. Let's address Ryan's comment and add a clear description of the 3 tests added in this PR to the PR description.

oandreeva-nv avatar May 21 '24 00:05 oandreeva-nv