
Why can't a request be cancelled in the first model of ensemble_model?

Open eeeeeunjung opened this issue 1 year ago • 6 comments

Description: I have an ensemble model with two sub-models, both using the Python backend. Each model has one instance and a max batch size of 1. I added code like this to both sub-models:

    def execute(self, requests=None):
        responses = []
        for request in requests:
            if request.is_cancelled():
                responses.append(
                    pb_utils.InferenceResponse(
                        error=pb_utils.TritonError("Request Cancelled", pb_utils.TritonError.CANCELLED)
                    )
                )
                continue

I use the Python gRPC client to send three requests. While the first request is being handled and the other two are waiting, I send a cancellation request. In the first model, request.is_cancelled() returns False for all three requests; it only returns True in the second model. I want the request to exit in the first model. I'm not sure whether this is a feature or a bug.

The client code looks like this:

        async_request = client.async_infer(
            model_name="ensemble_model",
            inputs=inputs,
            outputs=outputs,
            callback=partial(callback, user_data),
        )
        time.sleep(2)
        print("cancel")
        async_request.cancel()
        print("sleep")
        time.sleep(12)

Triton Information: Triton 23.10, Ubuntu 22.04.3

Expected behavior: after I send the cancellation, is_cancelled() returns True for the request in the first model of the ensemble.

eeeeeunjung, Dec 26 '23 03:12

It seems each model has its own request queue. Once a request has entered a queue and is waiting for serial processing, cancelling it does not update the cancellation status of that queued request. The status is only updated when the request enters the queue of the next model.

eeeeeunjung, Dec 26 '23 07:12

I made a Python backend model like this:

    import time

    import numpy as np
    import triton_python_backend_utils as pb_utils

    def execute(self, requests):
        responses = []
        for request in requests:
            # Reject requests that were already cancelled before processing starts.
            if request.is_cancelled():
                responses.append(
                    pb_utils.InferenceResponse(
                        error=pb_utils.TritonError("Request Cancelled", pb_utils.TritonError.CANCELLED)
                    )
                )
                continue
            request_type = pb_utils.get_input_tensor_by_name(request, "type").as_numpy().item().decode("utf-8")

            # Simulate 20 seconds of work, printing the cancellation flag once per second.
            for i in range(20):
                print(f"request.is_cancelled: {request.is_cancelled()}")
                time.sleep(1)
            responses.append(
                pb_utils.InferenceResponse(
                    [
                        pb_utils.Tensor("tmp_type", np.array([request_type]).astype(object)),
                    ]
                )
            )
        return responses

I sent a request to this Python backend model through the ensemble model and sent a cancellation after 5 seconds, but request.is_cancelled() always returned False. It seems is_cancelled() is only updated between the models of an ensemble.

But when I send a request directly to this Python backend model, without the ensemble, request.is_cancelled() returns True after I send the cancellation.

eeeeeunjung, Dec 26 '23 09:12
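The polling loop in the model above only prints the flag. A minimal sketch of acting on it, so long-running work stops early once cancellation is visible, might look like the following. Only `is_cancelled()` comes from the thread; `RequestCancelled`, `run_with_cancellation_checks`, and `FakeRequest` are hypothetical names introduced here so the sketch runs without Triton.

```python
class RequestCancelled(Exception):
    """Hypothetical exception raised when cancellation is observed."""


def run_with_cancellation_checks(request, steps, work):
    """Call work(i) for i in range(steps), checking is_cancelled() before each step.

    `request` only needs an is_cancelled() method. Returns the list of
    per-step results, or raises RequestCancelled as soon as the flag is seen.
    """
    results = []
    for i in range(steps):
        if request.is_cancelled():
            raise RequestCancelled(f"cancelled before step {i}")
        results.append(work(i))
    return results


class FakeRequest:
    """Hypothetical stand-in for a request; reports cancelled after N checks."""

    def __init__(self, cancel_after_checks):
        self._checks = 0
        self._cancel_after = cancel_after_checks

    def is_cancelled(self):
        self._checks += 1
        return self._checks > self._cancel_after


if __name__ == "__main__":
    req = FakeRequest(cancel_after_checks=3)
    try:
        run_with_cancellation_checks(req, steps=20, work=lambda i: i * i)
    except RequestCancelled as exc:
        print(exc)
```

In a real model the `except` branch would append the same `pb_utils.InferenceResponse` with a `pb_utils.TritonError.CANCELLED` error used earlier in this thread. As reported above, on 23.10 this only helps when the model is called directly, because the flag stays False in the first composing model of an ensemble.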

Also, when I use BLS, I can only cancel the outer request; I cannot cancel the sub-requests sent from the BLS model's execute function.

eeeeeunjung, Dec 27 '23 08:12

@eeeeeunjung

> Also, when I use BLS, I can only cancel the outer request; I cannot cancel the sub-requests sent from the BLS model's execute function.

Good point! I have filed a request to support a cancellation API in the Python backend.

> I sent a request to this Python backend model through the ensemble model and sent a cancellation after 5 seconds, but request.is_cancelled() always returned False. It seems is_cancelled() is only updated between the models of an ensemble.

@krishung5 / @kthui Do you know whether this is the expected behaviour? I think cancellation should be propagated to the composing model requests as well.

Tabrizian, Jan 13 '24 00:01

@kthui please correct me if I'm wrong. From the documentation on how Triton core handles cancelled requests, I think the check currently only happens between steps in an ensemble model. It is possible that the requests have already been forwarded to the rate limiter stage, so the cancellation is only picked up at the next step of the ensemble.

krishung5, Jan 17 '24 00:01
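The behaviour described in the comment above can be illustrated with a toy model. This is only a sketch of the reported symptom, not Triton's actual scheduler code: `TopLevelRequest`, `StepRequest`, and `run_ensemble` are hypothetical names, and the assumption is that each step's request captures the cancellation state once, at the step boundary.

```python
class TopLevelRequest:
    """Hypothetical stand-in for the request the client sends to the ensemble."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class StepRequest:
    """Hypothetical per-step request with no link back to its parent."""

    def __init__(self, cancelled_at_creation):
        self._cancelled = cancelled_at_creation

    def is_cancelled(self):
        return self._cancelled


def run_ensemble(top, step_names, cancel_during):
    """Run the steps in order; the parent's flag is read only between steps."""
    observed = {}
    for name in step_names:
        # Assumption: cancellation state is sampled once, at the step boundary.
        step_req = StepRequest(cancelled_at_creation=top.cancelled)
        if name == cancel_during:
            top.cancel()  # the client cancels while this step is executing
        observed[name] = step_req.is_cancelled()
    return observed


if __name__ == "__main__":
    print(run_ensemble(TopLevelRequest(), ["model_a", "model_b"], cancel_during="model_a"))
```

Under that assumption, a cancellation sent while the first step is running is invisible to that step and only shows up in the next one, which matches what was reported at the top of this issue.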

> Do you know whether this is the expected behaviour? I think cancellation should be propagated to the composing model requests as well.

This is expected at this time, because the composing model would need to know that the request comes from an ensemble parent model, which would allow it to query the cancellation state of the parent request. I don't think such a relationship exists in the core at this time. I've filed an additional ticket for us to investigate this further.

kthui, Jan 17 '24 03:01
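The parent relationship mentioned in the last comment could, in principle, look like the sketch below. It is purely illustrative and assumes nothing about how Triton core would implement it: `LinkedRequest` and its `parent` argument are hypothetical. The idea is that a composing request treats itself as cancelled if it, or any ancestor request, has been cancelled.

```python
class LinkedRequest:
    """Hypothetical request that remembers which request spawned it."""

    def __init__(self, parent=None):
        self._cancelled = False
        self._parent = parent

    def cancel(self):
        self._cancelled = True

    def is_cancelled(self):
        # Walk up the chain: cancelled if this request or any ancestor is cancelled.
        node = self
        while node is not None:
            if node._cancelled:
                return True
            node = node._parent
        return False


if __name__ == "__main__":
    ensemble_request = LinkedRequest()
    composing_request = LinkedRequest(parent=ensemble_request)
    ensemble_request.cancel()
    print(composing_request.is_cancelled())
```

With such a link, cancelling the ensemble request is visible to a composing request that is already running, while cancelling a composing request leaves its parent untouched.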