Why can't a request be cancelled in the first model of ensemble_model?
Description: I have an ensemble model with two sub-models, both using the Python backend. I added code like this to both sub-models. Each model has a single instance, and the max batch size is 1.
def execute(self, requests=None):
    responses = []
    for request in requests:
        if request.is_cancelled():
            responses.append(
                pb_utils.InferenceResponse(
                    error=pb_utils.TritonError("Request Cancelled", pb_utils.TritonError.CANCELLED)
                )
            )
            continue
I use the Python gRPC client to send three requests. While the first request is being handled and the other two are waiting, I send a cancellation. In the first model, request.is_cancelled() returns False for all three requests; it returns True only in the second model. I want the requests to exit in the first model. I'm not sure whether this is a feature or a bug.
The client code looks like this:
async_request = client.async_infer(
    model_name="ensemble_model",
    inputs=inputs,
    outputs=outputs,
    callback=partial(callback, user_data),
)
time.sleep(2)
print("cancel")
async_request.cancel()
print("sleep")
time.sleep(12)
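The client snippet above does not show `callback` or `user_data`. A minimal sketch of what they might look like, following the pattern commonly used with tritonclient's `async_infer`, is given here. `UserData`, its `completed` queue, and `bound_callback` are hypothetical names and are not taken from the original post.

```python
import queue
from functools import partial


class UserData:
    """Hypothetical container for completed requests; not from the original post."""

    def __init__(self):
        self.completed = queue.Queue()


def callback(user_data, result, error):
    # async_infer invokes the callback as callback(result, error);
    # partial(callback, user_data) binds user_data as the first argument.
    if error is not None:
        # A cancelled request is expected to surface here as an error.
        user_data.completed.put(("error", error))
    else:
        user_data.completed.put(("ok", result))


user_data = UserData()
bound_callback = partial(callback, user_data)  # pass this as callback=...
```

After calling `async_request.cancel()`, the client could read `user_data.completed.get()` to see whether the request finished or came back as an error.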
Triton Information: Triton 23.10, Ubuntu 22.04.3
Expected behavior: after I send the cancellation, request.is_cancelled() returns True in the first model of the ensemble model.
It seems each model has its own request queue. Once a request has entered a queue and is waiting for serial processing, a cancellation does not update that request's status. The status is only updated when the request enters the queue of the next model.
I wrote a Python backend model like this:
import time

import numpy as np
import triton_python_backend_utils as pb_utils

def execute(self, requests):
    responses = []
    for request in requests:
        # Reject requests that were already cancelled before processing starts.
        if request.is_cancelled():
            responses.append(
                pb_utils.InferenceResponse(
                    error=pb_utils.TritonError("Request Cancelled", pb_utils.TritonError.CANCELLED)
                )
            )
            continue
        request_type = pb_utils.get_input_tensor_by_name(request, "type").as_numpy().item().decode("utf-8")
        # Simulate 20 seconds of work, printing the cancellation flag every second.
        for i in range(20):
            print(f"request.is_cancelled: {request.is_cancelled()}")
            time.sleep(1)
        responses.append(
            pb_utils.InferenceResponse(
                [
                    pb_utils.Tensor("tmp_type", np.array([request_type]).astype(object)),
                ]
            )
        )
    return responses
I send a request to this Python backend model through ensemble_model and send a cancellation after 5 seconds, but request.is_cancelled() always returns False. It seems is_cancelled() is only updated between the models of an ensemble.
But when I send a request to this Python backend model directly, without the ensemble, request.is_cancelled() returns True after I send the cancellation.
And when I use BLS, I can only cancel the outer request. I cannot cancel the sub-request sent from the BLS model's execute function.
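For the direct (non-ensemble) case, where is_cancelled() does get updated, a long-running execute could poll the flag between units of work and stop early. The helper below is a minimal sketch of that idea. `run_until_cancelled` and its parameters are hypothetical names, and it assumes only that the request object has an is_cancelled() method, as in the code above.

```python
import time


def run_until_cancelled(request, steps, step_seconds=1.0, sleep=time.sleep):
    """Run up to `steps` units of work, checking for cancellation before each.

    Returns True if every step completed, False if the request was cancelled.
    `request` only needs an is_cancelled() method.
    """
    for _ in range(steps):
        if request.is_cancelled():
            return False
        sleep(step_seconds)  # stand-in for one unit of real work
    return True
```

Inside execute, a False return value could be turned into the same CANCELLED error response used in the early check. As reported above for 23.10, this would not help inside the first composing model of an ensemble, because the flag stays False there.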
@eeeeeunjung
And when I use BLS, I can only cancel the outer request. I cannot cancel the sub-request sent from the BLS model's execute function.
Good point! I have filed a request to allow a cancellation API in the Python backend.
I send a request to this Python backend model through ensemble_model and send a cancellation after 5 seconds, but request.is_cancelled() always returns False. It seems is_cancelled() is only updated between the models of an ensemble.
@krishung5 / @kthui Do you know whether this is the expected behaviour? I think cancellation should be propagated to the composing model requests as well.
@kthui please correct me if I'm wrong. From the documentation on how Triton core handles cancelled requests, I think the check currently happens only between steps in ensemble models. It is possible that the requests have already been forwarded to the rate limiter stage, so the cancellation is only picked up at the next step of the ensemble.
Do you know whether this is the expected behaviour? I think cancellation should be propagated to the composing model requests as well.
This is expected at this time. The composing model would need to know that its request comes from an ensemble parent model, which would allow it to query the cancellation state of the parent request. I don't think such a relationship exists in the core at this time. I've filed an additional ticket for us to investigate this further.
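The missing parent relationship described above can be illustrated with a toy model. This is not Triton code and does not reflect how the core is implemented; `ToyRequest` and its `parent` link are hypothetical, and the sketch only shows why a composing-model request without a link to the ensemble request cannot see the cancellation.

```python
class ToyRequest:
    """Toy stand-in for a request; this is not Triton's internal request type."""

    def __init__(self, parent=None):
        self.parent = parent
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    def is_cancelled(self):
        # Without a parent link, only this request's own flag is visible.
        if self._cancelled:
            return True
        return self.parent is not None and self.parent.is_cancelled()


ensemble_request = ToyRequest()
unlinked_step = ToyRequest()                       # no link to the ensemble request
linked_step = ToyRequest(parent=ensemble_request)  # hypothetical parent link

# The client cancels the top-level (ensemble) request only.
ensemble_request.cancel()
```

In this toy, `unlinked_step.is_cancelled()` stays False after the ensemble request is cancelled, which mirrors the behaviour reported in the first composing model, while `linked_step.is_cancelled()` becomes True.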