inference failed: ensemble unexpected deadlock
I have an ensemble model. It works well, but sometimes inference fails with an error like this:
in ensemble 'ensemble_qvd', unexpected deadlock, at least one output is not set while no more ensemble steps can be made
What is this error, and how can I solve it? Thanks! Here is my ensemble config:
name: "ensemble_qvd"
platform: "ensemble"
max_batch_size: 8
input [
  {
    name: "INPUT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [ 8 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map {
        key: "INPUT"
        value: "INPUT"
      }
      output_map {
        key: "IMAGE"
        value: "preprocessed_image"
      }
    },
    {
      model_name: "blur"
      model_version: -1
      input_map {
        key: "INPUT__0"
        value: "preprocessed_image"
      }
      output_map {
        key: "OUTPUT__0"
        value: "blur_output0"
      }
    },
    {
      model_name: "occlusion"
      model_version: -1
      input_map {
        key: "INPUT__0"
        value: "preprocessed_image"
      }
      output_map {
        key: "OUTPUT__0"
        value: "occlusion_output0"
      }
    },
    {
      model_name: "canvas"
      model_version: -1
      input_map {
        key: "INPUT__0"
        value: "preprocessed_image"
      }
      output_map {
        key: "OUTPUT__0"
        value: "canvas_output0"
      }
    },
    {
      model_name: "wall"
      model_version: -1
      input_map {
        key: "INPUT__0"
        value: "preprocessed_image"
      }
      output_map {
        key: "OUTPUT__0"
        value: "wall_output0"
      }
    },
    {
      model_name: "postprocess"
      model_version: -1
      input_map {
        key: "INPUT0"
        value: "blur_output0"
      }
      input_map {
        key: "INPUT1"
        value: "occlusion_output0"
      }
      input_map {
        key: "INPUT2"
        value: "canvas_output0"
      }
      input_map {
        key: "INPUT3"
        value: "wall_output0"
      }
      output_map {
        key: "OUTPUT"
        value: "OUTPUT"
      }
    }
  ]
}
@GuanLuo Do you happen to know why this might happen?
The error will be reported if one of the composing models doesn't generate the output tensors promised. Can you check whether the composing models return all the outputs listed?
When it runs inference successfully, the composing models do return all the outputs listed, but when it fails, they return nothing. Is there any config that can help deal with this problem?
Does "inference failed" refer to the ensemble inference failing? Do you know when the inference fails? i.e., does it always fail on a specific input, or does it fail intermittently even when the input is unchanged? If the former, you can send requests to each of the composing models to simulate the ensemble pipeline and identify which model is not producing output.
Triton expects each (composing) model to produce all outputs listed in its model config, and the model should fail and return an error if it cannot do so.
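For example, something like the sketch below sends a request directly to the "preprocess" model. The model and tensor names are taken from the ensemble config above; the dtype/shape assumptions may need adjusting for your actual model.

# Sketch: probe a single composing model to check whether it returns the
# output promised in its config ("preprocess" takes INPUT and should produce
# IMAGE, per the ensemble config above).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One request with batch size 1; INPUT is TYPE_STRING with dims [ 1 ].
data = np.array([["example input"]], dtype=object)
inp = httpclient.InferInput("INPUT", data.shape, "BYTES")
inp.set_data_from_numpy(data)

result = client.infer(
    model_name="preprocess",
    inputs=[inp],
    outputs=[httpclient.InferRequestedOutput("IMAGE")],
)
# as_numpy() returning None would mean the output was never produced.
print("IMAGE returned:", result.as_numpy("IMAGE") is not None)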
It fails intermittently even when the input is unchanged. The same input may succeed the next time.
I am also experiencing this issue... I have an ensemble (pre proc in Python -> ONNX model -> post proc in Python) that is fired from a BLS. After some debugging, the point of failure in the ensemble is either between the pre proc and the ONNX model or between the ONNX model and the post proc.
When I check the output that the post proc is receiving, it's just an array of NaNs.
Looks like there is a bug somewhere that's causing data to be lost.
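In case it helps with debugging, a check along these lines inside the post proc's execute() turns the bad tensors into an explicit error instead of silent NaNs (just a sketch; the input names are placeholders for whatever the post proc's config lists):

# Sketch: flag missing or NaN inputs inside a Python backend post proc.
import numpy as np
import triton_python_backend_utils as pb_utils

def validate_inputs(request, input_names):
    # Returns an error response if any expected input is missing or contains
    # NaNs, otherwise None.
    for name in input_names:
        tensor = pb_utils.get_input_tensor_by_name(request, name)
        if tensor is None or np.isnan(tensor.as_numpy()).any():
            return pb_utils.InferenceResponse(
                error=pb_utils.TritonError(
                    "post proc received a missing/NaN tensor: " + name))
    return None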
@GuanLuo just wanted to check whether there is any idea on the cause of this? I've noticed that it appears to happen only when the server is under fairly significant load... but whether or not it happens seems to be a total toss-up.
FYI, I'm experiencing this intermittently when sending 200+ requests (each with a batch size of 1) in a very short time frame.
Edit: It appears to occur only when calling the ensemble from a BLS... calling the ensemble by itself seems to be fine.
I also encountered this issue for an ensemble model like the following:
   preprocess
    /      \
   A        B
    \      /
   postprocess
The error happens in ~2% of the requests. It does not seem to be due to heavy load, as I sometimes get the error when I manually trigger requests at a low rate.
I tried removing either A or B and there was no such error. It seems there is some issue with model parallelism that causes the error.
Yeah, after some additional testing today, it doesn't appear to be due to load (it seems basically random if/when it happens). For me the issue only seems to happen when I execute the ensemble from a BLS... but yep, there definitely appears to be a bug.
@yxjiang Is the Python backend also used as part of the ensemble?
For me it is... the pre/post processing models are Python backends.
Yes, the Python backend is used. It is also used in the case where I removed A or B, and there was no error in that case.
Do A/B happen to be using the Python backend? Can you also share the steps to reproduce the issue? If there is a concern about sharing the actual models, you may share dummy models, as long as the issue is still reproducible with them.
No, A/B are both TensorFlow models. I cannot share the models as they are production models.
I am not sure whether the complexity of the models is a factor in this issue; otherwise we could just replace them with simple models like a+b.
There are no special steps for reproduction. Once the inference service is up, we can send requests to it either manually or with a script. It's hard to hit the error via manual calls, as the error rate is around 2%.
I will need some help reproducing the issue. I put together a simple ensemble with identity models to mimic the scenario and used perf_analyzer to generate load; all perf_analyzer runs ended without error. Below is the model directory that I put together: models.zip
perf_analyzer -m ensemble --concurrency-range 128:128:1 -p 20000 -v -a
@GuanLuo thanks for the update. Is there any way to enforce sequential execution? For example, enforce the execution order pre-process -> A -> B -> post-process even though A and B could be executed in parallel.
There is no way to do so in an ensemble; the dependencies are deduced from the tensor connectivity in the ensemble config. You may construct your pipeline as a BLS instead, where you have more control over the workflow.
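A minimal sketch of that BLS approach, with placeholder model/tensor names that you would replace with the ones from your actual configs:

# Sketch of a BLS model that forces sequential execution:
# preprocess -> A -> B -> postprocess. Names are placeholders.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            image = self._call("preprocess", "IMAGE",
                               [pb_utils.get_input_tensor_by_name(request, "INPUT")])
            # exec() below is blocking, so each step only starts after the
            # previous one has returned; A and B never run concurrently.
            a_out = self._call("A", "OUTPUT__0",
                               [pb_utils.Tensor("INPUT__0", image.as_numpy())])
            b_out = self._call("B", "OUTPUT__0",
                               [pb_utils.Tensor("INPUT__0", image.as_numpy())])
            final = self._call("postprocess", "OUTPUT",
                               [pb_utils.Tensor("INPUT0", a_out.as_numpy()),
                                pb_utils.Tensor("INPUT1", b_out.as_numpy())])
            # Assumes this BLS model's own config also names its output "OUTPUT".
            responses.append(pb_utils.InferenceResponse(output_tensors=[final]))
        return responses

    def _call(self, model_name, output_name, inputs):
        response = pb_utils.InferenceRequest(
            model_name=model_name,
            requested_output_names=[output_name],
            inputs=inputs).exec()
        if response.has_error():
            raise pb_utils.TritonModelException(response.error().message())
        return pb_utils.get_output_tensor_by_name(response, output_name)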
Closing this issue due to lack of activity. Please re-open it if you would like to follow up.