inference failed: ensemble unexpected deadlock

Open nimeidi opened this issue 3 years ago • 19 comments

I have an ensemble model. It usually works well, but sometimes inference fails with the following error:

in ensemble 'ensemble_qvd', unexpected deadlock, at least one output is not set while no more ensemble steps can be made

What does this error mean, and how can I solve it? Thanks!

nimeidi avatar Apr 21 '22 12:04 nimeidi

name: "ensemble_qvd"
platform: "ensemble"
max_batch_size: 8
input [
  {
    name: "INPUT"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [ 8 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map {
        key: "INPUT"
        value: "INPUT"
      }
      output_map {
        key: "IMAGE"
        value: "preprocessed_image"
      }
    },
    {
      model_name: "blur"
      model_version: -1
      input_map {
        key: "INPUT__0"
        value: "preprocessed_image"
      }
      output_map {
        key: "OUTPUT__0"
        value: "blur_output0"
      }
    },
    {
      model_name: "occlusion"
      model_version: -1
      input_map {
        key: "INPUT__0"
        value: "preprocessed_image"
      }
      output_map {
        key: "OUTPUT__0"
        value: "occlusion_output0"
      }
    },
    {
      model_name: "canvas"
      model_version: -1
      input_map {
        key: "INPUT__0"
        value: "preprocessed_image"
      }
      output_map {
        key: "OUTPUT__0"
        value: "canvas_output0"
      }
    },
    {
      model_name: "wall"
      model_version: -1
      input_map {
        key: "INPUT__0"
        value: "preprocessed_image"
      }
      output_map {
        key: "OUTPUT__0"
        value: "wall_output0"
      }
    },
    {
      model_name: "postprocess"
      model_version: -1
      input_map {
        key: "INPUT0"
        value: "blur_output0"
      }
      input_map {
        key: "INPUT1"
        value: "occlusion_output0"
      }
      input_map {
        key: "INPUT2"
        value: "canvas_output0"
      }
      input_map {
        key: "INPUT3"
        value: "wall_output0"
      }
      output_map {
        key: "OUTPUT"
        value: "OUTPUT"
      }
    }
  ]
}

nimeidi avatar Apr 21 '22 12:04 nimeidi

@GuanLuo Do you happen to know why this might happen?

Tabrizian avatar Apr 21 '22 23:04 Tabrizian

The error is reported if one of the composing models doesn't generate the output tensors it promised. Can you check whether the composing models return all the outputs listed in their configs?

GuanLuo avatar Apr 22 '22 00:04 GuanLuo

When inference succeeds, the composing models do return all the outputs listed, but when it fails, they return nothing. Is there some configuration that can help deal with this problem?

nimeidi avatar Apr 22 '22 06:04 nimeidi

Does "inference fail" refers to the ensemble inference fails? Do you know when does the inference fail? i.e. does it always fails on a specific input or it fails intermittently even the input is unchanged. If the former, you can send the request to each of the composing model to simulate the ensemble pipeline and identify which model is not producing output.

Triton expects the (composing) model to produce all outputs listed in its model config, and the model should fail and return error if it fails to do so.

GuanLuo avatar Apr 22 '22 17:04 GuanLuo
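
To help narrow this down, here is a minimal sketch of testing one composing model in isolation with Triton's Python HTTP client. It is an illustration only: it assumes Triton is listening on localhost:8000 and uses the "blur" step from the config above with a made-up FP32 input shape, so substitute the real shape from blur's own config.

# Hedged sketch: query one composing model directly and check whether it
# returns the output promised in its config. The input shape below is a
# placeholder, not blur's real shape.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

dummy_image = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(dummy_image.shape), "FP32")
inp.set_data_from_numpy(dummy_image)

result = client.infer(
    model_name="blur",
    inputs=[inp],
    outputs=[httpclient.InferRequestedOutput("OUTPUT__0")])

out = result.as_numpy("OUTPUT__0")
print("OUTPUT__0 missing" if out is None else "OUTPUT__0 shape: %s" % (out.shape,))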

It fails intermittently even when the input is unchanged. The same input may succeed on the next attempt.

nimeidi avatar Apr 24 '22 06:04 nimeidi

I am experiencing this issue as well. I have an ensemble (preprocessing in Python -> ONNX model -> postprocessing in Python) that is invoked from a BLS. After some debugging, the point of failure in the ensemble is either between the preprocessing and the ONNX model or between the ONNX model and the postprocessing.

When I check the output the postprocessing step receives, it is just an array of NaNs.

It looks like there is a bug somewhere that is causing data to be lost.

avickars avatar Aug 30 '22 18:08 avickars
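
One way to make this kind of failure visible earlier: a minimal sketch of a Python-backend execute() for the postprocessing step that rejects NaN inputs with an explicit error instead of silently producing a garbage output. The tensor names follow the ensemble config above; the rest is illustrative, not the actual postprocess model.

# Hedged sketch of a Python-backend execute() that fails loudly when an
# upstream model handed it NaNs. INPUT0/OUTPUT follow the ensemble config
# above; the passthrough "computation" is a placeholder.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            if np.isnan(in0).any():
                # Return an error response so the failure shows up in the
                # ensemble/client instead of propagating garbage downstream.
                responses.append(pb_utils.InferenceResponse(
                    output_tensors=[],
                    error=pb_utils.TritonError("INPUT0 contains NaNs")))
                continue
            out = pb_utils.Tensor("OUTPUT", in0.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses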

@GuanLuo just wanted to check whether there is any idea on the cause of this? I've noticed that it appears to happen only under fairly significant load, but whether or not it happens seems to be a complete toss-up.

FYI, I'm experiencing this intermittently when sending 200+ requests (each with batch size 1) in a very short time frame.

Edit: It appears to only occur when calling the ensemble from a BLS; calling the ensemble by itself seems to be fine.

avickars avatar Aug 30 '22 18:08 avickars
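
For reference, the BLS -> ensemble call pattern in question looks roughly like the sketch below (illustrative only; it assumes a Python-backend BLS model whose input/output names mirror ensemble_qvd). Checking has_error() on the inner response at least propagates the deadlock message instead of reading an output that was never set.

# Hedged sketch of a BLS model calling the ensemble, with explicit error
# checking. Names follow the ensemble config above; the surrounding BLS
# model itself is illustrative.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT")
            infer_request = pb_utils.InferenceRequest(
                model_name="ensemble_qvd",
                requested_output_names=["OUTPUT"],
                inputs=[in0])
            infer_response = infer_request.exec()
            if infer_response.has_error():
                # Propagate the ensemble error (e.g. the deadlock message)
                # instead of trying to read an unset output tensor.
                responses.append(pb_utils.InferenceResponse(
                    output_tensors=[],
                    error=infer_response.error()))
                continue
            out = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT")
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses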

I also encountered this issue with an ensemble model structured like the following:

                 preprocess 
                   /   \
                  A     B
                   \   /
                postprocess

The error happens roughly 2% of the time. It does not seem to be due to heavy load, as I sometimes get the error when I manually trigger requests at a low rate.

yxjiang avatar Aug 31 '22 05:08 yxjiang

I tried removing either A or B and there was no such error. It seems there is some issue with parallel model execution that causes the error.

yxjiang avatar Aug 31 '22 18:08 yxjiang

Yeah, after some additional testing today, it doesn't appear to be due to load (it seems basically random if/when it happens). For me the issue only seems to happen when I execute the ensemble from a BLS, but yes, there definitely appears to be a bug.

avickars avatar Aug 31 '22 18:08 avickars

@yxjiang Is Python backend also used as part of the ensemble?

GuanLuo avatar Aug 31 '22 18:08 GuanLuo

For me it is; the pre/post processing steps are Python backends.

avickars avatar Aug 31 '22 18:08 avickars

Yes, the Python backend is used. It was also used in the case where I removed A or B, and there was no error in that case.

yxjiang avatar Aug 31 '22 23:08 yxjiang

Do A/B happen to be using the Python backend? Can you also share the steps to reproduce the issue? If there is a concern about sharing the actual models, you may share dummy models as long as the issue is reproducible with them.

GuanLuo avatar Aug 31 '22 23:08 GuanLuo

No, A/B are both TensorFlow models. I cannot share the models, as they are our production models.

I am not sure whether model complexity is a factor in causing this issue; otherwise we could just replace them with simple models like a+b.

There are no special steps for reproducibility. Once the inference service is up, we send requests to it either manually or with a script. It is hard to hit the error via manual calls, since the error rate is around 2%.

yxjiang avatar Sep 01 '22 03:09 yxjiang

I will need some help reproducing the issue. I put together a simple ensemble with identity models to mimic the scenario and used perf_analyzer to generate load; all perf_analyzer runs ended without error. Below is the model directory that I put together: models.zip

perf_analyzer -m ensemble --concurrency-range 128:128:1 -p 20000 -v -a

GuanLuo avatar Sep 03 '22 01:09 GuanLuo

@GuanLuo thanks for the update. Is there any way we can force the execution to be sequential? For example, enforcing the execution order pre-process -> A -> B -> post-process even if A and B could run in parallel.

yxjiang avatar Sep 08 '22 00:09 yxjiang

There is no way to do so in an ensemble; the dependencies are deduced from the tensor connectivity in the ensemble config. You may construct your pipeline as a BLS instead, where you have more control over the workflow.

GuanLuo avatar Sep 08 '22 01:09 GuanLuo
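
For illustration, a sequential BLS version of this pipeline could look like the sketch below. It is an assumption-laden sketch rather than a recommended implementation: each exec() blocks until the previous step finishes, so blur/occlusion/canvas/wall never run in parallel. Model and tensor names follow the ensemble config above; copying tensors through as_numpy() is a simplification.

# Hedged sketch of a sequential BLS pipeline replacing the ensemble.
# Model/tensor names follow the ensemble config above; the outer model's
# own INPUT/OUTPUT names are assumed to mirror ensemble_qvd.
import triton_python_backend_utils as pb_utils


def _run(model_name, input_tensor, input_name, output_name):
    # Rename the tensor to the downstream model's input name, run the step,
    # and raise if it reports an error.
    req = pb_utils.InferenceRequest(
        model_name=model_name,
        requested_output_names=[output_name],
        inputs=[pb_utils.Tensor(input_name, input_tensor.as_numpy())])
    resp = req.exec()
    if resp.has_error():
        raise pb_utils.TritonModelException(resp.error().message())
    return pb_utils.get_output_tensor_by_name(resp, output_name)


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT")
            image = _run("preprocess", in0, "INPUT", "IMAGE")

            # Run the four branch models strictly one after another.
            branch_outputs = [
                _run(m, image, "INPUT__0", "OUTPUT__0")
                for m in ("blur", "occlusion", "canvas", "wall")]

            post_inputs = [pb_utils.Tensor("INPUT%d" % i, t.as_numpy())
                           for i, t in enumerate(branch_outputs)]
            post_req = pb_utils.InferenceRequest(
                model_name="postprocess",
                requested_output_names=["OUTPUT"],
                inputs=post_inputs)
            post_resp = post_req.exec()
            if post_resp.has_error():
                raise pb_utils.TritonModelException(post_resp.error().message())
            out = pb_utils.get_output_tensor_by_name(post_resp, "OUTPUT")
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses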

Closing the issue due to lack of activity. Please re-open it if you would like to follow up.

jbkyang-nvi avatar Nov 22 '22 03:11 jbkyang-nvi