
ONNX Model Fail to run

Open rajeevbaalwan opened this issue 1 year ago • 14 comments

Hi, I have exported an ESPnet model trained on my custom dataset using espnet_onnx. The model fails to work properly on some audios. Below is the error I am getting:

Non-zero status code returned while running Add node. Name:'/encoders/encoders.0/self_attn/Add' Status Message: /encoders/encoders.0/self_attn/Add: right operand cannot broadcast on dim 3 LeftShape: {1,8,171,171}, RightShape: {1,1,1,127}

Any idea what the issue could be here? I have run inference on 1,500 audio clips and get exactly the same error on around 400 of them.

rajeevbaalwan avatar Oct 01 '23 12:10 rajeevbaalwan

Hi @rajeevbaalwan I would like to confirm some points:

  • Would you tell me which encoder you use in your model?
  • Did you observe any similarities between them?

Masao-Someki avatar Oct 05 '23 00:10 Masao-Someki

Thanks @Masao-Someki for your reply. I used a simple Transformer encoder. I didn't get your question regarding similarity. Do you want to know the similarity between the error outputs, or something else?

rajeevbaalwan avatar Oct 05 '23 09:10 rajeevbaalwan

@Masao-Someki I have also tried with a Conformer encoder-based ASR model, but I get the same error.

2023-10-08 23:12:29.048358681 [E:onnxruntime:, sequential_executor.cc:339 Execute] Non-zero status code returned while running Add node. Name:'/encoders/encoders.0/self_attn/Add_5' Status Message: /encoders/encoders.0/self_attn/Add_5: right operand cannot broadcast on dim 3 LeftShape: {1,8,187,187}, RightShape: {1,1,1,127}

rajeevbaalwan avatar Oct 08 '23 17:10 rajeevbaalwan

@rajeevbaalwan The node /encoders/encoders.0/self_attn/Add performs the masking process. I think increasing max_seq_len will fix this issue!

from espnet_onnx.export import ASRModelExport

tag_name = 'your model'
m = ASRModelExport()

# Add the following export config to raise the maximum sequence length
# used when the mask is built at export time.
m.set_export_config(
    max_seq_len=5000,
)

m.export_from_pretrained(tag_name, quantize=False, optimize=False)

Masao-Someki avatar Oct 09 '23 03:10 Masao-Someki

In the masking process, your input audio appears to be 171 frames long, while the mask is only 127 frames long; this mismatch causes the issue. The frame length is estimated during ONNX inference, but the maximum frame length is capped by the max_seq_len value, so increasing this value should fix the problem.
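
For illustration, here is a minimal PyTorch sketch reproducing the failing Add with the shapes from the error log (the tensors are placeholders; only the shapes matter):

import torch

# Attention scores for a 171-frame clip: (batch, heads, time, time)
scores = torch.zeros(1, 8, 171, 171)
# Mask truncated to the 127-frame max_seq_len baked in at export time
mask = torch.zeros(1, 1, 1, 127)

# This is the failing Add node: dim 3 is 171 vs 127, which cannot broadcast,
# whereas a mask of shape (1, 1, 1, 171) would broadcast fine.
scores + mask  # raises a size-mismatch RuntimeError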

Masao-Someki avatar Oct 09 '23 03:10 Masao-Someki

@Masao-Someki Thanks, it worked for me. But the exported ONNX models do not work with batched input, right? They only work for a single audio clip.

rajeevbaalwan avatar Oct 09 '23 17:10 rajeevbaalwan

@rajeevbaalwan Yes, it does not work with batched input.

If you want to run batched inference, then you need to:

  1. Add dynamic axes for the batch dimension in the script below (see the sketch after the link).
  2. Fix the inference function.

https://github.com/espnet/espnet_onnx/blob/7cd0f78ed56b1243005aca671a78e620883bb989/espnet_onnx/export/asr/models/encoders/transformer.py#L105-L106
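
A minimal sketch of step 1, assuming the linked code uses a standard torch.onnx.export call; the module, file name, and axis names below are illustrative stand-ins, not the exact identifiers in the repo:

import torch

class DummyEncoder(torch.nn.Module):
    # Stand-in for the wrapped espnet_onnx encoder, for illustration only
    def forward(self, feats):
        out_lens = feats.new_full((feats.size(0),), feats.size(1)).long()
        return feats, out_lens

dummy_feats = torch.randn(1, 100, 80)  # (batch, time, feat_dim)

torch.onnx.export(
    DummyEncoder(),
    (dummy_feats,),
    'xformer_encoder.onnx',
    input_names=['feats'],
    output_names=['encoder_out', 'encoder_out_lens'],
    dynamic_axes={
        # axis 0 marks the batch dimension as dynamic, alongside the time axis
        'feats': {0: 'batch', 1: 'feats_length'},
        'encoder_out': {0: 'batch', 1: 'enc_out_length'},
    },
    opset_version=15,
)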

Masao-Someki avatar Oct 10 '23 00:10 Masao-Someki

@Masao-Someki Thanks for the reply. I have already added the dynamic axes, but that alone won't solve the problem: the forward function only takes feats, not the actual lengths of the inputs in the batch. That is why enc_out_length is always wrong for batched input, since the feature lengths are calculated as below:

feats_length = torch.ones(feats[:, :, 0].shape).sum(dim=-1).type(torch.long)
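
To see why this breaks for batches, consider a padded batch (the shapes here are made up for the example): the expression assigns every clip the padded length.

import torch

# Padded batch of two clips: clip 0 is really 120 frames, clip 1 is 200
feats = torch.zeros(2, 200, 80)

feats_length = torch.ones(feats[:, :, 0].shape).sum(dim=-1).type(torch.long)
print(feats_length)  # tensor([200, 200]) -- clip 0's true length of 120 is lost

# A batch-aware export would accept the true lengths as a second model input,
# e.g. feats_lengths = torch.tensor([120, 200]), instead of deriving them
# from the padded tensor shape.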

Is there any plan to handle batch inference during ONNX export in espnet_onnx? The complete inference function would need to be changed. If espnet_onnx is meant to prepare models for production, then batch inference support in the exported models is a must; single-clip inference won't be enough in production.

rajeevbaalwan avatar Oct 10 '23 06:10 rajeevbaalwan

@rajeevbaalwan Sorry for the inconvenience, but currently we have no plan to support batch inference. We investigated the speedup from batched inference in our paper, by trying to apply an ONNX HuBERT model during training, but ONNX seems to be less effective with large batch sizes.

Masao-Someki avatar Oct 11 '23 21:10 Masao-Someki

@Masao-Someki You are absolutely right that ONNX export does not give a huge speedup for large batch sizes, but for small batch sizes like 4 or 8 it is better than single-clip inference. So it would be better to have a GPU-based implementation: a generic implementation would work for both single clips and multiple clips, giving users flexibility. Even a batch implementation doesn't degrade performance for single-clip inference. So can you take this feature into consideration?

rajeevbaalwan avatar Oct 12 '23 07:10 rajeevbaalwan

@Masao-Someki is ESPnetLanguageModel supported in ONNX?

rajeevbaalwan avatar Oct 17 '23 13:10 rajeevbaalwan

@rajeevbaalwan I assume that the typical user of this library is an individual who wants to execute an ESPnet model in a low-resource environment, such as a Raspberry Pi. If inference in the ONNX format does not provide enough speedup, then we don't need ESPnet-ONNX; we can just use a GPU. Of course, I know having a multi-batch inference option might be better, but I don't think it is worth implementing here.

is ESPnetLanguageModel supported in ONNX?

Yes, you can include an external language model.

Masao-Someki avatar Oct 17 '23 15:10 Masao-Someki

@Masao-Someki I can't find the code to export the language model to ONNX in the repo.

rajeevbaalwan avatar Oct 17 '23 18:10 rajeevbaalwan

@rajeevbaalwan In the following lines, espnet_onnx has an export function for language models: https://github.com/espnet/espnet_onnx/blob/d617487a12e186f5240a74121f88af328fef2f02/espnet_onnx/export/asr/export_asr.py#L113-L126
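
A minimal usage sketch, assuming (not verified here) that the linked export path runs automatically when the pretrained model bundles a language model:

from espnet_onnx.export import ASRModelExport

m = ASRModelExport()
# Assumption: if the model behind this tag includes an LM, the export code
# linked above exports it alongside the ASR model.
m.export_from_pretrained('your model tag', quantize=False)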

Masao-Someki avatar Oct 24 '23 14:10 Masao-Someki