
Variations in onnx export (input and inference issues): pruned_transducer_stateless7_streaming

vasistalodagala opened this issue 11 months ago

Hi,

I've trained a model using the pruned_transducer_stateless7_streaming recipe.

I've tried to export the model using export-onnx.py, and also using export.py with --onnx set to 1.

I then used onnx_pretrained.py to test the exported models. While the export from export-onnx.py works fine with this test, the export from export.py fails; the problem was that export.py does not write the meta_data into the model. I followed the approach in export-onnx.py and modified export.py to also export the meta_data.
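
For reference, the metadata helper in export-onnx.py follows this pattern (reproduced here as a sketch), which is what I mirrored in export.py:

    import onnx

    def add_meta_data(filename: str, meta_data: dict):
        """Add key/value metadata to an ONNX model file, in place."""
        model = onnx.load(filename)
        for key, value in meta_data.items():
            meta = model.metadata_props.add()
            meta.key = key
            meta.value = str(value)
        onnx.save(model, filename)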

However, even after re-exporting the model with the meta_data addition in export.py, I get the following error when running the model with onnx_pretrained.py:

Traceback (most recent call last):
  File "./pruned_transducer_stateless7_streaming/onnx_pretrained.py", line 512, in <module>
    main()
  File "/eph/nvme0/azureml/cr/j/516411a3479842ed943f65f1a697581c/exe/wd/vasista/k2_env/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "./pruned_transducer_stateless7_streaming/onnx_pretrained.py", line 486, in main
    encoder_out = model.run_encoder(frames)
  File "./pruned_transducer_stateless7_streaming/onnx_pretrained.py", line 299, in run_encoder
    out = self.encoder.run(encoder_output_names, encoder_input)
  File "/eph/nvme0/azureml/cr/j/516411a3479842ed943f65f1a697581c/exe/wd/vasista/k2_env/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 216, in run
    self._validate_input(list(input_feed.keys()))
  File "/eph/nvme0/azureml/cr/j/516411a3479842ed943f65f1a697581c/exe/wd/vasista/k2_env/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 198, in _validate_input
    raise ValueError(
ValueError: Required inputs (['x_lens', 'len_cache', 'avg_cache', 'attn_cache', 'cnn_cache']) are missing from input feed (['x', 'cached_len_0', 'cached_len_1', 'cached_len_2', 'cached_len_3', 'cached_len_4', 'cached_avg_0', 'cached_avg_1', 'cached_avg_2', 'cached_avg_3', 'cached_avg_4', 'cached_key_0', 'cached_key_1', 'cached_key_2', 'cached_key_3', 'cached_key_4', 'cached_val_0', 'cached_val_1', 'cached_val_2', 'cached_val_3', 'cached_val_4', 'cached_val2_0', 'cached_val2_1', 'cached_val2_2', 'cached_val2_3', 'cached_val2_4', 'cached_conv1_0', 'cached_conv1_1', 'cached_conv1_2', 'cached_conv1_3', 'cached_conv1_4', 'cached_conv2_0', 'cached_conv2_1', 'cached_conv2_2', 'cached_conv2_3', 'cached_conv2_4']).
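
For debugging, the input names each export actually expects can be listed by querying the ONNX session directly (a minimal sketch; "encoder.onnx" is a placeholder path):

    import onnxruntime as ort

    # Print the input names, shapes and dtypes the exported encoder expects.
    session = ort.InferenceSession("encoder.onnx", providers=["CPUExecutionProvider"])
    for node in session.get_inputs():
        print(node.name, node.shape, node.type)

This confirms the mismatch: the export.py model expects x_lens, len_cache, avg_cache, attn_cache and cnn_cache, while onnx_pretrained.py feeds the per-layer cached_len_*, cached_avg_*, etc. inputs produced by export-onnx.py.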

I've also tried online-decode-files.py from sherpa-onnx on the model exported with export.py. That resulted in a segmentation fault with no logs. The model exported with export-onnx.py worked fine with online-decode-files.py, though.
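
To check whether the meta_data actually made it into an exported file (sherpa-onnx reads it when loading the model), the stored metadata properties can be printed (sketch; the path is a placeholder):

    import onnx

    # List the key/value metadata stored in the exported model.
    model = onnx.load("encoder.onnx")
    for prop in model.metadata_props:
        print(prop.key, "=", prop.value)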

Just to be sure, I repeated the above steps with an available English model trained using the same architecture. The results are exactly the same as with the model I trained.

Also, I've tried the export with the export.py from the current commit and with the one from release v1.1. They differ in whether they take --tokens or --bpe_model at export time, but both resulted in the same error shown above.

Questions at this point:

  1. How do I run inference on the .onnx models exported with export.py? I ask because our Triton deployment uses the regular export.py.
  2. Why are there two different exports to the same format (.onnx), with each exported model requiring different inputs?

vasistalodagala avatar Dec 19 '24 09:12 vasistalodagala

Sorry for the confusion.

Models exported using export.py work with triton in k2-fsa/sherpa.

Models exported using export-onnx.py work with onnx_pretrained.py and with k2-fsa/sherpa-onnx.

csukuangfj avatar Dec 19 '24 10:12 csukuangfj

Is there a way (a script) to test the models exported using export.py? While we are trying to deploy them on Triton in k2-fsa/sherpa, we would also like to run some CLI inference with them.

vasistalodagala avatar Dec 19 '24 11:12 vasistalodagala

I am afraid there is no script to test it.

Can you follow onnx_pretrained.py to write one?
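
A rough starting point might be something like the following untested sketch, which builds a zero-filled feed from the model's own input signature; for real decoding the chunk size and cache initialization must match what onnx_pretrained.py does, and "encoder.onnx" is a placeholder path:

    import numpy as np
    import onnxruntime as ort

    # Smoke-test an encoder exported by export.py. Per the error above,
    # its inputs are x, x_lens, len_cache, avg_cache, attn_cache, cnn_cache.
    session = ort.InferenceSession("encoder.onnx", providers=["CPUExecutionProvider"])

    feed = {}
    for node in session.get_inputs():
        # Replace symbolic (dynamic) dimensions with a placeholder of 1;
        # the time dimension in particular must be enlarged to the chunk
        # size the model was exported with before this produces real output.
        shape = [d if isinstance(d, int) else 1 for d in node.shape]
        dtype = np.int64 if "int64" in node.type else np.float32
        feed[node.name] = np.zeros(shape, dtype=dtype)

    # This verifies that the input names and dtypes line up; the run itself
    # may still fail if a dynamic dimension requires a larger size.
    outputs = session.run(None, feed)
    for node, value in zip(session.get_outputs(), outputs):
        print(node.name, value.shape)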

csukuangfj avatar Dec 19 '24 11:12 csukuangfj