sherpa-onnx
Canary 1b
Hello, is it possible to use the NeMo model conversion scripts to convert this model?
https://huggingface.co/nvidia/canary-1b
I'm asking because, out of the box, it also supports translation and other tasks, but I'm okay with having only its ASR capabilities.
Thank you
Can you test if it works? We have not tried to do that since it is very large and if you run it on CPU, it would be very slow.
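For reference, a minimal sketch of what such a conversion attempt could look like with NeMo's Exportable API (the model identifier is taken from the Hugging Face model card; the output path is just an example, and this is untested here):

from nemo.collections.asr.models import EncDecMultiTaskModel

# Restore the Canary 1b checkpoint (identifier from the Hugging Face model card).
m = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")

# Export through NeMo's Exportable interface.
m.export("canary-1b.onnx")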
Unfortunately no; it crashed when I tried to convert it to ONNX:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[9], line 14
1 # convert_nemo_file('stt_enes_conformer_ctc_large_codesw') # big multi
2
3 # convert_nemo_file('stt_de_conformer_ctc_large') # big
(...)
12 # convert_nemo_file('stt_fr_conformer_ctc_large') # big
13 # convert_nemo_file('stt_fr_quartznet15x5') # jr only
---> 14 convert_nemo_file('canary-1b') # jr
Cell In[7], line 18
16 model_export_path = f'{model}/model.onnx'
17 print(f'exporting model {model} to onnx')
---> 18 m.export(model_export_path)
19 print(f"Converted file saved as: {model}")
20 tokens_path = f'{model}/tokens.txt'
File ~/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo/core/classes/exportable.py:117, in Exportable.export(self, output, input_example, verbose, do_constant_folding, onnx_opset_version, check_trace, dynamic_axes, check_tolerance, export_modules_as_functions, keep_initializers_as_inputs, use_dynamo)
115 model = self.get_export_subnet(subnet_name)
116 out_name = augment_filename(output, subnet_name)
--> 117 out, descr, out_example = model._export(
118 out_name,
119 input_example=input_example,
120 verbose=verbose,
...
-> 1931 raise AttributeError(
1932 f"'{type(self).__name__}' object has no attribute '{name}'"
1933 )
AttributeError: 'EncDecMultiTaskModel' object has no attribute 'output_names'
Any updates on this? This model looks extremely interesting, although the real-time factor is not amazing.
@csukuangfj How about nvidia/parakeet-tdt-1.1b? It should run fast on CPU, as its real-time factor is 10x higher than Whisper turbo or Whisper distil.
I believe it should be similar to convert to nvidia/parakeet-tdt_ctc-110m, which you have already done.
Hi @dnhkng, I am able to convert the 110m model to ONNX and run inference. But when I repeat the same steps for the 1.1b model, I get a bunch of files alongside the onnx file, as shown below.
encoder.layers.39.conv.pointwise_conv1.weight onnx__MatMul_159314 onnx__MatMul_166059 onnx__MatMul_172804
encoder.layers.39.conv.pointwise_conv2.weight onnx__MatMul_159439 onnx__MatMul_166184 onnx__MatMul_172929
encoder.layers.39.feed_forward1.linear1.bias onnx__MatMul_159648 onnx__MatMul_166393 onnx__MatMul_173138
encoder.layers.39.feed_forward2.linear1.bias onnx__MatMul_159654 onnx__MatMul_166399 onnx__MatMul_173144
encoder.layers.3.conv.pointwise_conv1.bias onnx__MatMul_159655 onnx__MatMul_166400 onnx__MatMul_173145
encoder.layers.3.conv.pointwise_conv1.weight onnx__MatMul_159656 onnx__MatMul_166401 parakeet-tdt_ctc_1_1b.onnx
encoder.layers.3.conv.pointwise_conv2.weight onnx__MatMul_159657 onnx__MatMul_166402
encoder.layers.3.feed_forward1.linear1.bias onnx__MatMul_159658 onnx__MatMul_166403
Now when I run inference in onnxruntime, I get the following error:
2025-04-28 17:25:06.6341562 [E:onnxruntime:, inference_session.cc:2197 onnxruntime::InferenceSession::Initialize::<lambda_fdc1a8126446f85df1daee0589f02440>::operator ()] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\graph\graph_utils.cc:28 onnxruntime::graph_utils::GetIndexFromName itr != node_args.end() was false. Attempting to get index by a name which does not exist:/layers.0/self_attn/Concat_80_output_0for node: /layers.0/self_attn/Reshape_64_new_reshape
Could you please let me know where I am going wrong 😄. I am following the same steps I did for 110m.
Could you please let me know where I am going wrong
The model is too large; it is over 2 GB.
Please try the int8.onnx model.
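For what it's worth, a quick sanity check (a sketch; the file name is taken from the listing above) to confirm that the exported graph and the external weight files next to it resolve together before handing the model to onnxruntime:

import onnx

# The exported graph stores its weights in the external files listed above
# (onnx__MatMul_*, encoder.layers.*); they must stay in the same directory
# as the .onnx file.
model = onnx.load("parakeet-tdt_ctc_1_1b.onnx", load_external_data=True)

# Passing the path (rather than the loaded proto) lets the checker handle
# models larger than 2 GB.
onnx.checker.check_model("parakeet-tdt_ctc_1_1b.onnx")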
Hi @csukuangfj , As suggested, I tried to quantize the onnx model to int8.
import onnx
from onnxruntime.quantization import QuantType, quantize_dynamic

onnx_model = onnx.load("parakeet-tdt-ctc-1.1b.onnx")

quantize_dynamic(
    model_input="parakeet-tdt-ctc-1.1b.onnx",
    model_output="parakeet-tdt-ctc-1.1b_int_8.onnx",
    per_channel=True,
    weight_type=QuantType.QUInt8,
)
onnx.load executes without error.
The process crashes during quantize_dynamic due to a memory spike.
I am using Google Colab.
Let me know if this is expected and how much memory I need to convert the ONNX model.
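In case it helps, a sketch of a lower-memory variant (assuming the installed onnxruntime supports the use_external_data_format flag): skip the separate onnx.load call, since quantize_dynamic reads the file itself, and let the quantizer keep the weights as external data so the whole model never has to fit in one protobuf:

from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="parakeet-tdt-ctc-1.1b.onnx",
    model_output="parakeet-tdt-ctc-1.1b_int_8.onnx",
    per_channel=True,
    weight_type=QuantType.QUInt8,
    # Assumed flag: write quantized weights as external data instead of
    # embedding everything in a single >2 GB protobuf.
    use_external_data_format=True,
)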
Hi @csukuangfj,
I tried to quantize the 1.1b model in a Kaggle notebook with more RAM, and I was able to quantize it to an int8 ONNX model.
onnx_model = onnx.load("parakeet-tdt-ctc-1.1b.onnx")

quantize_dynamic(
    model_input="parakeet-tdt-ctc-1.1b.onnx",
    model_output="parakeet-tdt-ctc-1.1b_int_8.onnx",
    per_channel=True,
    weight_type=QuantType.QUInt8,
)
I got a parakeet-tdt-ctc-1.1b_int_8.onnx (1.15 GB); the original parakeet-tdt-ctc-1.1b.nemo was 4.19 GB.
However, I ran into an error during onnxruntime inference.
import onnxruntime as ort
ort_session = ort.InferenceSession("parakeet-tdt-ctc-1.1b_int_8.onnx")
I am getting the following error:
---------------------------------------------------------------------------
RuntimeException Traceback (most recent call last)
/tmp/ipykernel_31/4261674062.py in <cell line: 0>()
----> 1 ort_session = ort.InferenceSession("parakeet-tdt-ctc-1.1b_int_8.onnx")
/usr/local/lib/python3.11/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options, **kwargs)
467
468 try:
--> 469 self._create_inference_session(providers, provider_options, disabled_optimizers)
470 except (ValueError, RuntimeError) as e:
471 if self._enable_fallback:
/usr/local/lib/python3.11/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options, disabled_optimizers)
539
540 # initialize the C++ InferenceSession
--> 541 sess.initialize_session(providers, provider_options, disabled_optimizers)
542
543 self._sess = sess
RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/graph/graph_utils.cc:27 int onnxruntime::graph_utils::GetIndexFromName(const onnxruntime::Node&, const std::string&, bool) itr != node_args.end() was false. Attempting to get index by a name which does not exist:/layers.0/self_attn/Concat_80_output_0for node: /layers.0/self_attn/Reshape_64_new_reshape
Can you check the version of onnx you used to export the model and the version of onnxruntime you use to run it?
Hi @csukuangfj
please try onnxruntime 1.17.1
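For reference, a minimal check of the runtime version and session creation (the file name is from the quantization step above; the CPU provider is chosen just as an example):

import onnxruntime as ort

print(ort.__version__)  # expect 1.17.1 as suggested above

ort_session = ort.InferenceSession(
    "parakeet-tdt-ctc-1.1b_int_8.onnx",
    providers=["CPUExecutionProvider"],
)
print([i.name for i in ort_session.get_inputs()])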
@csukuangfj Thanks a lot. I am able to do the inference smoothly.
Great to hear you made it!