sherpa-onnx
Canary 1b
Hello, is it possible to use the NeMo model conversion scripts to convert this model?
https://huggingface.co/nvidia/canary-1b
I'm asking because, out of the box, it also supports translation and other tasks, but I'm okay with having only its ASR capabilities.
Thank you
Can you test if it works? We have not tried to do that since it is very large and if you run it on CPU, it would be very slow.
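For reference, a minimal sketch of what such a conversion attempt could look like with NeMo's Exportable API (the model identifier is taken from the Hugging Face model card; the output path is just an example, and this is untested here):

from nemo.collections.asr.models import EncDecMultiTaskModel

# Restore the Canary 1b checkpoint (identifier from the Hugging Face model card).
m = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")

# Export through NeMo's Exportable interface.
m.export("canary-1b.onnx")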
Unfortunately no; it crashed when I tried to convert it to ONNX:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[9], line 14
1 # convert_nemo_file('stt_enes_conformer_ctc_large_codesw') # big multi
2
3 # convert_nemo_file('stt_de_conformer_ctc_large') # big
(...)
12 # convert_nemo_file('stt_fr_conformer_ctc_large') # big
13 # convert_nemo_file('stt_fr_quartznet15x5') # jr only
---> 14 convert_nemo_file('canary-1b') # jr
Cell In[7], line 18
16 model_export_path = f'{model}/model.onnx'
17 print(f'exporting model {model} to onnx')
---> 18 m.export(model_export_path)
19 print(f"Converted file saved as: {model}")
20 tokens_path = f'{model}/tokens.txt'
File ~/miniconda3/envs/nemo/lib/python3.10/site-packages/nemo/core/classes/exportable.py:117, in Exportable.export(self, output, input_example, verbose, do_constant_folding, onnx_opset_version, check_trace, dynamic_axes, check_tolerance, export_modules_as_functions, keep_initializers_as_inputs, use_dynamo)
115 model = self.get_export_subnet(subnet_name)
116 out_name = augment_filename(output, subnet_name)
--> 117 out, descr, out_example = model._export(
118 out_name,
119 input_example=input_example,
120 verbose=verbose,
...
-> 1931 raise AttributeError(
1932 f"'{type(self).__name__}' object has no attribute '{name}'"
1933 )
AttributeError: 'EncDecMultiTaskModel' object has no attribute 'output_names'
Any updates on this? This model looks extremely interesting, although the real-time factor is not amazing.
@csukuangfj How about nvidia/parakeet-tdt-1.1b? It should run fast on CPU, as its real-time factor is 10x higher than Whisper turbo or Whisper distil.
I believe it should be similar to convert to nvidia/parakeet-tdt_ctc-110m, which you have already done.
Hi @dnhkng, I am able to convert the 110m model to ONNX and run inference. But when I repeat the same steps for the 1.1b model, I get a bunch of files alongside the onnx file, as shown below.
encoder.layers.39.conv.pointwise_conv1.weight onnx__MatMul_159314 onnx__MatMul_166059 onnx__MatMul_172804
encoder.layers.39.conv.pointwise_conv2.weight onnx__MatMul_159439 onnx__MatMul_166184 onnx__MatMul_172929
encoder.layers.39.feed_forward1.linear1.bias onnx__MatMul_159648 onnx__MatMul_166393 onnx__MatMul_173138
encoder.layers.39.feed_forward2.linear1.bias onnx__MatMul_159654 onnx__MatMul_166399 onnx__MatMul_173144
encoder.layers.3.conv.pointwise_conv1.bias onnx__MatMul_159655 onnx__MatMul_166400 onnx__MatMul_173145
encoder.layers.3.conv.pointwise_conv1.weight onnx__MatMul_159656 onnx__MatMul_166401 parakeet-tdt_ctc_1_1b.onnx
encoder.layers.3.conv.pointwise_conv2.weight onnx__MatMul_159657 onnx__MatMul_166402
encoder.layers.3.feed_forward1.linear1.bias onnx__MatMul_159658 onnx__MatMul_166403
Now when I run inference in onnxruntime, I get the following error:
2025-04-28 17:25:06.6341562 [E:onnxruntime:, inference_session.cc:2197 onnxruntime::InferenceSession::Initialize::<lambda_fdc1a8126446f85df1daee0589f02440>::operator ()] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\graph\graph_utils.cc:28 onnxruntime::graph_utils::GetIndexFromName itr != node_args.end() was false. Attempting to get index by a name which does not exist:/layers.0/self_attn/Concat_80_output_0for node: /layers.0/self_attn/Reshape_64_new_reshape
Could you please let me know where I am going wrong 😄. I am following the same steps I did for 110m.
Could you please let me know where I am going wrong
The model is too large; it is over 2 GB.
Please try the int8.onnx model.
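For what it's worth, a quick sanity check (a sketch; the file name is taken from the listing above) to confirm that the exported graph and the external weight files next to it resolve together before handing the model to onnxruntime:

import onnx

# The exported graph stores its weights in the external files listed above
# (onnx__MatMul_*, encoder.layers.*); they must stay in the same directory
# as the .onnx file.
model = onnx.load("parakeet-tdt_ctc_1_1b.onnx", load_external_data=True)

# Passing the path (rather than the loaded proto) lets the checker handle
# models larger than 2 GB.
onnx.checker.check_model("parakeet-tdt_ctc_1_1b.onnx")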
Hi @csukuangfj , As suggested, I tried to quantize the onnx model to int8.
import onnx
from onnxruntime.quantization import QuantType, quantize_dynamic

onnx_model = onnx.load("parakeet-tdt-ctc-1.1b.onnx")

quantize_dynamic(
    model_input="parakeet-tdt-ctc-1.1b.onnx",
    model_output="parakeet-tdt-ctc-1.1b_int_8.onnx",
    per_channel=True,
    weight_type=QuantType.QUInt8,
)
onnx.load executes without error.
The process crashes during quantize_dynamic due to a memory spike.
I am using Google Colab.
Let me know if this is expected and how much memory I need to convert the ONNX model.
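In case it helps, a sketch of a lower-memory variant (assuming the installed onnxruntime supports the use_external_data_format flag): skip the separate onnx.load call, since quantize_dynamic reads the file itself, and let the quantizer keep the weights as external data so the whole model never has to fit in one protobuf:

from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="parakeet-tdt-ctc-1.1b.onnx",
    model_output="parakeet-tdt-ctc-1.1b_int_8.onnx",
    per_channel=True,
    weight_type=QuantType.QUInt8,
    # Assumed flag: write quantized weights as external data instead of
    # embedding everything in a single >2 GB protobuf.
    use_external_data_format=True,
)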
Hi @csukuangfj,
I tried to quantize the 1.1b model in a Kaggle notebook with more RAM, and I was able to quantize it to an int8 ONNX model.
onnx_model = onnx.load("parakeet-tdt-ctc-1.1b.onnx")

quantize_dynamic(
    model_input="parakeet-tdt-ctc-1.1b.onnx",
    model_output="parakeet-tdt-ctc-1.1b_int_8.onnx",
    per_channel=True,
    weight_type=QuantType.QUInt8,
)
I got a parakeet-tdt-ctc-1.1b_int_8.onnx (1.15 GB); the original parakeet-tdt-ctc-1.1b.nemo was 4.19 GB.
However, I ran into an error during onnxruntime inference.
import onnxruntime as ort
ort_session = ort.InferenceSession("parakeet-tdt-ctc-1.1b_int_8.onnx")
I am getting the following error:
---------------------------------------------------------------------------
RuntimeException Traceback (most recent call last)
/tmp/ipykernel_31/4261674062.py in <cell line: 0>()
----> 1 ort_session = ort.InferenceSession("parakeet-tdt-ctc-1.1b_int_8.onnx")
/usr/local/lib/python3.11/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options, **kwargs)
467
468 try:
--> 469 self._create_inference_session(providers, provider_options, disabled_optimizers)
470 except (ValueError, RuntimeError) as e:
471 if self._enable_fallback:
/usr/local/lib/python3.11/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options, disabled_optimizers)
539
540 # initialize the C++ InferenceSession
--> 541 sess.initialize_session(providers, provider_options, disabled_optimizers)
542
543 self._sess = sess
RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/graph/graph_utils.cc:27 int onnxruntime::graph_utils::GetIndexFromName(const onnxruntime::Node&, const std::string&, bool) itr != node_args.end() was false. Attempting to get index by a name which does not exist:/layers.0/self_attn/Concat_80_output_0for node: /layers.0/self_attn/Reshape_64_new_reshape
Can you check the version of onnx you used to export the model and the version of onnxruntime you use to run it?
Hi @csukuangfj
please try onnxruntime 1.17.1
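For reference, a minimal check of the runtime version and session creation (the file name is from the quantization step above; the CPU provider is chosen just as an example):

import onnxruntime as ort

print(ort.__version__)  # expect 1.17.1 as suggested above

ort_session = ort.InferenceSession(
    "parakeet-tdt-ctc-1.1b_int_8.onnx",
    providers=["CPUExecutionProvider"],
)
print([i.name for i in ort_session.get_inputs()])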
@csukuangfj Thanks a lot. I am able to do the inference smoothly.
Great to hear you made it!