sherpa-onnx: is there a way to convert the KWS model to RKNN?

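I'm writing a Flutter Android app that runs on an RK3566 board and needs to use the board's NPU. I'm not very familiar with this area. I went through the docs and tried to convert the model to RKNN myself, but every attempt failed with a different error. https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2 is the KWS ONNX model I found in the repository. The Android .so files are already set up. I've been stuck for several days and have run out of ideas, so I'm asking for help.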

Oct 15 '25 07:10 DaYang816

Hi @DaYang816! I'm Dosu and I’m helping the sherpa-onnx team.

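Your KWS model (an ONNX model with the Zipformer architecture) can be converted to RKNN format and run on the RK3566 NPU. The Embedfire LubanCat documentation has a detailed Chinese tutorial dedicated to converting Zipformer models to RKNN, covering the conversion steps and how to deploy and run the result on the RK3566. I recommend following it: https://doc.embedfire.com/linux/rk356x/Ai/zh/latest/lubancat_ai/example/zipformer.html
(If the link doesn't open, you can reach the tutorial from the documentation's index page.)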

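A common conversion or runtime problem, such as `Invalid RKNN model version 6`, is usually caused by an outdated librknnrt.so on the board. Downloading the latest version from Rockchip's official RKNN Toolkit repository and replacing the library on the board resolves most compatibility issues. See the reference.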
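A small host-side check, sketched in Python under the assumption stated above: librknnrt.so on the board should be at least as new as the rknn-toolkit2 that produced the model. It pulls the runtime version out of the `sdk api version:` line that sherpa-onnx logs when it loads an RKNN model (such a line appears in the logcat output later in this thread) and compares it with the toolkit version. The function names are hypothetical helpers, not part of sherpa-onnx or the RKNN SDK.

```python
import re
from typing import Optional, Tuple


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '2.3.2' into (2, 3, 2); pad short versions so '2.2' == '2.2.0'."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def runtime_version_from_log(line: str) -> Optional[Tuple[int, ...]]:
    """Extract the librknnrt 'sdk api version' from a sherpa-onnx log line."""
    match = re.search(r"sdk api version:\s*(\d+(?:\.\d+)*)", line)
    return parse_version(match.group(1)) if match else None


def runtime_is_new_enough(log_line: str, toolkit_version: str) -> bool:
    """Assumed rule of thumb: the runtime must not be older than the toolkit."""
    runtime = runtime_version_from_log(log_line)
    if runtime is None:
        raise ValueError("no 'sdk api version' found in the log line")
    return runtime >= parse_version(toolkit_version)
```

With the log from this thread (`sdk api version: 2.3.2`) and the recommended toolkit 2.2.0, `runtime_is_new_enough` returns `True`.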

The conversion flow is roughly:

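1. Use RKNN Toolkit to convert the ONNX model to RKNN format (pay attention to the Toolkit version and to operator compatibility).
2. Deploy the converted RKNN model together with the latest librknnrt.so to the board.
3. Load the model through sherpa-onnx's rknn provider (command-line flag --provider=rknn) to run inference on the NPU.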

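If you hit a specific error, post the error message so we can troubleshoot more precisely.
For the Flutter Android app, once the JNI .so files and the RKNN model are ready, integrate them following the official workflow. See the deployment docs.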

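In short, converting Zipformer models to RKNN and running them on the RK3566 has been shown to work. The key is to follow the tutorial and make sure your environment and library versions are correct.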

To reply, just mention @dosu.


Oct 15 '25 07:10 dosubot[bot]

We can help you solve this.

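1. You cannot use the ONNX models we provide. Yes, you cannot!
2. Find the original PyTorch checkpoint of this model.
3. Find the training parameters of this model.
4. Refer to https://github.com/k2-fsa/icefall/blob/master/.github/scripts/librispeech/ASR/run_rknn.sh#L224
5. Note that step 4 is a reference, not something to copy & paste. First understand it, then substitute the checkpoint from step 2 and the parameters from step 3.
6. After step 4 you will have 3 ONNX model files; then refer to https://github.com/k2-fsa/icefall/blob/master/.github/scripts/librispeech/ASR/run_rknn.sh#L245
7. After step 6 you will have 3 RKNN model files. Use them in place of the corresponding ONNX files for this model in the sherpa-onnx docs, and set the provider to "rknn" at runtime.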
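A minimal sketch of the RKNN conversion step (step 6 in the list above), assuming rknn-toolkit2 is installed (`from rknn.api import RKNN`) and that the three static-shape ONNX files from step 4 are named `encoder.onnx`, `decoder.onnx` and `joiner.onnx` (hypothetical names). This is not the icefall script; use run_rknn.sh for the real arguments. The RKNN class is passed in as a parameter so the control flow can be exercised without the toolkit. `custom_string` is included because the encoder log later in this thread shows sherpa-onnx reading metadata such as `model_type=zipformer2;...` from the model; check your toolkit version's docs for it.

```python
from typing import List, Optional, Sequence


def convert_onnx_to_rknn(rknn_cls, onnx_path: str, rknn_path: str,
                         target_platform: str = "rk3566",
                         custom_string: Optional[str] = None) -> None:
    """Convert one ONNX file to RKNN without quantization (FP16 on the NPU)."""
    rknn = rknn_cls(verbose=False)
    try:
        config_kwargs = {"target_platform": target_platform}
        if custom_string is not None:
            # Metadata string embedded in the .rknn file (assumed format key=value;...).
            config_kwargs["custom_string"] = custom_string
        rknn.config(**config_kwargs)
        # The RKNN toolkit calls below return 0 on success.
        if rknn.load_onnx(model=onnx_path) != 0:
            raise RuntimeError(f"load_onnx failed for {onnx_path}")
        if rknn.build(do_quantization=False) != 0:
            raise RuntimeError(f"build failed for {onnx_path}")
        if rknn.export_rknn(rknn_path) != 0:
            raise RuntimeError(f"export_rknn failed for {rknn_path}")
    finally:
        rknn.release()  # free toolkit resources even when a step fails


def convert_all(rknn_cls,
                names: Sequence[str] = ("encoder", "decoder", "joiner")) -> List[str]:
    """Convert encoder, decoder and joiner; return the RKNN file names written."""
    outputs = []
    for name in names:
        convert_onnx_to_rknn(rknn_cls, f"{name}.onnx", f"{name}.rknn")
        outputs.append(f"{name}.rknn")
    return outputs
```

On the conversion host you would call `convert_all(RKNN)` after `from rknn.api import RKNN`. Skipping quantization matches the FP16 tensor types in the logs later in this thread.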

Oct 15 '25 07:10 csukuangfj

The environment setup script is https://github.com/k2-fsa/icefall/blob/master/.github/workflows/rknn.yml

rknn-toolkit2 2.2.0 is recommended.

I strongly recommend rknn-toolkit2 2.2.0. If you ignore this, you will very likely waste a lot of time later.
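To confirm which toolkit version is actually installed on the conversion machine, a minimal Python check. It assumes the pip distribution is named `rknn-toolkit2`; the name may differ depending on how the wheel was installed, so pass another name if the lookup returns `None`.

```python
from importlib import metadata
from typing import Optional

RECOMMENDED_VERSION = "2.2.0"  # the version recommended in this thread


def installed_toolkit_version(dist_name: str = "rknn-toolkit2") -> Optional[str]:
    """Return the installed version of the distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def toolkit_status(version: Optional[str],
                   recommended: str = RECOMMENDED_VERSION) -> str:
    """Classify an installed version as 'missing', 'ok' or 'mismatch'."""
    if version is None:
        return "missing"
    # Drop a local build suffix such as '+abc123' before comparing.
    base = version.split("+", 1)[0]
    return "ok" if base == recommended else "mismatch"
```

Usage: `print(toolkit_status(installed_toolkit_version()))` before running any conversion.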

Oct 15 '25 07:10 csukuangfj

Thanks for the reply. I'll try again and study the docs.

Oct 15 '25 07:10 DaYang816

If you're willing to wait a while, you can wait for us to provide KWS RKNN models.

Oct 15 '25 07:10 csukuangfj

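I tried converting your existing model to RKNN and it failed with: E load_onnx: The input shape ['N', 45, 80] of 'x' is not support! From your suggestions, it sounds like I need to train a model myself and then convert it to RKNN. Training AI models is probably beyond my current skills, and I would need a long time to learn it... Is there an expected release date for the KWS RKNN models?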

Oct 15 '25 07:10 DaYang816

https://github.com/k2-fsa/sherpa-onnx/issues/2690#issuecomment-3405013452

This already tells you very clearly what to do.

May I ask, did you do it?

If so, please post a screenshot of each step. If not, could you tell us why?

Oct 15 '25 07:10 csukuangfj

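Thanks for the reply. I do plan to try it, and I've told our team's AI lead; we'll have the colleague who handles AI follow your suggestions. That error was just something I was personally puzzled about. If the answer is your step 1, OK, no problem. The steps you suggested are a blind spot for me; I'm new to this and need to learn more.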

Oct 15 '25 08:10 DaYang816

If problems come up, please advise. Thanks.

Oct 15 '25 08:10 DaYang816

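"That error was just something I was personally puzzled about."

RKNN does not support dynamic shapes. In the ONNX model files we provide, the batch size is a dynamic dimension. That is why step 1 above asks you not to use our ONNX files. (This falls into your blind spot.)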
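The earlier error (`The input shape ['N', 45, 80] of 'x' is not support!`) is exactly this case: dimension 0 of `x` is the symbolic batch size `N`. A minimal, dependency-free Python sketch that flags such dimensions in a set of input shapes. With the `onnx` package you would read the shapes from `model.graph.input`, where each dimension carries either a `dim_value` (fixed) or a `dim_param` (symbolic, such as `N`). The example shapes come from the error message and the encoder log in this thread. As the reply says, the fix is to re-export from the PyTorch checkpoint with static shapes, not to patch the provided ONNX file.

```python
from typing import Dict, List, Sequence, Union

Dim = Union[int, str]  # an int is a fixed size; a str such as 'N' is symbolic


def dynamic_dims(shape: Sequence[Dim]) -> List[int]:
    """Indices of dimensions that are not fixed positive integers."""
    return [i for i, d in enumerate(shape) if not isinstance(d, int) or d <= 0]


def find_dynamic_inputs(inputs: Dict[str, Sequence[Dim]]) -> Dict[str, List[int]]:
    """Map each input name to its dynamic dimensions, skipping fully static inputs."""
    report = {}
    for name, shape in inputs.items():
        dims = dynamic_dims(shape)
        if dims:
            report[name] = dims
    return report
```

For the provided model, `find_dynamic_inputs({"x": ["N", 45, 80]})` reports `{"x": [0]}`, which RKNN rejects; the RKNN encoder in the later log has `x` fixed at `(1,45,80)`.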

Oct 15 '25 08:10 csukuangfj

Great, thanks a lot. My colleague is asking: is there an expected release date for the KWS RKNN models?

Oct 15 '25 08:10 DaYang816

Plans never keep up with changes. All I can say is: as soon as possible.

Oct 15 '25 08:10 csukuangfj

OK, thanks, much appreciated.

Oct 15 '25 08:10 DaYang816

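Hi, I've run into a problem. I now have the RKNN models. At the moment they only work on the board when I build a release APK with Flutter: keywords are recognized. But I can't run it under the debugger: when I tap to start recording, initialization freezes and the app becomes unresponsive. I didn't see anything abnormal in the output. Could you take a look?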

W/sherpa-onnx(24031): KeywordSpotterConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="/data/user/0/com.zxj.demo/files/encoder.rknn", decoder="/data/user/0/com.zxj.demo/files/decoder.rknn", joiner="/data/user/0/com.zxj.demo/files/joiner.rknn"), paraformer=OnlineParaformerModelConfig(encoder="", decoder=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), nemo_ctc=OnlineNeMoCtcModelConfig(model=""), t_one_ctc=OnlineToneCtcModelConfig(model=""), provider_config=ProviderConfig(device=0, provider="rknn", cuda_config=CudaConfig(cudnn_conv_algo_search=1), trt_config=TensorrtConfig(trt_max_workspace_size=2147483647, trt_max_partition_iterations=10, trt_min_subgraph_size=5, trt_fp16_enable="True", trt_detailed_build_log="False", trt_engine_cache_enable="True", trt
W/sherpa-onnx(24031): sdk api version: 2.3.2 (429f97ae6b@2025-04-09T09:08:16), driver version: 0.7.2
W/sherpa-onnx(24031): model: 39 inputs, 39 outputs
W/sherpa-onnx(24031): ----------Model inputs info----------
W/sherpa-onnx(24031): {0, name: x, shape: (1,45,80), n_elems: 3600, size: 7200, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {1, name: cached_key_0, shape: (128,1,128), n_elems: 16384, size: 32768, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {2, name: cached_nonlin_attn_0, shape: (1,128,96,1), n_elems: 12288, size: 24576, fmt: NHWC, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {3, name: cached_val1_0, shape: (128,1,48), n_elems: 6144, size: 12288, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {4, name: cached_val2_0, shape: (128,1,48), n_elems: 6144, size: 12288, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {5, name: cached_conv1_0, shape: (1,128,15), n_elems: 1920, size: 3840, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {6, name: cached_conv2_0, shape: (1,128,15), n_elems: 1920, size: 3840, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {7, name: cached_key_1, shape: (64,1,128), n_elems: 8192, size: 16384, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {8, name: cached_nonlin_a
W/sherpa-onnx(24031): ----------Model outputs info----------
W/sherpa-onnx(24031): {0, name: encoder_out, shape: (1,8,320), n_elems: 2560, size: 5120, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {1, name: new_cached_key_0, shape: (128,1,128), n_elems: 16384, size: 32768, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {2, name: new_cached_nonlin_attn_0, shape: (1,1,128,96), n_elems: 12288, size: 24576, fmt: NCHW, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {3, name: new_cached_val1_0, shape: (128,1,48), n_elems: 6144, size: 12288, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {4, name: new_cached_val2_0, shape: (128,1,48), n_elems: 6144, size: 12288, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {5, name: new_cached_conv1_0, shape: (1,128,15), n_elems: 1920, size: 3840, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {6, name: new_cached_conv2_0, shape: (1,128,15), n_elems: 1920, size: 3840, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {7, name: new_cached_key_1, shape: (64,1,128), n_elems: 8192, size: 16384, fmt: UNDEFINED, type: FP16, pass_th
W/sherpa-onnx(24031): customs string: model_type=zipformer2;decode_chunk_len=32;T=45;num_encoder_layers=1,1,1,1,1,1;encoder_dims=128,128,128,128,128,128;cnn_module_kernels=31,31,15,15,15,31;left_context_len=128,64,32,16,32,64;query_head_dims=32,32,32,32,32,32;value_head_dims=12,12,12,12,12,12;num_heads=4,4,4,8,4,4
W/sherpa-onnx(24031): num_heads: 4,4,4,8,4,4
W/sherpa-onnx(24031): encoder_dims: 128,128,128,128,128,128
W/sherpa-onnx(24031): value_head_dims: 12,12,12,12,12,12
W/sherpa-onnx(24031): cnn_module_kernels: 31,31,15,15,15,31
W/sherpa-onnx(24031): T: 45
W/sherpa-onnx(24031): left_context_len: 128,64,32,16,32,64
W/sherpa-onnx(24031): model_type: zipformer2
W/sherpa-onnx(24031): decode_chunk_len: 32
W/sherpa-onnx(24031): query_head_dims: 32,32,32,32,32,32
W/sherpa-onnx(24031): num_encoder_layers: 1,1,1,1,1,1
W/sherpa-onnx(24031): encoder_dims: 128 128 128 128 128 128 
W/sherpa-onnx(24031): attention_dims: 
W/sherpa-onnx(24031): num_encoder_layers: 1 1 1 1 1 1 
W/sherpa-onnx(24031): cnn_module_kernels: 31 31 15 15 15 31 
W/sherpa-onnx(24031): left_context_len: 128 64 32 16 32 64 
W/sherpa-onnx(24031): T: 45
W/sherpa-onnx(24031): decode_chunk_len_: 32
W/sherpa-onnx(24031): sdk api version: 2.3.2 (429f97ae6b@2025-04-09T09:08:16), driver version: 0.7.2
W/sherpa-onnx(24031): model: 1 inputs, 1 outputs
W/sherpa-onnx(24031): ----------Model inputs info----------
W/sherpa-onnx(24031): {0, name: y, shape: (1,2), n_elems: 2, size: 16, fmt: UNDEFINED, type: INT64, pass_through: false}
W/sherpa-onnx(24031): ----------Model outputs info----------
W/sherpa-onnx(24031): {0, name: decoder_out, shape: (1,320), n_elems: 320, size: 640, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): context_size: 2
W/sherpa-onnx(24031): sdk api version: 2.3.2 (429f97ae6b@2025-04-09T09:08:16), driver version: 0.7.2
W/sherpa-onnx(24031): model: 2 inputs, 1 outputs
W/sherpa-onnx(24031): ----------Model inputs info----------
W/sherpa-onnx(24031): {0, name: encoder_out, shape: (1,320), n_elems: 320, size: 640, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): {1, name: decoder_out, shape: (1,320), n_elems: 320, size: 640, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): ----------Model outputs info----------
W/sherpa-onnx(24031): {0, name: logit, shape: (1,197), n_elems: 197, size: 394, fmt: UNDEFINED, type: FP16, pass_through: false}
W/sherpa-onnx(24031): vocab_size: 197
I/flutter (24031): [      id: 5
I/flutter (24031):       label: rk3566_r (HDMI, )
I/flutter (24031):       ,       id: 6
I/flutter (24031):       label: rk3566_r (built-in microphone, bottom)
I/flutter (24031):       ]
W/sherpa-onnx(24031): Failed to select npu core to run the model (You can ignore it if you are not using RK3588.
I/com.zxj.demo(24031): Thread[6,tid=24044,WaitingInMainSignalCatcherLoop,Thread*=0xb40000713483c6f0,peer=0x130024a8,"Signal Catcher"]: reacting to signal 3
I/com.zxj.demo(24031): 
I/com.zxj.demo(24031): Wrote stack traces to tombstoned
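The encoder log above prints the metadata string stored in encoder.rknn (`customs string: ...`) and then the fields sherpa-onnx read from it, such as `T: 45` (matching input `x` of shape (1,45,80)) and `decode_chunk_len: 32`. To sanity-check a freshly converted encoder on the host before deploying it, a rough Python re-implementation of that parsing could look like this. It is an illustration that assumes the `key=value;...` format visible in the log, not sherpa-onnx's own parser.

```python
from typing import Dict, List, Union

Value = Union[int, str, List[int]]


def parse_custom_string(text: str) -> Dict[str, Value]:
    """Parse 'key=value;key=v1,v2,...' into ints, int lists, or plain strings."""
    meta: Dict[str, Value] = {}
    for item in text.strip().split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        key, _, raw = item.partition("=")
        if "," in raw:
            # Per-stack values; note a single-stack model would parse as a plain int.
            meta[key] = [int(v) for v in raw.split(",")]
        elif raw.isdigit():
            meta[key] = int(raw)
        else:
            meta[key] = raw
    return meta


def num_stacks(meta: Dict[str, Value]) -> int:
    """Check that every per-stack list has the same length and return it."""
    lengths = {key: len(v) for key, v in meta.items() if isinstance(v, list)}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"inconsistent per-stack lengths: {lengths}")
    return next(iter(lengths.values()), 0)
```

With the string from this log, `meta["T"]` is 45, agreeing with the second dimension of `x`, and all per-stack lists (num_heads, encoder_dims, left_context_len, ...) have 6 entries.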

Oct 20 '25 03:10 DaYang816

I suggest debugging with recognition from a file first.
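Recognizing from a file takes the microphone, the recording permission and the Flutter audio path out of the picture. A minimal stdlib-only Python sketch of the preparation step, assuming a 16 kHz mono 16-bit PCM WAV file: it scales samples to [-1, 1) (the log shows `sampling_rate=16000` and `normalize_samples=True`) and cuts them into 0.1 s chunks, roughly what a live microphone would deliver. Each chunk would then be fed to the stream, e.g. via `accept_waveform(sample_rate, samples)` in sherpa-onnx's Python API; `read_wav_mono16` and `chunks` are hypothetical helper names.

```python
import array
import sys
import wave
from typing import Iterator, List, Sequence, Tuple


def read_wav_mono16(path: str) -> Tuple[int, List[float]]:
    """Return (sample_rate, samples) with 16-bit PCM scaled to [-1, 1)."""
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = f.getframerate()
        pcm = array.array("h")
        pcm.frombytes(f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return rate, [s / 32768.0 for s in pcm]


def chunks(samples: Sequence[float], sample_rate: int,
           seconds: float = 0.1) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of about `seconds` of audio each."""
    step = max(1, int(sample_rate * seconds))
    for start in range(0, len(samples), step):
        yield samples[start:start + step]
```

If the file-based path runs without freezing under the debugger, the hang is more likely in the recording or UI thread than in the RKNN models.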

Oct 20 '25 03:10 csukuangfj

OK, thanks.

Oct 20 '25 03:10 DaYang816

@csukuangfj Could you share a link to the original .pt files? We'd like to convert them ourselves.

Oct 28 '25 09:10 Soulfloret

Everything is open source; I suggest looking in icefall. @Soulfloret

Oct 28 '25 10:10 csukuangfj
