sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android...

419 sherpa-onnx issues, sorted by recently updated

Using the bilingual model vits-melo-tts-zh_en.tar.bz2 in Kotlin: the model loads without any errors, but the synthesized speech is completely wrong and very slurred. The main configuration is as follows:
~~~kotlin
modelDir = "vits-melo-tts-zh_en"
modelName = "model.onnx"
lexicon = "lexicon.txt"
dictDir = "$modelDir/dict"
ruleFsts = "$modelDir/new_heteronym.fst,$modelDir/number.fst,$modelDir/phone.fst,$modelDir/date.fst"
dataDir = ""
....
if (dictDir != null) {
  val newDir...
~~~

![Image](https://github.com/user-attachments/assets/77a24ebc-4b77-4f27-81e1-9149736de5c6)

Main code: `const modelDir = 'assets/sherpa-onnx-streaming-zipformer-es-kroko-2025-08-06'; return sherpa_onnx.OnlineModelConfig(transducer: sherpa_onnx.OnlineTransducerModelConfig(encoder: await copyAssetFile('$modelDir/encoder.onnx'), decoder: await copyAssetFile('$modelDir/decoder.onnx'), joiner: await copyAssetFile('$modelDir/joiner.onnx'),), tokens: await copyAssetFile('$modelDir/tokens.txt'), modelType: 'zipformer2',); final...`

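I downloaded the sherpa-onnx-rk3588-20-seconds-paraformer-zh-2025-10-07.tar.bz2 model, but there is no accompanying documentation, so I don't know which command to run it with. How do I run this model on Linux?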

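With CUDA, parakeet returns an empty result or crashes, while it works fine on CPU. For comparison, sense-voice, fireredasr, and whisper all run correctly with CUDA acceleration. Since the non-parakeet models work and only parakeet fails, the CUDA environment itself appears to be fine. Could the developers test this and verify the results? Also, in my tests silero-vad does not support GPU either.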

`./sherpa-onnx-offline-tts --matcha-acoustic-model=./matcha-icefall-zh-baker/model-steps-6.onnx --matcha-vocoder=./vocos-16khz-univ.onnx --matcha-tokens=./matcha-icefall-zh-baker/tokens.txt --matcha-lexicon=./matcha-icefall-zh-baker/lexicon.txt --matcha-dict-dir=./matcha-icefall-zh-baker/dict --tts-rule-fsts=./matcha-icefall-zh-baker/phone.fst,./matcha-icefall-zh-baker/date.fst,./matcha-icefall-zh-baker/number.fst --debug=1 --num-threads=4 --matcha-length-scale=1 --output-filename=./newgenerated.wav "当夜幕 降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔.某某银行的副行长和 一些行政领导表示,他们去过长江和长白山; 经济不断增长。2024年12月31号,拨打110或者18920240511。123456块钱。"` After replacing tokens.txt and lexicon.txt, the newgenerated.wav produced by the command above sounds worse than the audio generated at https://www.tulingyun.com/tts.html. The problem seems to be with particles such as "着" and "的". What could be the cause? [tulingyun.mp3](https://github.com/user-attachments/files/23018859/tulingyun.mp3) [sherpa-onnx-matcha.wav](https://github.com/user-attachments/files/23018862/sherpa-onnx-matcha.wav)

Hello, I was able to find the Linux and Windows binaries for sherpa on NuGet, but not the ones for OSX. Are there any known legitimate mirrors that host them?...

Why doesn't this library support emotions such as laughter and sadness? Is it hard to implement?

home: https://www.modelscope.cn/models/neuphonic/neutts-air onnx: https://www.modelscope.cn/models/neuphonic/neutts-air-onnx

[en.zip](https://github.com/user-attachments/files/22963308/en.zip) The audio says "方便" ("convenient"), but the recognition result is "不方便" ("inconvenient").

Hello! I am initializing GigaAM v2 CTC using `sherpa-onnx.NewOfflineRecognizer()` inside a Docker container. Here is the config:
```go
recCfg.FeatConfig.SampleRate = 16000
recCfg.ModelConfig.Tokens = "path/to/tokens"
recCfg.ModelConfig.NemoCTC.Model = "path/to/model.onnx"
recCfg.ModelConfig.Provider = "cpu"
recCfg.ModelConfig.NumThreads...
```