shuaijiang comments

Results: 31 comments by shuaijiang

Are CUDA 10 and PyTorch 1.0 supported?

You can use `pip install warpctc-pytorch10-cuda90==0.1.3` instead of `python setup.py install`. Make sure you pick the right version of warpctc-pytorch10-cuda90.
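A minimal sketch for picking the package name, assuming the wheel names follow the pattern in `warpctc-pytorch10-cuda90` ("pytorch" plus PyTorch major+minor, then "cuda" plus CUDA major+minor). The helper name is hypothetical, and a wheel is not guaranteed to exist for every combination, so check PyPI before installing.

```python
# Hypothetical helper: derive a warpctc-pytorch package name from versions.
# Assumption: names follow the "warpctc-pytorch10-cuda90" pattern, i.e.
# major+minor digits of PyTorch and of CUDA, with no separators.


def warpctc_package_name(torch_version: str, cuda_version: str) -> str:
    def major_minor(version: str) -> str:
        # Drop local build tags like "+cu90", then keep "1.0.1" -> "10".
        major, minor = version.split("+")[0].split(".")[:2]
        return f"{major}{minor}"

    return (f"warpctc-pytorch{major_minor(torch_version)}"
            f"-cuda{major_minor(cuda_version)}")
```

In a live environment you could call `warpctc_package_name(torch.__version__, torch.version.cuda)`; note that `torch.version.cuda` is `None` on CPU-only builds.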

Need the ability to save/re-use a generated voice

Parler-TTS generates a similar, but not identical, voice for the same description when the transcript text differs.

Recognizing Chinese audio with BELLE-2/Belle-whisper-large-v2-zh gives worse results than Systran/faster-whisper-large-v2?

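Based on the results above, the likely cause is that belle-whisper was run without VAD segmentation, so recognition was done on windows of up to 30 seconds, which hurts accuracy somewhat. I suggest converting belle-whisper to the faster-whisper model format and running inference with the faster-whisper framework, which has a built-in VAD module. That gives reasonable guarantees on both speed and accuracy.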
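The suggested workflow can be sketched as follows, assuming `ctranslate2`, `transformers`, and `faster-whisper` are installed. The output directory and audio path are placeholders, and `min_silence_duration_ms=500` is an illustrative VAD setting rather than a recommended value. Imports are deferred into the functions so the helpers can be defined without the libraries present.

```python
# Sketch: convert belle-whisper to CTranslate2 format, then decode it with
# faster-whisper's built-in VAD instead of fixed 30 s windows.
# Assumptions: ctranslate2, transformers and faster-whisper are installed;
# paths are placeholders.


def pick_compute_type(device: str) -> str:
    """float16 on GPU, int8 on CPU (a common choice, not a requirement)."""
    return "float16" if device == "cuda" else "int8"


def convert_to_ct2(model_id: str, output_dir: str) -> str:
    """Convert a Transformers Whisper checkpoint to CTranslate2 format."""
    from ctranslate2.converters import TransformersConverter  # deferred import

    converter = TransformersConverter(
        model_id,
        # Copy the tokenizer and feature-extractor configs next to model.bin.
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    return converter.convert(output_dir, quantization="float16", force=True)


def transcribe_with_vad(model_dir: str, audio_path: str, device: str = "cuda"):
    """Transcribe Chinese audio; the VAD drops silence and splits speech."""
    from faster_whisper import WhisperModel  # deferred import

    model = WhisperModel(model_dir, device=device,
                         compute_type=pick_compute_type(device))
    segments, _info = model.transcribe(
        audio_path,
        language="zh",
        vad_filter=True,
        vad_parameters={"min_silence_duration_ms": 500},
    )
    # `segments` is a lazy generator; iterating it runs the decoding.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Usage: `transcribe_with_vad(convert_to_ct2("BELLE-2/Belle-whisper-large-v2-zh", "belle-whisper-large-v2-zh-ct2"), "audio.wav")`. The command-line equivalent of the conversion step is `ct2-transformers-converter --model BELLE-2/Belle-whisper-large-v2-zh --output_dir belle-whisper-large-v2-zh-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization float16`.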

Recognizing Chinese audio with BELLE-2/Belle-whisper-large-v2-zh gives worse results than Systran/faster-whisper-large-v2?

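You probably mean timestamps. belle-whisper did not receive any additional timestamp optimization during fine-tuning, so if you need timestamps you have to enable them explicitly at inference time. The faster-whisper framework has VAD, which gives better segmentation, so I recommend running belle-whisper through the faster-whisper framework.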
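As a sketch of enabling timestamps explicitly, assuming belle-whisper is loaded through the Hugging Face `transformers` ASR pipeline rather than faster-whisper: `return_timestamps=True` is passed at call time, and the result then carries a `chunks` list. The `chunks_to_segments` helper is hypothetical and only reshapes that output.

```python
# Sketch: request segment-level timestamps from the transformers ASR pipeline.
# Assumptions: transformers and torch are installed; the import is deferred
# so the pure helper below can be used on its own.

from typing import List, Optional, Tuple

Segment = Tuple[float, Optional[float], str]  # (start_s, end_s, text)


def chunks_to_segments(result: dict) -> List[Segment]:
    """Flatten a pipeline result produced with return_timestamps=True."""
    return [
        (chunk["timestamp"][0], chunk["timestamp"][1], chunk["text"])
        for chunk in result.get("chunks", [])
    ]


def transcribe_with_timestamps(
        audio_path: str,
        model_id: str = "BELLE-2/Belle-whisper-large-v2-zh") -> List[Segment]:
    from transformers import pipeline  # deferred import

    asr = pipeline("automatic-speech-recognition", model=model_id,
                   chunk_length_s=30)  # long audio is processed in 30 s chunks
    result = asr(
        audio_path,
        return_timestamps=True,  # timestamps are only returned when requested
        generate_kwargs={"language": "zh", "task": "transcribe"},
    )
    return chunks_to_segments(result)
```

The end time is typed as optional because the pipeline can return `None` for a chunk whose end timestamp was not predicted.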

About sentence segmentation in the Belle-whisper-large-v2-zh model: was this model trained on segmented data with timestamps?

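During data processing, the training data were assumed to be short utterances (mostly under 10 seconds), so no fine-grained segmentation was done. At recognition time, sentence segmentation relies mainly on the VAD module. Because fine-tuning did not include timestamp training, the accuracy of timestamps in the recognition output may suffer. If you need accurate timestamps, you can add timestamp training to the fine-tuning.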
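To illustrate why segmentation matters, here is a toy comparison: fixed 30-second windows cut wherever the clock says, while even a crude energy-threshold VAD cuts at pauses and yields short utterances closer to the sub-10-second clips used in fine-tuning. This energy detector is a deliberately simplified stand-in; faster-whisper itself uses the Silero VAD model, and the threshold and frame parameters here are arbitrary.

```python
# Toy comparison of fixed windows vs. pause-based segmentation.
# Assumption: per-frame energies are precomputed; this is NOT the Silero VAD
# used by faster-whisper, just a minimal illustration of the idea.

from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def fixed_windows(duration_s: float, window_s: float = 30.0) -> List[Segment]:
    """Split audio into back-to-back windows, ignoring speech boundaries."""
    segments = []
    start = 0.0
    while start < duration_s:
        segments.append((start, min(start + window_s, duration_s)))
        start += window_s
    return segments


def energy_vad(frame_energy: List[float], frame_s: float, threshold: float,
               min_silence_frames: int = 3) -> List[Segment]:
    """Group frames at or above `threshold` into speech segments, closing a
    segment once `min_silence_frames` quiet frames occur in a row."""
    segments = []
    start = None  # index of the first frame in the open segment
    silence = 0   # consecutive quiet frames since the last speech frame
    for i, energy in enumerate(frame_energy):
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                end = i - silence + 1  # first quiet frame (exclusive end)
                segments.append((start * frame_s, end * frame_s))
                start, silence = None, 0
    if start is not None:  # close a segment still open at the end
        end = len(frame_energy) - silence
        segments.append((start * frame_s, end * frame_s))
    return segments
```

For 65 seconds of audio, `fixed_windows(65.0)` gives three windows regardless of content, whereas `energy_vad` only emits spans where speech energy is present.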

Why does Baidu only open-source the English model and not the Chinese one? Can't it contribute something to the domestic ecosystem?

Here is the PLATO-XL model: https://dialogue.bj.bcebos.com/Knover/projects/PLATO-XL/11B.tar

Can we fine-tune the belle-whisper model?

You can continue fine-tuning all parameters using https://github.com/shuaijiang/Whisper-Finetune/blob/master/finetune_all.py, or fine-tune with LoRA using https://github.com/shuaijiang/Whisper-Finetune/blob/master/finetune.py
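The practical difference between the two options is how many weights get updated. Here is a back-of-the-envelope sketch for a single linear layer under the standard LoRA formulation (the frozen weight W plus a trainable low-rank update B·A). The rank of 8 and the 1280-wide projection are illustrative assumptions, not settings taken from finetune.py or finetune_all.py.

```python
# Trainable weights for one linear layer W of shape (d_out, d_in).
# Full fine-tuning updates all of W; LoRA freezes W and trains
# B (d_out x r) and A (r x d_in), so the effective weight is W + B @ A.
# Assumption: rank and layer sizes below are illustrative only.


def full_trainable(d_out: int, d_in: int) -> int:
    return d_out * d_in


def lora_trainable(d_out: int, d_in: int, rank: int) -> int:
    return rank * (d_out + d_in)
```

Whisper large has a model width of 1280, so for one 1280×1280 attention projection, full fine-tuning trains 1,638,400 weights while rank-8 LoRA trains 20,480, about 1.25% as many.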

About sentence segmentation in the Belle-whisper-large-v2-zh model: was this model trained on segmented data with timestamps?

For timestamps, see faster-whisper: https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file#word-level-timestamps
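Following the word-level timestamps option in that README, here is a minimal sketch. It assumes `faster-whisper` is installed and that `model_dir` points to a CTranslate2-converted belle-whisper model (placeholder path). The `format_ts` helper is a hypothetical convenience for printing.

```python
# Sketch: word-level timestamps with faster-whisper (word_timestamps=True).
# Assumptions: faster-whisper is installed; model_dir is a converted model.

from typing import List, Tuple


def format_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def word_timestamps(model_dir: str, audio_path: str,
                    device: str = "cuda") -> List[Tuple[float, float, str]]:
    from faster_whisper import WhisperModel  # deferred import

    model = WhisperModel(model_dir, device=device,
                         compute_type="float16" if device == "cuda" else "int8")
    segments, _info = model.transcribe(audio_path, language="zh",
                                       vad_filter=True, word_timestamps=True)
    words = []
    for segment in segments:  # lazy generator: decoding happens here
        for word in segment.words:
            words.append((word.start, word.end, word.word))
    return words
```

Usage: `for start, end, text in word_timestamps("belle-whisper-large-v2-zh-ct2", "audio.wav"): print(format_ts(start), format_ts(end), text)`.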

The belle-whisper model takes much more time, even after conversion with CTranslate2

That's confusing. Belle-whisper has exactly the same model architecture as Whisper. By the way, check the output lengths of belle-whisper and faster-whisper; a difference in output length may explain the speed gap.
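One way to run that check is to time both models on the same file and record how much text each produced. Since decoding is autoregressive, a transcript that is much longer (for example, because of repeated phrases) takes proportionally longer. This sketch takes any callable that returns segment texts; the wrappers around belle-whisper and faster-whisper you would pass in are hypothetical.

```python
# Sketch: time a transcription and report its output length.
# Assumption: `transcribe` is a user-supplied callable returning segment texts.

import time
from typing import Callable, Dict, Iterable


def profile_transcriber(transcribe: Callable[[str], Iterable[str]],
                        audio_path: str) -> Dict[str, float]:
    start = time.perf_counter()
    # list() forces lazy generators (e.g. faster-whisper segments) to run
    # inside the timed region instead of after it.
    texts = list(transcribe(audio_path))
    elapsed = time.perf_counter() - start
    chars = sum(len(text) for text in texts)
    return {
        "seconds": elapsed,
        "segments": len(texts),
        "chars": chars,
        "seconds_per_char": elapsed / chars if chars else float("nan"),
    }
```

If `seconds_per_char` is similar for both models but `chars` differs a lot, the output length likely explains the gap.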

large-v3-zh performs even worse on Chinese

How are you calling it? You can convert belle-whisper-large-v3-zh to CTranslate2 (CT) format and run inference with faster-whisper.
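Whisper large-v3 checkpoints use 128 mel bins instead of 80. Recent faster-whisper versions read feature-extractor settings from `preprocessor_config.json` in the model directory (worth confirming for your installed version), so it is worth ruling out a converted directory that is missing that file. This sketch checks a directory before you blame the model. The file list mirrors the files the faster-whisper README copies during conversion and is an assumption here, as is the `check_ct2_dir` name.

```python
# Sketch: sanity-check a CTranslate2-converted Whisper large-v3 directory.
# Assumptions: files follow the faster-whisper README conversion command;
# large-v3 expects feature_size == 128 in preprocessor_config.json.

import json
from pathlib import Path
from typing import List


def check_ct2_dir(model_dir: str, expected_mels: int = 128) -> List[str]:
    """Return a list of problems found in a converted model directory."""
    root = Path(model_dir)
    problems = []
    for name in ("model.bin", "tokenizer.json", "preprocessor_config.json"):
        if not (root / name).exists():
            problems.append(f"missing {name}")
    config_path = root / "preprocessor_config.json"
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        feature_size = config.get("feature_size")
        if feature_size != expected_mels:
            problems.append(
                f"feature_size is {feature_size}, expected {expected_mels}")
    return problems
```

An empty list means the directory at least has the expected files and mel configuration; it does not rule out other causes of poor accuracy.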