Alan Fang
The config file does contain the do_end_point_detection option, but looking at the code in e2e_vad.py, that branch does nothing and simply falls through with pass. I'm curious whether end-point detection is handled somewhere else, or whether it is planned but not yet supported. Thanks in advance for the clarification.

```yaml
# vad.yaml
vad_post_conf:
  sample_rate: 8000
  detect_mode: 1
  snr_mode: 0
  max_end_silence_time: 800
  max_start_silence_time: 3000
  do_start_point_detection: True
  do_end_point_detection: False
```

Code location: https://github.com/alibaba-damo-academy/FunASR/blob/172e7ac986f299ad545cbd91a8cecc3ef967af36/funasr/models/e2e_vad.py#L414
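The pattern being asked about, as I read it, looks roughly like the sketch below (hypothetical class and method names, not the actual FunASR source): the flag is read from vad_post_conf, but the end-point branch only contains pass, which is why it looks like a placeholder rather than implemented logic.

```python
# Hypothetical reconstruction of the situation described above, not FunASR code.
class VADStateMachineSketch:
    def __init__(self, do_start_point_detection: bool = True,
                 do_end_point_detection: bool = False):
        self.do_start_point_detection = do_start_point_detection
        self.do_end_point_detection = do_end_point_detection

    def on_frame(self, is_speech: bool) -> None:
        if self.do_start_point_detection and is_speech:
            # start-point handling is implemented in the real code
            self._mark_start()
        if self.do_end_point_detection:
            pass  # the branch the question points at: currently a no-op

    def _mark_start(self) -> None:
        print("speech start detected")
```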
https://github.com/alibaba-damo-academy/FunASR/blob/4854d398708594a13e3043daf1a19adfde970ea2/funasr/modules/lora/layers.py#L216
See the related loralib issue: https://github.com/microsoft/LoRA/issues/34
eval() is not called at inference time, so the LoRA weights are never merged into the base weights.
Their bugfix branch can be used instead: https://github.com/microsoft/LoRA/tree/bugfix_MergedLinear
Also linking Su Jianlin's blog post on LoRA: https://kexue.fm/archives/9590
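For context, a minimal sketch of the standard LoRA merge-on-eval behaviour (my own illustration, not the loralib code): the low-rank update B @ A is only folded into the frozen weight when the module switches to eval mode, so an inference path that never calls eval() runs on unmerged weights, which matches the symptom above.

```python
import torch
import torch.nn as nn

# Minimal sketch assuming standard LoRA semantics; names are illustrative.
class LoRALinearSketch(nn.Linear):
    def __init__(self, in_features, out_features, r=8, lora_alpha=16):
        super().__init__(in_features, out_features)
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = lora_alpha / r
        self.merged = False

    def train(self, mode: bool = True):
        super().train(mode)
        if not mode and not self.merged:
            # eval(): fold the low-rank update into the frozen weight
            self.weight.data += (self.lora_B @ self.lora_A) * self.scaling
            self.merged = True
        elif mode and self.merged:
            # back to train(): undo the merge
            self.weight.data -= (self.lora_B @ self.lora_A) * self.scaling
            self.merged = False
        return self
```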
I'm confused about why T(w) is defined as an inner function: https://github.com/microsoft/LoRA/blob/dc5d1744fa9430edda10bc233a9efc65e9239f50/loralib/layers.py#L128
torch.jit.script reports an error:
torch.jit.frontend.UnsupportedNodeError: function definitions aren't supported: def forward(self, x: torch.Tensor): def T(w): ~~~ 0 and not self.merged:...
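One possible workaround, sketched below under the assumption that loralib's Linear exposes fan_in_fan_out, r, merged, lora_A, lora_B, lora_dropout, and scaling: lift the T(w) closure out of forward() into a regular method, since torch.jit.script cannot compile nested function definitions. This is a hypothetical refactor, not the upstream fix.

```python
import torch
import torch.nn.functional as F
import loralib as lora

# Hypothetical subclass that avoids the inner T(w) closure over fan_in_fan_out.
class ScriptableLoRALinear(lora.Linear):
    def _maybe_T(self, w: torch.Tensor) -> torch.Tensor:
        # replaces the inner T(w) helper
        return w.transpose(0, 1) if self.fan_in_fan_out else w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.r > 0 and not self.merged:
            result = F.linear(x, self._maybe_T(self.weight), bias=self.bias)
            result += (self.lora_dropout(x) @ self.lora_A.transpose(0, 1)
                       @ self.lora_B.transpose(0, 1)) * self.scaling
            return result
        return F.linear(x, self._maybe_T(self.weight), bias=self.bias)
```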
See this Google paper: https://arxiv.org/pdf/2305.15663.pdf
As far as I can tell it only modifies the FC layer of one conformer block by adding an MoE module. @Mddct any thoughts? The detailed training strategy still needs to be worked out (do the other parameters need to be frozen?); the paper left me a bit confused, so guidance from anyone more familiar with it would be appreciated (respect).
A related Zhihu article: https://zhuanlan.zhihu.com/p/671873012
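My rough understanding of the idea, as a hedged sketch (hypothetical module and parameter names, not the authors' code): the position-wise FC layer of one conformer block is replaced by a router plus several expert feed-forward networks, and the expert outputs are mixed by the router weights.

```python
import torch
import torch.nn as nn

# Illustrative MoE feed-forward block; shapes and expert count are assumptions.
class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        gates = torch.softmax(self.router(x), dim=-1)              # (B, T, E)
        expert_out = torch.stack([e(x) for e in self.experts], -1)  # (B, T, D, E)
        return (expert_out * gates.unsqueeze(-2)).sum(dim=-1)       # (B, T, D)
```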
LoRA support references:
https://github.com/microsoft/LoRA/tree/bugfix_MergedLinear
https://kexue.fm/archives/9590
https://github.com/huggingface/peft (I think the PEFT methods used in NLP all have the potential to tune the ASR model)
LoRA experiment:
gpus: 4*3090
lora_list in encoder |...
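For the PEFT route, a minimal usage sketch (the model name and target module names are assumptions; adjust them to the encoder layers of the ASR model actually being tuned):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

# Assumed model and target modules for illustration only.
model = AutoModel.from_pretrained("openai/whisper-small")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```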
In the VAD pipeline result dict, the keys do not match the values. I found the code that causes the problem: why does only i == 0 add the "text" key? Here is the code: https://github.com/modelscope/modelscope/blob/a67d339e3bf8abd25d224818f6fd6512078f6f37/modelscope/pipelines/audio/voice_activity_detection_pipeline.py#L157C9-L157C9
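A hypothetical reconstruction of the mismatch being reported (not the modelscope source), just to illustrate why keying only the first segment with "text" breaks the key/value correspondence:

```python
# Example VAD output in milliseconds; values are illustrative.
segments = [[0, 1200], [1500, 2800], [3100, 4000]]

# Only the first item gets the "text" key: keys no longer describe their values.
inconsistent = {("text" if i == 0 else str(i)): seg for i, seg in enumerate(segments)}

# Keying every segment the same way keeps keys and values aligned.
consistent = {"text": segments}

print(inconsistent)
print(consistent)
```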
Repetitive generation with large models

Inference-side mitigation:
- repetition penalty (a usage sketch follows after this list)

Training-side mitigations:
- eos_token: https://github.com/QwenLM/Qwen2/issues/779#issuecomment-2229890369
- no_speech token: https://github.com/X-LANCE/SLAM-LLM/issues/113
- Model frame rate: raising the frame rate mitigates looping on short audio
- The LLM's text distribution
- Incorporating CTC results: https://arxiv.org/abs/2408.09491
- From an NLP perspective: https://zhuanlan.zhihu.com/p/672261242?utm_psn=1807773013061558274
- Repetition is triggered more easily when the training data contains many short or repeated texts, i.e. when data diversity is insufficient
- The smaller the model, the more easily repetitive generation is triggered

Additions welcome!
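A minimal inference-side sketch of the repetition penalty (the model name is an assumption; repetition_penalty and no_repeat_ngram_size are standard transformers generate() arguments):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model name chosen for illustration only.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct")

inputs = tokenizer("transcribe the audio tokens:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    repetition_penalty=1.2,   # >1.0 down-weights tokens that already appeared
    no_repeat_ngram_size=4,   # optional: hard-block exact 4-gram repeats
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```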
### 🚀 The feature, motivation and pitch

Hello there, I believe we need a data filtering mechanism to handle excessively long or short data. Could you please share any simple...
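A simple filtering sketch under assumed field names ("duration" in seconds, "text" for the transcript); the thresholds are illustrative, not recommendations:

```python
# Drop samples whose audio duration or transcript length falls outside bounds.
def keep_sample(sample: dict,
                min_dur: float = 0.5, max_dur: float = 30.0,
                min_chars: int = 1, max_chars: int = 200) -> bool:
    dur_ok = min_dur <= sample["duration"] <= max_dur
    txt_ok = min_chars <= len(sample["text"]) <= max_chars
    return dur_ok and txt_ok

raw = [
    {"duration": 0.1, "text": "hi"},           # too short: dropped
    {"duration": 5.2, "text": "hello world"},  # kept
    {"duration": 95.0, "text": "very long"},   # too long: dropped
]
filtered = [s for s in raw if keep_sample(s)]
print(len(filtered))  # 1
```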