Alan Fang
The config file does contain the do_end_point_detection option, but looking at the code in e2e_vad.py, that branch does nothing and simply falls through with pass. I'm curious whether end-point detection is handled somewhere else, or whether it is planned but not yet supported. Thanks in advance for the clarification.

```yaml
# vad.yaml
vad_post_conf:
  sample_rate: 8000
  detect_mode: 1
  snr_mode: 0
  max_end_silence_time: 800
  max_start_silence_time: 3000
  do_start_point_detection: True
  do_end_point_detection: False
```

Code location: https://github.com/alibaba-damo-academy/FunASR/blob/172e7ac986f299ad545cbd91a8cecc3ef967af36/funasr/models/e2e_vad.py#L414
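The pattern being asked about, as I read it, looks roughly like the sketch below (hypothetical class and method names, not the actual FunASR source): the flag is read from vad_post_conf, but the end-point branch only contains pass, which is why it looks like a placeholder rather than implemented logic.

```python
# Hypothetical reconstruction of the situation described above, not FunASR code.
class VADStateMachineSketch:
    def __init__(self, do_start_point_detection: bool = True,
                 do_end_point_detection: bool = False):
        self.do_start_point_detection = do_start_point_detection
        self.do_end_point_detection = do_end_point_detection

    def on_frame(self, is_speech: bool) -> None:
        if self.do_start_point_detection and is_speech:
            # start-point handling is implemented in the real code
            self._mark_start()
        if self.do_end_point_detection:
            pass  # the branch the question points at: currently a no-op

    def _mark_start(self) -> None:
        print("speech start detected")
```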
https://github.com/alibaba-damo-academy/FunASR/blob/4854d398708594a13e3043daf1a19adfde970ea2/funasr/modules/lora/layers.py#L216
See the related loralib issue: https://github.com/microsoft/LoRA/issues/34
eval() is not called at inference time, so the LoRA weights are never merged into the base weights.
Their bugfix branch can be used instead: https://github.com/microsoft/LoRA/tree/bugfix_MergedLinear
Also linking Su Jianlin's blog post on LoRA: https://kexue.fm/archives/9590
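For context, a minimal sketch of the standard LoRA merge-on-eval behaviour (my own illustration, not the loralib code): the low-rank update B @ A is only folded into the frozen weight when the module switches to eval mode, so an inference path that never calls eval() runs on unmerged weights, which matches the symptom above.

```python
import torch
import torch.nn as nn

# Minimal sketch assuming standard LoRA semantics; names are illustrative.
class LoRALinearSketch(nn.Linear):
    def __init__(self, in_features, out_features, r=8, lora_alpha=16):
        super().__init__(in_features, out_features)
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = lora_alpha / r
        self.merged = False

    def train(self, mode: bool = True):
        super().train(mode)
        if not mode and not self.merged:
            # eval(): fold the low-rank update into the frozen weight
            self.weight.data += (self.lora_B @ self.lora_A) * self.scaling
            self.merged = True
        elif mode and self.merged:
            # back to train(): undo the merge
            self.weight.data -= (self.lora_B @ self.lora_A) * self.scaling
            self.merged = False
        return self
```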
I'm confused about why T(w) is defined as an inner function: https://github.com/microsoft/LoRA/blob/dc5d1744fa9430edda10bc233a9efc65e9239f50/loralib/layers.py#L128
torch.jit.script reports an error:
torch.jit.frontend.UnsupportedNodeError: function definitions aren't supported: def forward(self, x: torch.Tensor): def T(w): ~~~ 0 and not self.merged:...
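One possible workaround, sketched below under the assumption that loralib's Linear exposes fan_in_fan_out, r, merged, lora_A, lora_B, lora_dropout, and scaling: lift the T(w) closure out of forward() into a regular method, since torch.jit.script cannot compile nested function definitions. This is a hypothetical refactor, not the upstream fix.

```python
import torch
import torch.nn.functional as F
import loralib as lora

# Hypothetical subclass that avoids the inner T(w) closure over fan_in_fan_out.
class ScriptableLoRALinear(lora.Linear):
    def _maybe_T(self, w: torch.Tensor) -> torch.Tensor:
        # replaces the inner T(w) helper
        return w.transpose(0, 1) if self.fan_in_fan_out else w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.r > 0 and not self.merged:
            result = F.linear(x, self._maybe_T(self.weight), bias=self.bias)
            result += (self.lora_dropout(x) @ self.lora_A.transpose(0, 1)
                       @ self.lora_B.transpose(0, 1)) * self.scaling
            return result
        return F.linear(x, self._maybe_T(self.weight), bias=self.bias)
```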
See this Google paper: https://arxiv.org/pdf/2305.15663.pdf
As far as I can tell it only modifies the FC layer of one conformer block by adding an MoE module. @Mddct any thoughts? The detailed training strategy still needs to be worked out (do the other parameters need to be frozen?); the paper left me a bit confused, so guidance from anyone more familiar with it would be appreciated (respect).
A related Zhihu article: https://zhuanlan.zhihu.com/p/671873012
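My rough understanding of the idea, as a hedged sketch (hypothetical module and parameter names, not the authors' code): the position-wise FC layer of one conformer block is replaced by a router plus several expert feed-forward networks, and the expert outputs are mixed by the router weights.

```python
import torch
import torch.nn as nn

# Illustrative MoE feed-forward block; shapes and expert count are assumptions.
class MoEFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        gates = torch.softmax(self.router(x), dim=-1)              # (B, T, E)
        expert_out = torch.stack([e(x) for e in self.experts], -1)  # (B, T, D, E)
        return (expert_out * gates.unsqueeze(-2)).sum(dim=-1)       # (B, T, D)
```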
LoRA support references:
https://github.com/microsoft/LoRA/tree/bugfix_MergedLinear
https://kexue.fm/archives/9590
https://github.com/huggingface/peft (I think the PEFT methods used in NLP all have the potential to tune the ASR model)
LoRA experiment:
gpus: 4*3090
lora_list in encoder |...
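For the PEFT route, a minimal usage sketch (the model name and target module names are assumptions; adjust them to the encoder layers of the ASR model actually being tuned):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

# Assumed model and target modules for illustration only.
model = AutoModel.from_pretrained("openai/whisper-small")
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```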
In the VAD pipeline result dict, the keys do not match the values. I found the code that causes the problem: why does only i == 0 add the "text" key? Here is the code: https://github.com/modelscope/modelscope/blob/a67d339e3bf8abd25d224818f6fd6512078f6f37/modelscope/pipelines/audio/voice_activity_detection_pipeline.py#L157C9-L157C9
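A hypothetical reconstruction of the mismatch being reported (not the modelscope source), just to illustrate why keying only the first segment with "text" breaks the key/value correspondence:

```python
# Example VAD output in milliseconds; values are illustrative.
segments = [[0, 1200], [1500, 2800], [3100, 4000]]

# Only the first item gets the "text" key: keys no longer describe their values.
inconsistent = {("text" if i == 0 else str(i)): seg for i, seg in enumerate(segments)}

# Keying every segment the same way keeps keys and values aligned.
consistent = {"text": segments}

print(inconsistent)
print(consistent)
```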
Repetitive generation with large models

Inference-side mitigation:
- repetition penalty (a usage sketch follows after this list)

Training-side mitigations:
- eos_token: https://github.com/QwenLM/Qwen2/issues/779#issuecomment-2229890369
- no_speech token: https://github.com/X-LANCE/SLAM-LLM/issues/113
- Model frame rate: raising the frame rate mitigates looping on short audio
- The LLM's text distribution
- Incorporating CTC results: https://arxiv.org/abs/2408.09491
- From an NLP perspective: https://zhuanlan.zhihu.com/p/672261242?utm_psn=1807773013061558274
- Repetition is triggered more easily when the training data contains many short or repeated texts, i.e. when data diversity is insufficient
- The smaller the model, the more easily repetitive generation is triggered

Additions welcome!
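A minimal inference-side sketch of the repetition penalty (the model name is an assumption; repetition_penalty and no_repeat_ngram_size are standard transformers generate() arguments):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model name chosen for illustration only.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B-Instruct")

inputs = tokenizer("transcribe the audio tokens:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    repetition_penalty=1.2,   # >1.0 down-weights tokens that already appeared
    no_repeat_ngram_size=4,   # optional: hard-block exact 4-gram repeats
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```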
### 🚀 The feature, motivation and pitch

Hello there, I believe we need a data filtering mechanism to handle excessively long or short data. Could you please share any simple...
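A simple filtering sketch under assumed field names ("duration" in seconds, "text" for the transcript); the thresholds are illustrative, not recommendations:

```python
# Drop samples whose audio duration or transcript length falls outside bounds.
def keep_sample(sample: dict,
                min_dur: float = 0.5, max_dur: float = 30.0,
                min_chars: int = 1, max_chars: int = 200) -> bool:
    dur_ok = min_dur <= sample["duration"] <= max_dur
    txt_ok = min_chars <= len(sample["text"]) <= max_chars
    return dur_ok and txt_ok

raw = [
    {"duration": 0.1, "text": "hi"},           # too short: dropped
    {"duration": 5.2, "text": "hello world"},  # kept
    {"duration": 95.0, "text": "very long"},   # too long: dropped
]
filtered = [s for s in raw if keep_sample(s)]
print(len(filtered))  # 1
```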