SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
### System Info torch 2.0.1 torchaudio 2.0.2 torchvision 0.15.2 ### Information - [ ] The official example scripts - [ ] My own modified scripts ### 🐛 Describe the bug...
Hello, I don't quite understand why bos is not added here "(example = prompt + answer # FIX(MZY): avoid putting a bos token before answer.)". How can autoregressive training be...
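For context, the pattern behind `example = prompt + answer` is the common SFT packing where no extra bos is inserted between prompt and answer, so the model trains on one contiguous autoregressive sequence; loss is usually masked on the prompt positions. A minimal sketch, with illustrative token IDs (bos=1, eos=2 and the helper name are assumptions, not SLAM-LLM's actual values):

```python
# Hypothetical sketch of prompt+answer packing without a bos before the answer.
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def build_example(prompt_ids, answer_ids, eos_id=2):
    # example = prompt + answer: the answer continues the prompt directly,
    # so autoregressive training stays a single left-to-right sequence.
    input_ids = prompt_ids + answer_ids + [eos_id]
    # loss is computed only on the answer (and eos); prompt tokens are masked
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids + [eos_id]
    return input_ids, labels

ids, labels = build_example([1, 15, 27], [301, 302])
# ids    -> [1, 15, 27, 301, 302, 2]
# labels -> [-100, -100, -100, 301, 302, 2]
```

An extra bos before the answer would split the sequence in two, which is exactly what the comment in the code avoids.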
This open-source project is fantastic. Could you share which data was used to train vallex?
### 🚀 The feature, motivation and pitch Tasks like adaptive noise suppression, acoustic echo cancellation, and speech separation, thanks! ### Alternatives _No response_ ### Additional context _No response_
### 🚀 The feature, motivation and pitch As we all know, GPT-4o is an end-to-end multi-modal model that supports Speech-to-Text/Speech. I have some ideas about it: 1. Speech...
Great work! More info about the data would be appreciated!
On the LLM repetitive-generation problem

Inference-side mitigations:
- repetition penalty

Training-side mitigations:
- eos_token: https://github.com/QwenLM/Qwen2/issues/779#issuecomment-2229890369
- no_speech token: https://github.com/X-LANCE/SLAM-LLM/issues/113
- Model frame rate: raising the frame rate can reduce the "broken record" looping on short audio
- The LLM's text distribution
- Incorporating CTC results: https://arxiv.org/abs/2408.09491
- From an NLP perspective: https://zhuanlan.zhihu.com/p/672261242?utm_psn=1807773013061558274
- Repetitive generation tends to be triggered when training data contains many short or repetitive texts, i.e. insufficient data diversity
- The smaller the model, the more prone it is to repetitive generation

Additions welcome!
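The repetition penalty mentioned above rescales the logits of tokens that have already been generated before sampling. A minimal sketch of the commonly used CTRL-style rule (positive logits divided by the penalty, negative logits multiplied); the function name and token IDs here are illustrative, not part of any library API:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Penalize every token that already appeared in the generated sequence:
    # divide positive logits / multiply negative logits by the penalty factor,
    # which lowers the token's probability either way.
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=2.0)
# token 0: 2.0 -> 1.0; token 1: -1.0 -> -2.0; token 2 untouched
```

With penalty > 1.0 repeated tokens become less likely on every step, which is why it helps against looping but can hurt outputs that legitimately repeat words.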
Hi there, I am interested in using more than one encoder for my speech tasks; does this framework support that? Currently in the SLAM paper I see only one speech...
### System Info Pytorch 2.3.1+cu121 CUDA 12.2 GPU Nvidia H100 2 machines * 8, DDP only, FP16 ### Information - [ ] The official example scripts - [X] My own...