Cui Junbo
Please try our new training code~
Hello, thank you for following our work. We will consider supporting it in the future!
http://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf
1. Possible, but it would need a lot of training~ 2. Try reading -> https://github.com/OpenBMB/MiniCPM-o?tab=readme-ov-file#general-speech-conversation-with-configurable-voices
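For reference, this is roughly how the speech conversation in that README section is invoked. Treat it as a sketch, not the exact API: parameter names such as `use_tts_template` and `generate_audio` are taken from the model card and may differ between releases, and the configurable-voice setup (a reference-audio system prompt) is described in the linked section.

```python
# Sketch of a speech conversation call with MiniCPM-o 2.6 (based on the model card;
# check the README section above for the configurable-voice system prompt).
import torch
import librosa
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-o-2_6", trust_remote_code=True)
model.init_tts()  # load the TTS head so the model can also speak the answer

# the audio encoder expects 16 kHz mono input
audio_input, _ = librosa.load("question.wav", sr=16000, mono=True)  # placeholder path
msgs = [{"role": "user", "content": [audio_input]}]

res = model.chat(
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    max_new_tokens=128,
    use_tts_template=True,   # format the prompt for speech output
    generate_audio=True,     # also synthesize a spoken reply
    output_audio_path="answer.wav",
)
print(res)
```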
Please try our new model~
http://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf
Hello, we're glad you're interested in fine-tuning. The audio-to-text fine-tuning recipe is almost the same as the image-to-text one, so the changes required are fairly small. We will provide example code next week.
> > Hello, we're glad you're interested in fine-tuning. The audio-to-text fine-tuning recipe is almost the same as the image-to-text one, so the changes required are fairly small. We will provide example code next week.
>
> I see that the audio encoder in the model architecture appears to be separate from Qwen. If my data pairs input audio with corresponding text, can I just do text-to-text SFT directly?

Hello, that approach may fail to achieve alignment when the input is audio. You can try fine-tuning with https://github.com/hiyouga/LLaMA-Factory/pull/6701, which already supports audio-to-text.
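For reference, a sketch of what one audio-to-text SFT sample looks like in LLaMA-Factory's sharegpt-style multimodal format added by that PR; the field names follow the repo's audio demo data and should be double-checked against the current docs, and the paths/text here are placeholders.

```python
# Sketch of a LLaMA-Factory audio-to-text SFT sample (sharegpt-style multimodal format).
import json

sample = {
    "messages": [
        {"role": "user", "content": "<audio>Please transcribe this recording."},
        {"role": "assistant", "content": "Hello, welcome to our store."},
    ],
    # one entry per <audio> tag in the prompt; placeholder path
    "audios": ["data/audios/sample_0001.wav"],
}

with open("audio_sft.json", "w", encoding="utf-8") as f:
    json.dump([sample], f, ensure_ascii=False, indent=2)
```

You would also register this file in `data/dataset_info.json` (sharegpt formatting with an `audios` column) as described in the LLaMA-Factory docs before launching fine-tuning.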
@lihytotoro Evaluation for multiple pictures, videos, and audio: https://github.com/OpenBMB/UltraEval-Audio is for audio evaluation.