Emotion-LLaMA: Is the audio (speech) modality unused during training?

Open ASolitaryMan opened this issue 9 months ago • 3 comments

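Hello, I read through your inference code and could not find any audio-modality input. Am I misunderstanding something, or is it handled somewhere else? I would appreciate your guidance.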

Mar 05 '25 02:03 ASolitaryMan

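By "inference code", do you mean the demo? Our inference code takes a video as input, extracts the audio from the video, and then feeds that audio into the HuBERT model to extract features.

Our test code instead runs the audio through HuBERT ahead of time and saves the extracted features as npy files; at test time it loads those audio features directly.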

Mar 05 '25 02:03 ZebangCheng
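The first step described in the reply above (pulling the audio track out of the input video before handing it to HuBERT) could be sketched as follows. This is only an illustration, not the repository's own code: the helper name `build_audio_extract_cmd` and the file names are hypothetical, and the 16 kHz mono target is an assumption based on the sampling rate HuBERT checkpoints are normally trained on. The sketch only builds the `ffmpeg` command; running it requires `ffmpeg` to be installed.

```python
import shlex


def build_audio_extract_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg argument list that writes a video's audio as mono WAV.

    Hypothetical helper for illustration; not taken from Emotion-LLaMA.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream, keep audio only
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the target rate
        wav_path,
    ]


cmd = build_audio_extract_cmd("sample.mp4", "sample.wav")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The resulting WAV file would then be passed to a HuBERT feature extractor, either on the fly (the demo path) or once up front with the result saved to an npy file (the test path).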

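Thank you for your reply. In eval_emotion_EMER.py I saw answers = model.generate(images, video_features, texts, max_new_tokens=max_new_tokens, do_sample=False) and noticed that there is no audio input, so I opened this issue. If audio input is involved, could you point me to the relevant code path?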

Mar 05 '25 03:03 ASolitaryMan

https://github.com/ZebangCheng/Emotion-LLaMA/blob/20e30d68afac5b2af94e988cde50dfdec0e78e02/minigpt4/datasets/datasets/first_face.py#L177-L190

Mar 05 '25 03:03 ZebangCheng
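The thread does not spell out what the linked dataset lines do. One way a dataset class could make a precomputed audio feature reach the model without a separate audio argument is to load the npy file and stack it with the visual features into the single tensor passed as `video_features`. The sketch below illustrates that idea under stated assumptions: the function names, the file name, the feature width of 1024, and the concatenation order are all hypothetical and are not confirmed by this thread.

```python
import os
import tempfile

import numpy as np


def load_audio_feature(npy_path):
    """Load a precomputed audio feature and ensure a (tokens, dim) shape."""
    feat = np.load(npy_path).astype(np.float32)
    if feat.ndim == 1:
        feat = feat[None, :]  # a single vector becomes one "token" row
    return feat


def stack_features(visual_feats, audio_feat):
    """Append the audio rows after the visual rows along the token axis."""
    return np.concatenate([visual_feats, audio_feat], axis=0)


# Stand-in data: a zero vector plays the role of a saved HuBERT feature.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_audio.npy")  # hypothetical file name
    np.save(path, np.zeros(1024, dtype=np.float32))
    audio = load_audio_feature(path)

visual = np.ones((2, 1024), dtype=np.float32)  # two placeholder visual rows
features = stack_features(visual, audio)
print(features.shape)  # (3, 1024)
```

Under this reading, a call such as model.generate(images, video_features, texts, ...) would still carry audio information even though no argument is named after it.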
