
Inference with multiple non-text modalities at once

Open xxrbudong opened this issue 1 year ago • 3 comments

Hello, I have a question. The current code seems to support only one non-text modality plus text per inference call. Is it possible to input several modalities (such as audio, video, and text) in a single inference call?

xxrbudong avatar May 24 '24 08:05 xxrbudong

The current model is not trained on joint multimodal data, so it may not perform well at test time.

csuhan avatar Jul 08 '24 03:07 csuhan

"The current model is not trained on joint multimodal data, so it may not perform well at the test time." But I see that you ran the test on Music-AVQA in the paper. Could you tell me how you managed to use three modalities to generate answers? Thank you very much!

Cece1031 avatar Jul 28 '24 06:07 Cece1031

Hi @Cece1031, I hope the script in https://github.com/csuhan/OneLLM/issues/29 can help you.

csuhan avatar Nov 17 '24 07:11 csuhan