Video-LLaMA
Audio input
Hi, I have a question about audio input.
In "Video-LLaMA/video_llama/conversation/conversation_video.py" (line 255), I would expect the input to the function load_and_transform_audio_data to be an audio file (.wav). Why is a video file passed here instead?
audio = load_and_transform_audio_data([video_path], "cpu", clips_per_video=8)
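To make the question concrete, here is a minimal sketch of the distinction I mean. It is not code from the repository: `classify_media_path` and the two extension sets are hypothetical names I made up for illustration, and the extension lists are assumptions rather than anything the loader actually checks.

```python
from pathlib import Path

# Hypothetical extension sets (assumptions for illustration, not taken from Video-LLaMA).
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv", ".mov"}


def classify_media_path(path: str) -> str:
    """Return "audio", "video", or "unknown" based only on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unknown"


if __name__ == "__main__":
    # A path like the one passed at line 255 would be classified as video, not audio.
    print(classify_media_path("example.mp4"))
    print(classify_media_path("example.wav"))
```

If the loader only accepts audio files, I would expect a path classified as "video" to fail there. If it can instead read the audio track directly from a video container, then passing video_path makes sense, and it would help to have that confirmed.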