Varying the sound length

Open datovar4 opened this issue 2 years ago • 5 comments

Fantastic work! I have been evaluating the model using sound files of different lengths. For sounds shorter than the 2-second audio clips used in training (500 ms in this example), I get the following warning: WARNING:root:Large gap between audio n_frames(48) and target_length (204). Is the audio_target_length setting correct?
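The numbers in the warning can be roughly reproduced with a frame-count calculation. This is a minimal sketch, assuming 16 kHz audio and a Kaldi-style filterbank with a 25 ms window and a 10 ms shift and no edge padding; none of these values are stated in this thread, and `fbank_num_frames` is a hypothetical helper, not part of ImageBind.

```python
def fbank_num_frames(duration_ms, sample_rate=16_000,
                     frame_length_ms=25.0, frame_shift_ms=10.0):
    """Estimate the number of filterbank frames for a clip of a given duration.

    Assumed (not confirmed by the thread): 16 kHz audio, 25 ms window,
    10 ms shift, and no padding at the clip edges.
    """
    num_samples = int(sample_rate * duration_ms / 1000)
    window = int(sample_rate * frame_length_ms / 1000)
    shift = int(sample_rate * frame_shift_ms / 1000)
    if num_samples < window:
        return 0  # clip is too short to fill a single window
    return 1 + (num_samples - window) // shift


for ms in (500, 2000):
    print(ms, "ms ->", fbank_num_frames(ms), "frames")
```

Under these assumptions a 500 ms clip gives 48 frames, which matches the n_frames value in the warning, while a 2-second clip gives 198 frames, close to the target_length of 204.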

My question is how do sound clips of varying length affect the embedding output? In other words, can I still use embeddings from shorter clips, or should I duplicate shorter sounds to approximate the 2 seconds expected by the model?

datovar4, May 11 '23 04:05
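The "duplicate shorter sounds" option from the question could be sketched as follows. This is only an illustration under assumptions: the waveform is a NumPy array with time on the last axis, and `tile_to_length` is a hypothetical helper, not an ImageBind function. Whether looped clips give better embeddings than short ones is not answered in this thread.

```python
import numpy as np


def tile_to_length(waveform, sample_rate, target_seconds=2.0):
    """Repeat a waveform along its last (time) axis until it spans
    target_seconds, then trim to exactly that many samples.

    Clips that are already longer than the target are truncated.
    """
    target = int(round(sample_rate * target_seconds))
    num_samples = waveform.shape[-1]
    if num_samples == 0:
        raise ValueError("cannot tile an empty waveform")
    repeats = -(-target // num_samples)  # ceiling division
    reps = (1,) * (waveform.ndim - 1) + (repeats,)
    return np.tile(waveform, reps)[..., :target]


# Example: a 500 ms clip at 16 kHz looped to 2 seconds.
clip = np.arange(8000, dtype=np.float32)
looped = tile_to_length(clip, sample_rate=16_000)
print(looped.shape)
```

Looping introduces a discontinuity at each repetition boundary, so it would be worth comparing the resulting embeddings against those of the unmodified clip before relying on it.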

Yes, I have the same question. Maybe by padding with zero vectors at the end? But I do not know whether that would affect performance.

zeroQiaoba, Jul 11 '23 13:07
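The zero-padding idea suggested here could look like the sketch below, applied to the raw waveform before it is handed to the model. It assumes a NumPy array with time on the last axis; `zero_pad_to_length` is a hypothetical helper, and the effect on embedding quality is untested in this thread.

```python
import numpy as np


def zero_pad_to_length(waveform, sample_rate, target_seconds=2.0):
    """Append zeros (silence) along the last (time) axis until the clip
    spans target_seconds. Clips that are already long enough are
    returned with their length unchanged.
    """
    target = int(round(sample_rate * target_seconds))
    missing = max(target - waveform.shape[-1], 0)
    # Pad only the end of the time axis; leave any channel axes alone.
    pad_width = [(0, 0)] * (waveform.ndim - 1) + [(0, missing)]
    return np.pad(waveform, pad_width, mode="constant", constant_values=0.0)


# Example: a 500 ms clip at 16 kHz padded to 2 seconds.
clip = np.ones(8000, dtype=np.float32)
padded = zero_pad_to_length(clip, sample_rate=16_000)
print(padded.shape, float(padded[-1]))
```

One simple empirical check would be to embed the same sound as a short clip, a padded clip, and a looped clip, then compare the cosine similarity of the three embeddings.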

I have similar questions about this.

seungkim1313, Jul 13 '23 04:07

Same question

ospanbatyr, Sep 10 '23 08:09

I have the same question.

cococo2000, Apr 12 '24 08:04

Has anyone figured out an answer to this? Maybe even through some empirical experiments?

lzolyomi, May 16 '24 19:05