[question] Is there a best practice for sampling audio input?
Love your amazing work! I'm trying to run inference on audio input with your model, and I'm hitting OOM errors with long audio inputs.
I wanted to know if there is a best or commonly used practice for sampling audio inputs, similar to uniform frame sampling for videos. In particular, do you think MiniCPM-o-2.6 will work on sampled audio as well?
I looked through your code for any audio sampling, but I couldn't find it. Thank you so much for your consideration. :)
Generally speaking, we don't recommend sampling audio input, as this can severely compromise the semantic integrity of the audio.
Please send me the length of the audio you need to process, and I'll provide specific suggestions.
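If it helps with reporting the length, here is a minimal sketch for measuring it, assuming the audio is stored as an uncompressed PCM WAV file. The helper name `wav_duration_seconds` is hypothetical and not part of the MiniCPM-o codebase; it only uses Python's standard `wave` module.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds.

    Assumption: the file is an uncompressed WAV that the standard
    `wave` module can read. Other formats (mp3, flac, ...) would need
    a different loader.
    """
    with wave.open(path, "rb") as wf:
        # Duration = number of frames / frames per second.
        return wf.getnframes() / wf.getframerate()


# Example usage (the path is a placeholder):
# print(f"{wav_duration_seconds('long_audio.wav'):.1f} s")
```

The frame count divided by the frame rate gives the duration regardless of channel count or sample width, so the same number applies to mono and stereo files.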