fairseq

Do I need to crop long audio for inference based on pretrained models?

Open · xuduo18311199384 opened this issue 1 year ago · 1 comment

I have a 5-minute audio file. The wav2vec features I get by running inference on the whole file differ from the features I get after cropping it into 10 s segments. Does direct inference on long audio give less accurate results? If so, what segment length should I crop to for the best results?
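For concreteness, here is a minimal sketch of the kind of fixed-length cropping described above. It assumes the audio is already loaded as a 1-D NumPy array of 16 kHz mono samples; the function name `split_into_segments` and its parameters are hypothetical and not part of fairseq.

```python
import numpy as np


def split_into_segments(waveform, sample_rate=16000, segment_seconds=10.0):
    """Split a 1-D waveform into consecutive, non-overlapping segments.

    Hypothetical helper for illustration only. The last segment is
    shorter than the others when the length is not an exact multiple.
    """
    segment_len = int(round(sample_rate * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    return [
        waveform[start:start + segment_len]
        for start in range(0, len(waveform), segment_len)
    ]


# Assumed input: 5 minutes of 16 kHz mono audio (silence as a stand-in).
waveform = np.zeros(5 * 60 * 16000, dtype=np.float32)
segments = split_into_segments(waveform)
print(len(segments), len(segments[0]))
```

This only covers the cropping step. Each segment would then be passed to the model separately, and whether the segment-wise features match the whole-file features depends on the model.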

xuduo18311199384 · Oct 29 '24 07:10