UnsupSeg
wav length
Hi, I'm trying to use this work to predict segments in audio files. The features in this work are extracted by a conv encoder, and the parameter wav_len is calculated from the conv layers' output, so wav_len equals the number of frames. When I used the pretrained model to get segments together with mel-scale features, I found that the number of frames differs between the mel spectrogram and the encoder output. For example, when the length of the audio is 258560, the conv layers output 1613 frames, while the mel spectrogram has 1617. How can I avoid this difference?
I use torchaudio to compute the mel spectrogram, with these parameters: win: 30 ms, hop: 10 ms, n_mel: 80.
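
The frame counts above can be checked with a little arithmetic. The sketch below is only an illustration under assumptions not stated in the post: the audio length is in samples, the sample rate is 16 kHz (so hop = 160 and win = 480 samples), the FFT size equals the window length, and the spectrogram uses torch.stft-style framing, where center=True pads n_fft // 2 samples on each side. The helper names `stft_num_frames` and `common_num_frames` are hypothetical and are not part of UnsupSeg or torchaudio.

```python
def stft_num_frames(num_samples, n_fft, hop_length, center=True):
    """Number of STFT frames for a 1-D signal, torch.stft-style framing."""
    if center:
        # center=True pads n_fft // 2 samples on each side before framing
        num_samples = num_samples + 2 * (n_fft // 2)
    return 1 + (num_samples - n_fft) // hop_length


def common_num_frames(*frame_counts):
    """Length to which every feature sequence can be truncated."""
    return min(frame_counts)


SAMPLE_RATE = 16000                  # assumption: not stated in the post
NUM_SAMPLES = 258560                 # audio length from the post
HOP = SAMPLE_RATE * 10 // 1000       # 10 ms -> 160 samples
WIN = SAMPLE_RATE * 30 // 1000       # 30 ms -> 480 samples (n_fft assumed equal)

centered = stft_num_frames(NUM_SAMPLES, WIN, HOP, center=True)
uncentered = stft_num_frames(NUM_SAMPLES, WIN, HOP, center=False)
aligned = common_num_frames(1613, centered)  # 1613 = encoder frames from the post

print(centered, uncentered, aligned)
```

Under these assumptions the centered count is 1617, which matches the mel spectrogram length reported above, so the padding added by center=True would account for most of the gap. Without centering the count is 1614, still one frame more than the encoder's 1613, so truncating both sequences to the shorter length is one simple way to align them.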