SpecAugment

Error using the shape of spectrogram

Open haojun opened this issue 6 years ago • 4 comments

E.g., line 62 in spec_augment_tensorflow.py:

'''
fbank_size = tf.shape(spectrogram)
n, v = fbank_size[1], fbank_size[2]
'''

Here 'n' is used as the length of the time axis, and 'v' is used as the length of the frequency axis.

But in spec_augment_test_TF.py, the mel_spectrogram from librosa is reshaped to (-1, n_mels, t, 1), which means fbank_size[1] is actually the length of the frequency axis and fbank_size[2] is the length of the time axis.
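A quick way to check the axis order is to reproduce the reshape on a dummy array. This is only a sketch: a NumPy array of zeros stands in for the librosa output (which is laid out as (n_mels, t)), and the sizes 80 and 200 are illustrative values, not taken from the repository.

```python
import numpy as np

n_mels, t = 80, 200  # illustrative sizes (assumption, not from the repo)

# Stand-in for a librosa mel spectrogram, which has shape (n_mels, t).
mel_spectrogram = np.zeros((n_mels, t), dtype=np.float32)

# Same reshape as described for spec_augment_test_TF.py.
batched = mel_spectrogram.reshape(-1, n_mels, t, 1)

# Same unpacking as line 62 of spec_augment_tensorflow.py.
fbank_size = batched.shape
n, v = fbank_size[1], fbank_size[2]

print(batched.shape)  # (1, 80, 200, 1)
print(n, v)           # 80 200
```

With this layout, 'n' receives the number of mel bins and 'v' the number of frames, the opposite of how the two names are used afterwards.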

Was I wrong or did I miss something?

haojun avatar Sep 29 '19 07:09 haojun

I have the same question about it

Jxu-Thu avatar Dec 30 '19 09:12 Jxu-Thu

Hi, did you solve this?

JunenuJ avatar Mar 25 '20 05:03 JunenuJ

To me it looks like all the dimensions are in the wrong order, at least in the TensorFlow script. For instance, the script applies the time warp along the frequency axis for me. I think an easy fix could be to transpose the spectrogram, pass it to the program, and then transpose the result back, though I haven't tried it.
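The transpose idea could be sketched roughly as follows. This is an untested-against-the-repo illustration: `with_swapped_axes` and `mask_first_steps_on_axis1` are hypothetical names, and the masking function is only a stand-in for an augmentation that treats axis 1 as time. NumPy is used here for brevity; the same axis permutation should be possible with `tf.transpose`.

```python
import numpy as np


def with_swapped_axes(augment_fn, spectrogram):
    """Swap axes 1 and 2, apply augment_fn, then swap the axes back.

    Hypothetical helper: expects a 4-D array (batch, freq, time, channel)
    and an augment_fn that assumes (batch, time, freq, channel).
    """
    swapped = np.transpose(spectrogram, (0, 2, 1, 3))
    augmented = augment_fn(swapped)
    return np.transpose(augmented, (0, 2, 1, 3))


def mask_first_steps_on_axis1(x, width=10):
    """Stand-in augmentation: zero the first `width` positions on axis 1."""
    out = x.copy()
    out[:, :width, :, :] = 0.0
    return out


# Input laid out as (batch, n_mels, t, channel), as in the test script.
spec = np.ones((1, 80, 200, 1), dtype=np.float32)
result = with_swapped_axes(mask_first_steps_on_axis1, spec)

print(result.shape)                        # (1, 80, 200, 1)
print(float(result[:, :, :10, :].sum()))   # 0.0
```

In this sketch the mask ends up on the first 10 frames of the time axis (axis 2 of the original layout), while the output keeps the original shape.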

philippgovernale avatar Feb 09 '22 23:02 philippgovernale

I have uploaded a gist that swaps all the dimensions here

philippgovernale avatar Feb 09 '22 23:02 philippgovernale