SpecAugment
Error in how the spectrogram shape is used
E.g., line 62 in spec_augment_tensorflow.py:
```python
fbank_size = tf.shape(spectrogram)
n, v = fbank_size[1], fbank_size[2]
```
Here `n` is used as the length of the time axis and `v` as the length of the frequency axis.
But in spec_augment_test_TF.py, the mel_spectrogram from librosa is reshaped to (-1, n_mels, t, 1), which means fbank_size[1] is actually the length of the frequency axis and fbank_size[2] is the length of the time axis.
Am I wrong, or did I miss something?
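A small sketch of the mismatch described above, using NumPy so it runs without TensorFlow or librosa. It assumes, as librosa documents, that `librosa.feature.melspectrogram` returns an array shaped `(n_mels, t)`; a random array stands in for real audio features, and the sizes 128 and 300 are arbitrary.

```python
import numpy as np

# Stand-in for librosa.feature.melspectrogram output, which is shaped
# (n_mels, t): frequency first, time second. Sizes are arbitrary.
n_mels, t = 128, 300
mel_spectrogram = np.random.rand(n_mels, t).astype(np.float32)

# The reshape used in spec_augment_test_TF.py: (-1, n_mels, t, 1)
batched = np.reshape(mel_spectrogram, (-1, n_mels, t, 1))

# What line 62 of spec_augment_tensorflow.py reads from the shape
fbank_size = batched.shape
n, v = fbank_size[1], fbank_size[2]

print("n (treated as time):", n)       # actually n_mels, the frequency axis
print("v (treated as frequency):", v)  # actually t, the time axis
```

So `n` ends up holding the number of mel bins and `v` the number of frames, the opposite of what the augmentation code assumes.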
I have the same question.
Hi, did you manage to solve this?
To me it looks like all the dimensions are in the wrong order, at least in the TensorFlow script. For instance, the script applies the time warp along the frequency axis. An easy fix could be to transpose the spectrogram, pass it to the function, and then transpose the result back, though I haven't tried it.
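A minimal, untested-against-the-repo sketch of that transpose workaround. It uses NumPy so it is self-contained; `time_first_augment` is a hypothetical stand-in for the script's augmentation function, assumed to expect `(batch, time, freq, channel)` input. With TensorFlow tensors the equivalent swap would be `tf.transpose(spectrogram, perm=[0, 2, 1, 3])`.

```python
import numpy as np

def time_first_augment(spec):
    """Hypothetical stand-in for the augmentation in
    spec_augment_tensorflow.py: assumes (batch, time, freq, 1) input
    and masks the first 10 time steps along axis 1."""
    out = spec.copy()
    out[:, :10, :, :] = 0.0
    return out

def augment_freq_first(spec, augment_fn):
    """Apply a (batch, time, freq, channel) augmentation to a
    (batch, freq, time, channel) spectrogram by swapping axes 1 and 2
    before the call and swapping them back afterwards."""
    time_first = np.transpose(spec, (0, 2, 1, 3))
    augmented = augment_fn(time_first)
    return np.transpose(augmented, (0, 2, 1, 3))

spec = np.ones((1, 128, 300, 1), dtype=np.float32)  # (batch, n_mels, t, 1)
result = augment_freq_first(spec, time_first_augment)
print(result.shape)  # (1, 128, 300, 1)
```

The output keeps the test script's `(-1, n_mels, t, 1)` layout, and the mask lands on the first 10 time frames rather than the first 10 mel bins. Because the permutation `(0, 2, 1, 3)` is its own inverse, the same call undoes it.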
I have uploaded a gist that swaps all the dimensions here.