Yuan Gong comments


About start training: IndexError: tuple index out of range.

I cannot tell the reason either, but there is no RGB concept in an audio spectrogram; it is single-channel information. `[128, 1024]` means 128 frequency bins and 1024 time frames, which looks...

About start training: IndexError: tuple index out of range.

The length depends on `input_tdim`; in your case, you should modify `run.py` to set `input_tdim=250`. `timem` should be smaller than `input_tdim`. Again, I suggest starting from either the speechcommands or...
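The constraint in this comment can be checked before training starts. The following is a minimal sketch, not code from the repository: `check_time_mask` is a hypothetical helper, and the value `timem=48` is an illustrative assumption; only the names `input_tdim` and `timem` and the value `input_tdim=250` come from the comment.

```python
def check_time_mask(input_tdim: int, timem: int) -> None:
    """Raise ValueError unless the time-mask width is smaller than the input length.

    input_tdim: number of time frames in the input spectrogram.
    timem: maximum width of the time mask, in frames.
    Hypothetical helper for illustration; not part of the AST code base.
    """
    if input_tdim <= 0:
        raise ValueError(f"input_tdim must be positive, got {input_tdim}")
    if timem < 0:
        raise ValueError(f"timem must be non-negative, got {timem}")
    if timem >= input_tdim:
        raise ValueError(
            f"timem ({timem}) should be smaller than input_tdim ({input_tdim})"
        )


# Passes silently: the mask is narrower than the 250-frame input.
check_time_mask(input_tdim=250, timem=48)
```

Calling it with `timem=250` or larger raises `ValueError`, which surfaces the misconfiguration earlier than a failure inside the data loader would.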

About start training: IndexError: tuple index out of range.

OK, I finally found the reason. This is due to a `torchaudio` issue. We use `torchaudio 0.8.1`, in which the input of the masking can be `[freq, time]` while the...

About start training: IndexError: tuple index out of range.

You can use this Colab script to find the bug: https://colab.research.google.com/github/YuanGongND/ast/blob/master/colab/torchaudio_SpecMasking_1_1.ipynb

Different train-(val/test) spectogram shape (recordings duration)

Hi there, thanks for your interest. The transformer itself accepts variable-length input, but that requires some engineering (e.g., bucketing sequences of similar length). We didn't implement it in the code,...
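As an illustration of what "bucketing" could mean here, the sketch below groups sample indices so that each batch holds recordings of similar length, which keeps padding small. This is one common way to do it and is not taken from the AST code, which per the comment does not implement it; `bucket_by_length`, `padded_length`, and the frame counts are all hypothetical.

```python
from typing import List, Sequence


def bucket_by_length(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group sample indices into batches whose members have similar lengths.

    Sorts indices by length, then cuts the sorted order into consecutive
    chunks of at most batch_size items.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padded_length(batch: Sequence[int], lengths: Sequence[int]) -> int:
    """Length every item in the batch must be padded to (the longest member)."""
    return max(lengths[i] for i in batch)


# Hypothetical frame counts for six recordings.
lengths = [1024, 250, 998, 260, 512, 500]
batches = bucket_by_length(lengths, batch_size=2)
print(batches)                                      # [[1, 3], [5, 4], [2, 0]]
print([padded_length(b, lengths) for b in batches]) # [260, 512, 1024]
```

In practice the batches are usually shuffled between epochs so that the model does not always see short recordings first.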

Different train-(val/test) spectogram shape (recordings duration)

Hi Daniel, There are a few things. > I don't understand what you mean by 'majority voting' in my test set, but I'll just decide on an audio length for...

Different train-(val/test) spectogram shape (recordings duration)

Hi Daniel, > I have been trying different stuff and indeed in some cases AST outperforms my current model (not really when resampling at 16K and/or using audioset pretrain tho,...

Different train-(val/test) spectogram shape (recordings duration)

> I just have some question though, does the teacher model needs to be already trained when using it through the KD training process? We always use a pretrained teacher because...

some question about Deit's two [cls] token processing.

To use DeiT initialization, we have to initialize in the same way as DeiT, but as you point out, we average the two tokens in the forward pass. Good luck with your...
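The averaging described in this comment can be pictured with a small sketch. It uses plain Python lists in place of tensors and assumes the two tokens sit at positions 0 and 1 of the output sequence, which the excerpt does not state; `average_two_tokens` is a hypothetical name, not a function from the repository.

```python
from typing import List


def average_two_tokens(token_outputs: List[List[float]]) -> List[float]:
    """Average the embeddings of the first two tokens element-wise.

    token_outputs: one embedding per token for a single example.
    Assumption: the two special tokens are at indices 0 and 1.
    """
    if len(token_outputs) < 2:
        raise ValueError("need at least two tokens to average")
    first, second = token_outputs[0], token_outputs[1]
    if len(first) != len(second):
        raise ValueError("token embeddings must have the same dimension")
    return [(a + b) / 2 for a, b in zip(first, second)]


# Two special tokens followed by one patch token (toy 2-dimensional embeddings).
tokens = [[1.0, 2.0], [3.0, 6.0], [9.0, 9.0]]
pooled = average_two_tokens(tokens)
print(pooled)  # [2.0, 4.0]
```

The patch token at index 2 is ignored; only the two special tokens contribute to the pooled representation in this sketch.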

Validation loss vs Training loss in AudioSet training

Thanks for your interest. I don't think it is an overfitting issue, as you should also see a performance drop in mAP or accuracy on the validation set if the...