
Some questions about the details of AST.

Open TungyuYoung opened this issue 3 years ago • 1 comment

I would like to know how to explain why audio classification can be achieved with ImageNet-pretrained models applied to spectrograms. As we all know, most of the images in ImageNet are common photos of daily life, such as cats, dogs, and cars. Are the features of these images/objects correlated with those of audio spectrograms? Why can the knowledge learned from natural images be transferred to the classification of spectrograms?

I would appreciate it if you could answer my questions.

TungyuYoung avatar Sep 17 '22 14:09 TungyuYoung

Hi there,

This is an interesting question, but I don't have a clear answer. It is worth noting that using ImageNet (IN) pretraining for audio tasks is not new to AST; it can be traced back to 2014.

-Yuan

YuanGongND avatar Oct 09 '22 04:10 YuanGongND