ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
![Capture](https://user-images.githubusercontent.com/88529547/182419468-7326edfd-9d7c-404b-b314-c20e63ef67df.PNG)
Hi, sorry to bother you. Why are the two special [CLS] tokens in DeiT said to be averaged into a single [CLS] token in the paper, but in the code I...
Thanks for making such a wonderful repo. When I run the model, the validation loss is very high (around 0.6940), but the mAP keeps increasing normally. Could you please explain...
Hi Yuan, I have tested your AST model pretrained on AudioSet on my own dataset, and I noticed that it achieves performance similar to EfficientNet pretrained on AudioSet using psla...
Avoid an incompatible torchvision dependency on torch 1.11.0. Fixes #68
Following the setup instructions currently leads to an error on model import. Specifically, `torchvision` needs to be pinned to a version. Otherwise, `pip` may install (e.g.) 0.12.0, which...
Hello! First of all, congratulations on your amazing work. I'm doing my MSc Thesis on audio classification (respiratory disease diagnosis from lung sounds). My main objective is to improve the...
Hi. For audio of different lengths, the padding operation in the dataset is applied to the `fbank`. Why not pad the waveform first and then convert it to `fbank`?
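For readers unfamiliar with what padding at the `fbank` level means, here is a minimal sketch, not the repository's code. It assumes the features are a sequence of frames, each holding one value per mel bin, and uses plain Python lists in place of tensors. The names `pad_or_truncate`, `target_length`, and `pad_value` are hypothetical and chosen only for illustration.

```python
from typing import List

Frame = List[float]  # one time frame: a value per mel bin


def pad_or_truncate(fbank: List[Frame], target_length: int,
                    pad_value: float = 0.0) -> List[Frame]:
    """Return a copy of `fbank` with exactly `target_length` frames.

    Short inputs get constant-valued frames appended at the end;
    long inputs are cut off after `target_length` frames.
    """
    if target_length < 0:
        raise ValueError("target_length must be non-negative")

    n_frames = len(fbank)
    if n_frames >= target_length:
        # Truncate along the time axis.
        return [list(frame) for frame in fbank[:target_length]]

    # Pad along the time axis with frames of the same width.
    n_bins = len(fbank[0]) if fbank else 0
    padding = [[pad_value] * n_bins for _ in range(target_length - n_frames)]
    return [list(frame) for frame in fbank] + padding


if __name__ == "__main__":
    features = [[1.0, 2.0], [3.0, 4.0]]  # 2 frames, 2 mel bins (toy data)
    print(pad_or_truncate(features, 4))
    print(pad_or_truncate(features, 1))
```

Padding the waveform instead would append silence before feature extraction, so the padded region would pass through the filterbank computation rather than being set directly to `pad_value`.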
Hi, Dr. Gong, I use AST on my own dataset. I have created the .json file and .csv file according to the guide. However, when I run run.sh, an error occurred...
Hi, Yuan Gong, does it support distributed training with multiple GPUs on different machines? Looking forward to your reply, thanks!