Yuan Gong

80 comments by Yuan Gong

The trick is that we trim or interpolate the positional embedding: https://github.com/YuanGongND/ast/blob/7b2fe7084b622e540643b0d7d7ab736b5eb7683b/src/models/ast_models.py#L141-L147. To use the AudioSet-pretrained model, you just need to specify the `t_dim` when you initialize the AST model,...
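To make the trim-or-interpolate idea concrete, here is a minimal standalone NumPy sketch (this is not the repo's actual code, which operates on `torch` tensors and crops/interpolates differently in detail; names and the leading-slice trim here are illustrative only):

```python
import numpy as np

def adapt_pos_embed(pos_embed: np.ndarray, t_new: int) -> np.ndarray:
    """Trim or linearly interpolate a (t_old, dim) positional embedding
    to t_new time positions, mimicking the trick in ast_models.py."""
    t_old, dim = pos_embed.shape
    if t_new <= t_old:
        # Target is shorter: keep a slice of the pretrained positions.
        # (The repo crops the center; a leading slice is shown for brevity.)
        return pos_embed[:t_new]
    # Target is longer: interpolate each embedding dimension over time.
    old_x = np.linspace(0.0, 1.0, t_old)
    new_x = np.linspace(0.0, 1.0, t_new)
    return np.stack(
        [np.interp(new_x, old_x, pos_embed[:, d]) for d in range(dim)],
        axis=1,
    )

pe = np.random.randn(101, 768)         # e.g. pretrained time positions
print(adapt_pos_embed(pe, 51).shape)   # (51, 768)  -- trimmed
print(adapt_pos_embed(pe, 201).shape)  # (201, 768) -- interpolated
```

In the actual repo this happens inside `ASTModel.__init__` when `input_tdim` differs from the pretraining length, so you never call it yourself.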

Hi there, Did we use `torchvision` somewhere in the repo? Could you point me to that? Thanks! -Yuan

Thanks! I just checked; it seems we use `torchvision==0.10.0+cu102`:

```
>>> torch.__version__
'1.8.1+cu102'
>>> torchaudio.__version__
'0.8.1'
>>> torchvision.__version__
'0.10.0+cu102'
```

It's weird that no one has complained about this issue....

Yes, but I will need to test it before changing it. Thanks again for pointing this out.

Hi Annalisa, Unfortunately, I don't have tiny & small ImageNet+AudioSet pretrained models. The [SSAST repo](https://github.com/YuanGongND/ssast) has in-domain pretrained models of all sizes, but it is based on a different pretraining scheme. Without...

Thanks for the kind words. We use multiple GPUs to train the model, so the saved model is a `torch.nn.DataParallel` object. Even if you want to do single-GPU inference, you...

Also the model input should be a spectrogram that is processed with the same normalization and feature extraction function https://github.com/YuanGongND/ssast/blob/35ae7abbdd2870c008feed4ece8b7c6457421b17/src/dataloader.py#L195 and https://github.com/YuanGongND/ssast/blob/35ae7abbdd2870c008feed4ece8b7c6457421b17/src/dataloader.py#L126-L127. You can also refer to https://github.com/YuanGongND/ast/blob/master/egs/audioset/inference.py.
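As a hedged sketch of the kind of normalization the dataloader applies (the exact dataset statistics live in `dataloader.py`; the AudioSet values shown here are assumptions copied from memory of that file, so verify them against the linked lines):

```python
import numpy as np

# Illustrative dataset-level statistics; the real values are set per dataset
# in the repo's dataloader.py -- do not trust these numbers blindly.
NORM_MEAN = -4.2677393
NORM_STD = 4.5689974

def normalize_fbank(fbank: np.ndarray) -> np.ndarray:
    """Apply the dataloader-style normalization (fbank - mean) / (2 * std)
    to a (time, n_mels) log-mel filterbank spectrogram."""
    return (fbank - NORM_MEAN) / (NORM_STD * 2)

fbank = np.random.randn(1024, 128)  # e.g. ~10 s of 128-bin fbank frames
x = normalize_fbank(fbank)
```

The point is that inference inputs must go through the same feature extraction (torchaudio Kaldi-style fbank in the repo) and the same normalization statistics as training, or accuracy collapses.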

You should do something like this: https://github.com/YuanGongND/ast/blob/7b2fe7084b622e540643b0d7d7ab736b5eb7683b/egs/audioset/inference.py#L82-L89 i.e., call `audio_model.load_state_dict(checkpoint)` after converting the model to a `DataParallel` object.
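If you want to skip the `DataParallel` wrapper instead, a common generic PyTorch workaround (not repo-specific; shown here on plain dicts so it is self-contained) is to strip the `module.` prefix that `DataParallel` adds to every parameter name before loading the checkpoint into a single-GPU model:

```python
def strip_data_parallel_prefix(state_dict: dict) -> dict:
    """Remove the 'module.' prefix that torch.nn.DataParallel prepends to
    every parameter name, so the checkpoint loads into a plain model via
    model.load_state_dict(strip_data_parallel_prefix(checkpoint))."""
    return {
        (k[len('module.'):] if k.startswith('module.') else k): v
        for k, v in state_dict.items()
    }

# Hypothetical checkpoint keys, for illustration only:
ckpt = {'module.v.pos_embed': 1, 'module.mlp_head.weight': 2}
print(strip_data_parallel_prefix(ckpt))
# {'v.pos_embed': 1, 'mlp_head.weight': 2}
```

Either approach works; the repo's `inference.py` takes the wrap-then-load route shown in the linked lines.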

I don't suggest changing `ast_models.py`. Something like the below should work:

```python
input_tdim = 1024
ast_mdl = ASTModel(label_dim=2, fshape=16, tshape=16, fstride=10, tstride=10,
                   input_fdim=128, input_tdim=input_tdim, model_size='tiny',
                   pretrain_stage=False, load_pretrained_mdl_path=MODEL)
# convert it to...
```

That's weird; if you use my recipe to fine-tune the model, the saved model should already be a `DataParallel` object.