Yuan Gong

80 comments by Yuan Gong

The trick is that we trim or interpolate the positional embedding: https://github.com/YuanGongND/ast/blob/7b2fe7084b622e540643b0d7d7ab736b5eb7683b/src/models/ast_models.py#L141-L147. To use the AudioSet-pretrained model, you just need to specify the `t_dim` when you initialize the AST model,...
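To make the trim-or-interpolate idea concrete, here is a minimal standalone NumPy sketch (this is not the repo's actual code, which operates on `torch` tensors and crops/interpolates differently in detail; names and the leading-slice trim here are illustrative only):

```python
import numpy as np

def adapt_pos_embed(pos_embed: np.ndarray, t_new: int) -> np.ndarray:
    """Trim or linearly interpolate a (t_old, dim) positional embedding
    to t_new time positions, mimicking the trick in ast_models.py."""
    t_old, dim = pos_embed.shape
    if t_new <= t_old:
        # Target is shorter: keep a slice of the pretrained positions.
        # (The repo crops the center; a leading slice is shown for brevity.)
        return pos_embed[:t_new]
    # Target is longer: interpolate each embedding dimension over time.
    old_x = np.linspace(0.0, 1.0, t_old)
    new_x = np.linspace(0.0, 1.0, t_new)
    return np.stack(
        [np.interp(new_x, old_x, pos_embed[:, d]) for d in range(dim)],
        axis=1,
    )

pe = np.random.randn(101, 768)         # e.g. pretrained time positions
print(adapt_pos_embed(pe, 51).shape)   # (51, 768)  -- trimmed
print(adapt_pos_embed(pe, 201).shape)  # (201, 768) -- interpolated
```

In the actual repo this happens inside `ASTModel.__init__` when `input_tdim` differs from the pretraining length, so you never call it yourself.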

Hi there, Did we use `torchvision` somewhere in the repo? Could you point me to that? Thanks! -Yuan

Thanks! I just checked; it seems we use `torchvision==0.10.0+cu102`:

```
>>> torch.__version__
'1.8.1+cu102'
>>> torchaudio.__version__
'0.8.1'
>>> torchvision.__version__
'0.10.0+cu102'
```

It's weird that no one has complained about this issue....

Yes, but I will need to test it before changing it. Thanks again for pointing this out.

Hi Annalisa, Unfortunately, I don't have tiny & small ImageNet+AudioSet pretrained models. The [SSAST repo](https://github.com/YuanGongND/ssast) has in-domain pretrained models of all sizes, but it is based on a different pretraining scheme. Without...

Thanks for the kind words. We use multiple GPUs to train the model, so the saved model is a `torch.nn.DataParallel` object. Even if you want to do single-GPU inference, you...

Also the model input should be a spectrogram that is processed with the same normalization and feature extraction function https://github.com/YuanGongND/ssast/blob/35ae7abbdd2870c008feed4ece8b7c6457421b17/src/dataloader.py#L195 and https://github.com/YuanGongND/ssast/blob/35ae7abbdd2870c008feed4ece8b7c6457421b17/src/dataloader.py#L126-L127. You can also refer to https://github.com/YuanGongND/ast/blob/master/egs/audioset/inference.py.
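As a hedged sketch of the kind of normalization the dataloader applies (the exact dataset statistics live in `dataloader.py`; the AudioSet values shown here are assumptions copied from memory of that file, so verify them against the linked lines):

```python
import numpy as np

# Illustrative dataset-level statistics; the real values are set per dataset
# in the repo's dataloader.py -- do not trust these numbers blindly.
NORM_MEAN = -4.2677393
NORM_STD = 4.5689974

def normalize_fbank(fbank: np.ndarray) -> np.ndarray:
    """Apply the dataloader-style normalization (fbank - mean) / (2 * std)
    to a (time, n_mels) log-mel filterbank spectrogram."""
    return (fbank - NORM_MEAN) / (NORM_STD * 2)

fbank = np.random.randn(1024, 128)  # e.g. ~10 s of 128-bin fbank frames
x = normalize_fbank(fbank)
```

The point is that inference inputs must go through the same feature extraction (torchaudio Kaldi-style fbank in the repo) and the same normalization statistics as training, or accuracy collapses.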

You should do something like this: https://github.com/YuanGongND/ast/blob/7b2fe7084b622e540643b0d7d7ab736b5eb7683b/egs/audioset/inference.py#L82-L89 i.e., call `audio_model.load_state_dict(checkpoint)` after converting the model to a `DataParallel` object.
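If you want to skip the `DataParallel` wrapper instead, a common generic PyTorch workaround (not repo-specific; shown here on plain dicts so it is self-contained) is to strip the `module.` prefix that `DataParallel` adds to every parameter name before loading the checkpoint into a single-GPU model:

```python
def strip_data_parallel_prefix(state_dict: dict) -> dict:
    """Remove the 'module.' prefix that torch.nn.DataParallel prepends to
    every parameter name, so the checkpoint loads into a plain model via
    model.load_state_dict(strip_data_parallel_prefix(checkpoint))."""
    return {
        (k[len('module.'):] if k.startswith('module.') else k): v
        for k, v in state_dict.items()
    }

# Hypothetical checkpoint keys, for illustration only:
ckpt = {'module.v.pos_embed': 1, 'module.mlp_head.weight': 2}
print(strip_data_parallel_prefix(ckpt))
# {'v.pos_embed': 1, 'mlp_head.weight': 2}
```

Either approach works; the repo's `inference.py` takes the wrap-then-load route shown in the linked lines.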

I don't suggest changing `ast_models.py`. Something like the below should work:

```python
input_tdim = 1024
ast_mdl = ASTModel(label_dim=2, fshape=16, tshape=16, fstride=10, tstride=10,
                   input_fdim=128, input_tdim=input_tdim, model_size='tiny',
                   pretrain_stage=False, load_pretrained_mdl_path=MODEL)
# convert it to...
```

That's weird; if you use my recipe to fine-tune the model, the saved model should already be a `DataParallel` object.