Yuan Gong

80 comments by Yuan Gong

I see. It might be caused by a bug in the code. I didn't consider your use case. If the model is not too large, can you send the .pth...

The problem is that I used a trick to encode the pretraining hyperparameters in the model and use the existence of the hyperparameter to check if the model is a...
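The pattern described here, storing hyperparameters inside the saved model and treating their presence as a marker, can be sketched as follows. This is a minimal illustration and not the repository's code: the key name `pretrain_hparams` and both helper functions are hypothetical, and `pickle` stands in for whatever serialization the real checkpoint uses.

```python
import io
import pickle

# Hypothetical key name; the real checkpoint layout is not shown in this thread.
HPARAM_KEY = "pretrain_hparams"


def save_checkpoint(weights, hparams=None):
    """Serialize weights, optionally embedding hyperparameters under HPARAM_KEY."""
    payload = dict(weights)
    if hparams is not None:
        payload[HPARAM_KEY] = dict(hparams)
    buffer = io.BytesIO()
    pickle.dump(payload, buffer)
    return buffer.getvalue()


def load_checkpoint(blob):
    """Return (weights, hparams); hparams is None when the marker key is absent."""
    payload = pickle.loads(blob)
    hparams = payload.pop(HPARAM_KEY, None)
    return payload, hparams
```

A weakness of this design is that any checkpoint saved without the marker key looks identical to one that never had hyperparameters, so a loader that branches on the key's presence can misclassify it.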

Hi there, I guess you are correct that `dim` should be 0 to match Eq. (1) of the paper. https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L98-L99 My guess is that this affects `correct` (which is not...

Thanks for the good catch - and good luck with your research!!

I am wondering if modifying these two lines and using `finetuningavgtok` as the `task` in `forward` could help your application (basically, comment out the MLP head)? https://github.com/YuanGongND/ssast/blob/b589c9c6eb744fe8d05340169ed36e46e8c19ba1/src/models/ast_models.py#L262-L264 Btw, I would...
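For readers unfamiliar with the idea, the sketch below shows what using an averaged token representation without a classification head could look like. It assumes, from the task name `finetuningavgtok` alone, that the representation is the mean of the output tokens; `mean_pool` is a hypothetical helper on plain Python lists, not a function from the repository.

```python
def mean_pool(tokens):
    """Average a sequence of equal-length token vectors into a single vector."""
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    # Average each embedding dimension across all tokens.
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]


# Two tokens with two dimensions each, pooled into one embedding.
embedding = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```

Skipping the head means the pooled vector itself is returned as the clip-level embedding, which is usually what a downstream feature-extraction application needs.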

Hi there, The goal of the reconstruction loss here is just to force the model to learn a good audio representation. We didn't mean to make the model a strong reconstructor....

Hi there, I am not familiar with vocoders - you can check the GitHub topic list: https://github.com/topics/vocoder. Note that most of these are for TTS (speech) rather than general audio. -Yuan

Hi there, We haven't tried training SSAST without normalization. The reason we normalize for AST is that we want to use an ImageNet-pretrained model, which is trained with normalized...

Thanks! I have read the paper but might need to read it more carefully. Which specific point runs counter to your experiment? Do you mean the model performance is sensitive...

I see. In the last paragraph of the paper: > In our experiment, Training AST using input of var(x) = 1 or var(x) = 0.0625 would lead to mAP of...
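To make the two variance settings concrete, here is a small sketch that rescales an input to zero mean and a chosen standard deviation, so that a target of 1.0 gives var(x) = 1 and a target of 0.25 gives var(x) = 0.0625. The function name `normalize` and the use of per-input statistics are assumptions for illustration; this is not the AST data pipeline.

```python
import statistics


def normalize(values, target_std=1.0):
    """Shift values to zero mean and rescale them to the requested standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("cannot normalize a constant input")
    return [(v - mean) / std * target_std for v in values]


features = [0.0, 1.0, 2.0, 3.0, 4.0]
unit_var = normalize(features, target_std=1.0)    # variance 1
small_var = normalize(features, target_std=0.25)  # variance 0.0625
```

In practice the mean and standard deviation would more likely be dataset-level constants computed once, rather than recomputed per input as in this sketch.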