Yuan Gong comments

Results: 80 comments of Yuan Gong

Trouble with pure inference

I see. It might be caused by a bug in the code. I didn't consider your use case. If the model is not too large, can you send the .pth...

Trouble with pure inference

The problem is that I used a trick to encode the pretraining hyperparameters in the model and use the existence of the hyperparameter to check if the model is a...

Hi there, I guess you are correct that `dim` should be 0 to match Eq. (1) of the paper. https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L98-L99 My guess is that this impacts `correct` (which is not...

about nce loss cal

Thanks for the good catch - and good luck with your research!!

Embeddings without fine tuning

I am wondering if modifying these two lines and using `finetuningavgtok` as the `task` in `forward` could help your application (basically comment out the mlp head)? https://github.com/YuanGongND/ssast/blob/b589c9c6eb744fe8d05340169ed36e46e8c19ba1/src/models/ast_models.py#L262-L264 Btw, I would...
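A minimal sketch of the idea, assuming that an "average token" embedding means averaging the encoder's output tokens instead of passing them through the MLP head. The `mean_pool` helper and the toy token values are hypothetical illustrations in plain Python, not code from the ssast repository:

```python
def mean_pool(tokens):
    """Average a sequence of token vectors into one fixed-size embedding."""
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


# Hypothetical encoder output: 3 patch tokens, each of dimension 4.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]

# One embedding per clip, with no classification head applied.
embedding = mean_pool(tokens)
```

The pooled vector has the same dimension as a single token, so it can feed a downstream classifier or a similarity search without any fine-tuning.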

How to convert fbank features back to audio ?

Hi there, the goal of the reconstruction loss here is just to force the model to learn a good audio representation. We didn't mean to make the model a strong reconstructor....

How to convert fbank features back to audio ?

Hi there, I am not familiar with vocoders; you can check the GitHub topic list: https://github.com/topics/vocoder. Note that most of these are for TTS (speech) rather than general audio. -Yuan

Dataset mean / stdev normalization

Hi there, we haven't tried training SSAST without normalization. The reason we normalize for AST is that we want to use an ImageNet-pretrained model, which is trained with normalized...

Dataset mean / stdev normalization

Thanks! I have read the paper but might need to read it more carefully. Which specific point runs counter to your experiment? Do you mean the model performance is sensitive...

Dataset mean / stdev normalization

I see. In the last paragraph of the paper: > In our experiment, Training AST using input of var(x) = 1 or var(x) = 0.0625 would lead to mAP of...
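For readers following the normalization thread, here is a minimal sketch of normalizing features to a chosen variance. It assumes zero-mean normalization with dataset-level statistics. The helper names, the `target_std` parameter, and the toy values are hypothetical, and the code is plain Python rather than the repository's dataloader. Note that var(x) = 0.0625 corresponds to a standard deviation of 0.25.

```python
import statistics


def dataset_stats(values):
    """Return (mean, population std) over all feature values in the dataset."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values, mu=mean)
    return mean, std


def normalize(values, mean, std, target_std=1.0):
    """Shift to zero mean and scale so the result has standard deviation target_std."""
    return [(v - mean) / std * target_std for v in values]


# Hypothetical toy values standing in for flattened fbank features.
fbank = [-8.0, -6.5, -4.0, -2.5, -1.0, 0.5]
mean, std = dataset_stats(fbank)

unit_var = normalize(fbank, mean, std, target_std=1.0)    # var(x) = 1
small_var = normalize(fbank, mean, std, target_std=0.25)  # var(x) = 0.0625
```

Computing the statistics once over the training set and reusing them at inference keeps the input scale consistent between the two stages.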