AudioMAE

ViT-L checkpoint and reproducing the visualization results

[Open] i-need-sleep opened this issue 1 year ago • 3 comments

Hello,

Thanks for the great repo.

I am trying to reproduce the visualization results in the paper for the reconstructed spectrograms. Following the demo notebook and using the pretrained ViT-B checkpoint, the results I got (see attached) are notably worse than those reported in the paper.

I note that the visualizations in the paper are based on the larger ViT-L model. Is it possible for you to share the pretrained checkpoint?

Additionally, can you confirm whether the model configuration used in the notebook is correct?

Thanks in advance!

[Attached figures: `masked` (input masked with a ratio of 0.3) and `recons_pasted` (reconstructed patches)]

i-need-sleep · Aug 10 '23
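The two attached figures appear to follow the standard MAE visualization recipe: a random subset of patches is masked, the decoder predicts them, and the predictions are pasted back over the visible input. Below is a minimal pure-Python sketch of that recipe, assuming the `x * (1 - mask) + y * mask` paste-back used in the original MAE demo. The function names and the 512-patch grid (a 1024 x 128 filterbank cut into 16 x 16 patches) are illustrative assumptions, not code taken from the AudioMAE notebook.

```python
import random
from typing import List, Sequence

Patch = List[float]  # one flattened patch, e.g. 16 x 16 = 256 filterbank bins


def random_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> List[int]:
    """Return a binary mask (1 = masked, 0 = visible), MAE-style.

    Keeps int(num_patches * (1 - mask_ratio)) patches chosen uniformly at
    random, mirroring the "sort random noise, keep the smallest" trick.
    """
    rng = random.Random(seed)
    noise = [rng.random() for _ in range(num_patches)]
    # Indices sorted by noise value; the first len_keep stay visible.
    order = sorted(range(num_patches), key=lambda i: noise[i])
    len_keep = int(num_patches * (1 - mask_ratio))
    mask = [1] * num_patches
    for i in order[:len_keep]:
        mask[i] = 0
    return mask


def paste_reconstruction(
    original: Sequence[Patch], reconstructed: Sequence[Patch], mask: Sequence[int]
) -> List[Patch]:
    """Visible patches come from the input, masked ones from the decoder.

    Patch-wise equivalent of x * (1 - mask) + y * mask.
    """
    if not (len(original) == len(reconstructed) == len(mask)):
        raise ValueError("original, reconstructed and mask must have equal length")
    return [list(y) if m else list(x) for x, y, m in zip(original, reconstructed, mask)]


if __name__ == "__main__":
    # Hypothetical grid: 1024 frames x 128 mel bins, 16 x 16 patches -> 64 * 8 = 512.
    n = 512
    mask = random_mask(n, mask_ratio=0.3)
    print(sum(mask), "of", n, "patches masked")  # 154 of 512 patches masked
```

Because visible patches are copied straight from the input, a `recons_pasted` image always looks clean in the unmasked regions; any quality gap between checkpoints such as ViT-B and ViT-L shows up only in the masked patches.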


Hello, I have faced a similar issue. Have you managed to resolve it? I would appreciate any insights or solutions you could share. Thank you!

shirly-24 · Dec 04 '23

I got in touch with one of the authors. This seems to be the expected behaviour of the ViT-B checkpoint.

i-need-sleep · Dec 04 '23

Hi, do you have access to the ViT-L checkpoint? I am also looking for it.

wsntxxn · Mar 27 '24