Thomas Ressler-Antal
Okay, so I've plotted the teacher and student variances from two test runs with a small dataset and only for a few epochs: https://imgur.com/a/dynF7P9 The student variance begins to converge...
> > > > > Okay, so I've plotted the teacher and student variances from two test runs with a small dataset and only for a few epochs: > >...
Was instance_norm also applied to the targets in the reduced speech setup, @alexeib?
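To make clear what I mean by instance-normalizing the targets, here is a minimal sketch in plain Python. It assumes the targets are a sequence of T frames with C channels each and that every channel is normalized to zero mean and unit variance across time, using the biased variance and a small `eps`. The function name, the `eps` value, and the toy input are my own placeholders, not taken from the actual training code.

```python
import math


def instance_norm(frames, eps=1e-5):
    """Normalize each channel to zero mean / unit variance across time.

    frames: list of T frames, each a list of C floats (hypothetical layout).
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for c in range(num_channels):
        column = [frames[t][c] for t in range(num_frames)]
        mean = sum(column) / num_frames
        # Biased variance (divide by T); this is an assumption of the sketch.
        var = sum((v - mean) ** 2 for v in column) / num_frames
        scale = 1.0 / math.sqrt(var + eps)
        for t in range(num_frames):
            out[t][c] = (column[t] - mean) * scale
    return out


# Two channels with very different scales end up on the same scale.
targets = [[1.0, 10.0], [3.0, 30.0]]
normed = instance_norm(targets)
```

If this is applied to the teacher targets, their per-channel variance is fixed at roughly one by construction, which matters when comparing teacher and student variances.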
Thanks for the reply! I am looking for the pre-training run(s), i.e. the one trained on Librispeech + AudioSet.
Bump! Sorry to get back to the question. But I am currently trying to implement SSAST (well, its masked autoencoder counterpart, see https://arxiv.org/pdf/2203.16691.pdf) for music. For efficiency reasons, I tried...
> I would suggest adding an option in the setting to set the Combined view aspect ratio and Presentation:Camera ratio. For example, 21:9 for The Combined view, and set Presentation:Camera...
> Duplicate of #2006 Ah sorry, can be closed.
Thanks for the reply. It was indeed slightly inconvenient. I did not expect this behavior and it caused problems with my transform pipeline further down the line. I ended up...
> Hi @Darius-H. Yeah, it could be directly used for text conditioning. For example, if you encode the input text with a pre-trained LLM, you can pool the output text...
Hi, sorry to bring up a one-year-old issue. However, I am unsure about an implementation detail. https://github.com/lxtGH/CAE/blob/d72597143e486d9bbbaf1e3adc4fd9cfa618633a/models/modeling_cae.py#L114C58-L114C58 Here you update the teacher **after** computing the latent targets. Doesn't...
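To illustrate the ordering I am asking about, here is a toy sketch in plain Python. It is not the repository's code: the "model" is a single scalar weight `w`, and `ema_update`, `targets_then_update`, and `update_then_targets` are hypothetical names I made up for the comparison. It only shows that the two orderings leave the teacher weights identical after the step but produce different targets for the current batch.

```python
def ema_update(teacher, student, decay):
    """In-place EMA: teacher <- decay * teacher + (1 - decay) * student."""
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value


def targets_then_update(teacher, student, x, decay):
    # Targets come from the teacher as it was before this step's EMA update.
    target = teacher["w"] * x
    ema_update(teacher, student, decay)
    return target


def update_then_targets(teacher, student, x, decay):
    # Teacher is refreshed first, so targets already reflect the current student.
    ema_update(teacher, student, decay)
    return teacher["w"] * x


student = {"w": 2.0}
teacher_a = {"w": 1.0}
teacher_b = {"w": 1.0}

target_a = targets_then_update(teacher_a, student, x=3.0, decay=0.5)
target_b = update_then_targets(teacher_b, student, x=3.0, decay=0.5)

print(target_a, target_b)    # targets differ for this batch
print(teacher_a, teacher_b)  # teacher weights agree after the step
```

With a decay close to 1, as is typical for EMA teachers, the gap between the two targets shrinks, so the difference may be small in practice.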