Vasudev Gupta

20 comments by Vasudev Gupta

> I think to keep it tidy we could use this repo and once we have fixated on something we could incorporate that inside the GSoC repo. WDYT?

Yeah! that...

@sayakpaul, the above experiments are just normal fine-tuning of wav2vec2 on 100h of LibriSpeech data. Since training on 960h takes a lot of time, I want to establish some kind of baseline for small...
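For context, that baseline is plain supervised CTC fine-tuning. Below is a minimal sketch of a single such training step in PyTorch with Hugging Face `transformers`; the checkpoint, the tiny dummy LibriSpeech split, and the learning rate are illustrative assumptions chosen so the snippet stays small and runnable, not the exact setup from these experiments.

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Checkpoint with a processor/vocab attached; the real baseline may start from a
# different checkpoint and train on the full train-clean-100 split.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Tiny stand-in dataset so the snippet downloads in seconds.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]

inputs = processor(
    sample["audio"]["array"],
    sampling_rate=sample["audio"]["sampling_rate"],
    return_tensors="pt",
)
labels = processor.tokenizer(sample["text"], return_tensors="pt").input_ids

# With `labels` supplied, the forward pass returns the CTC loss directly.
outputs = model(input_values=inputs.input_values, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```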

> Got it. But didn't we have models fine-tuned on the LibriSpeech dataset (100h) already?

No, I directly trained on 960h earlier.

> By two-stage, do you mean training of...

Hello @sayakpaul, I trained the first distillation model yesterday. Unfortunately, it didn't perform well, though it is trying to learn (not all predicted tokens are random). I am trying to change the initialisation...
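On "changing the initialisation": one common option is to initialise a shallower student from teacher weights rather than from scratch. The sketch below copies every other encoder layer from the teacher (DistilBERT-style); the checkpoint name and the layer-picking scheme are assumptions for illustration, not necessarily what this run used.

```python
from transformers import Wav2Vec2Config, Wav2Vec2ForCTC

teacher = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Student: half the transformer depth, everything else identical to the teacher.
student_config = Wav2Vec2Config.from_pretrained(
    "facebook/wav2vec2-base-960h",
    num_hidden_layers=teacher.config.num_hidden_layers // 2,
)
student = Wav2Vec2ForCTC(student_config)

# Copy the convolutional feature extractor, feature projection and CTC head
# verbatim; map student encoder layer i to teacher encoder layer 2*i.
student.wav2vec2.feature_extractor.load_state_dict(
    teacher.wav2vec2.feature_extractor.state_dict()
)
student.wav2vec2.feature_projection.load_state_dict(
    teacher.wav2vec2.feature_projection.state_dict()
)
student.lm_head.load_state_dict(teacher.lm_head.state_dict())
for i, layer in enumerate(student.wav2vec2.encoder.layers):
    layer.load_state_dict(teacher.wav2vec2.encoder.layers[2 * i].state_dict())
```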

Currently trained only for 10 epochs (logs: https://wandb.ai/7vasudevgupta/wav2vec2-distillation/runs/2h82mhgc?workspace=user-7vasudevgupta). I need to play around with alpha; will do these experiments today.
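For reference, `alpha` here is presumably the usual interpolation weight between the hard CTC loss and the soft teacher/student term. A minimal sketch of that knob (the names `alpha` and `temperature`, and the exact KL form, are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, ctc_loss, alpha=0.5, temperature=2.0):
    """Blend the supervised CTC loss with a softened teacher/student KL term."""
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature**2
    # alpha -> 1.0 recovers plain CTC training; alpha -> 0.0 is pure distillation.
    return alpha * ctc_loss + (1.0 - alpha) * kl

# Dummy shapes: (batch, frames, vocab) logits and a scalar CTC loss.
loss = distillation_loss(torch.randn(2, 50, 32), torch.randn(2, 50, 32), torch.tensor(3.1))
```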

### Questions
* Does the teacher model need to be trained with mixup in order to apply mixup during the knowledge-distillation stage (function matching)? (See the sketch below.)

### Major challenges
* It's hard...
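Whatever the answer on the teacher side, the mechanics of mixup during function matching look roughly like the sketch below: mix two raw waveforms and have the student match the teacher's outputs on the mixed input, with no ground-truth labels involved. The Beta parameter, tensor shapes, and the KL objective are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mixup_function_matching_step(teacher, student, wave_a, wave_b, alpha=0.2):
    """One distillation step on a mixup of two raw-audio batches of shape (batch, samples)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * wave_a + (1.0 - lam) * wave_b  # mixup directly in the waveform domain

    with torch.no_grad():
        teacher_logits = teacher(mixed).logits
    student_logits = student(mixed).logits

    # Function matching: the student imitates the teacher *on the mixed input*,
    # so no label mixing (and no transcript) is needed for this term.
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
```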

> This is actually a good research question to ask. In the vision literature, we generally always start with a good teacher model. [Noisy Student](https://arxiv.org/abs/1911.04252), MEAL ([v1](https://arxiv.org/abs/1812.02425), [v2](https://arxiv.org/abs/2009.08453)), etc. all...

> I see. But when we are distilling a teacher into a student, the student needs to be trained from scratch right?

With respect to this context, I am not...

Hey @sayakpaul, sorry for keeping this project on hold earlier. I will try to put up an initial detailed plan by tomorrow, and then we can have more discussion on that (or...

### Experiment-1: Distillation of the pre-trained model
The original Wav2Vec2 was pre-trained with one head on top (note: we don't have code for that head yet), so if we want to...
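One hedged way to sidestep the missing pre-training head is to distil at the feature level instead: match the student's final hidden states to the teacher's on unlabelled audio, so no head is required at all. The checkpoint name, the 6-layer student, and the MSE objective below are assumptions for illustration, not a settled design for Experiment-1.

```python
import torch
import torch.nn.functional as F
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Headless teacher: only the feature extractor + transformer encoder are loaded.
teacher = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

# Smaller student with the same hidden size, so no projection layer is needed.
student_config = Wav2Vec2Config.from_pretrained("facebook/wav2vec2-base", num_hidden_layers=6)
student = Wav2Vec2Model(student_config)

def hidden_state_distillation_loss(input_values):
    """MSE between teacher and student final hidden states on raw (unlabelled) audio."""
    with torch.no_grad():
        teacher_hidden = teacher(input_values).last_hidden_state
    student_hidden = student(input_values).last_hidden_state
    return F.mse_loss(student_hidden, teacher_hidden)

# Dummy batch: two 1-second clips of 16 kHz audio.
loss = hidden_state_distillation_loss(torch.randn(2, 16000))
loss.backward()
```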