Vasudev Gupta

20 comments by Vasudev Gupta

> I think to keep it tidy we could use this repo and once we have fixated on something we could incorporate that inside the GSoC repo. WDYT?

Yeah! that...

@sayakpaul, the above experiments are just normal fine-tuning of wav2vec2 on 100h of LibriSpeech data. Since training on 960h takes a lot of time, I want to establish some kind of baseline for small...
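For context, that baseline is plain supervised CTC fine-tuning. Below is a minimal sketch of a single such training step in PyTorch with Hugging Face `transformers`; the checkpoint, the tiny dummy LibriSpeech split, and the learning rate are illustrative assumptions chosen so the snippet stays small and runnable, not the exact setup from these experiments.

```python
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Checkpoint with a processor/vocab attached; the real baseline may start from a
# different checkpoint and train on the full train-clean-100 split.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Tiny stand-in dataset so the snippet downloads in seconds.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]

inputs = processor(
    sample["audio"]["array"],
    sampling_rate=sample["audio"]["sampling_rate"],
    return_tensors="pt",
)
labels = processor.tokenizer(sample["text"], return_tensors="pt").input_ids

# With `labels` supplied, the forward pass returns the CTC loss directly.
outputs = model(input_values=inputs.input_values, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```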

> Got it. But didn't we have models fine-tuned on the LibriSpeech dataset (100h) already?

No, I directly trained on 960h earlier.

> By two-stage, do you mean training of...

Hello @sayakpaul, I trained the first distillation model yesterday. Unfortunately, it didn't perform well, though it is trying to learn (not all predicted tokens are random). I am trying to change the initialisation...
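On "changing the initialisation": one common option is to initialise a shallower student from teacher weights rather than from scratch. The sketch below copies every other encoder layer from the teacher (DistilBERT-style); the checkpoint name and the layer-picking scheme are assumptions for illustration, not necessarily what this run used.

```python
from transformers import Wav2Vec2Config, Wav2Vec2ForCTC

teacher = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Student: half the transformer depth, everything else identical to the teacher.
student_config = Wav2Vec2Config.from_pretrained(
    "facebook/wav2vec2-base-960h",
    num_hidden_layers=teacher.config.num_hidden_layers // 2,
)
student = Wav2Vec2ForCTC(student_config)

# Copy the convolutional feature extractor, feature projection and CTC head
# verbatim; map student encoder layer i to teacher encoder layer 2*i.
student.wav2vec2.feature_extractor.load_state_dict(
    teacher.wav2vec2.feature_extractor.state_dict()
)
student.wav2vec2.feature_projection.load_state_dict(
    teacher.wav2vec2.feature_projection.state_dict()
)
student.lm_head.load_state_dict(teacher.lm_head.state_dict())
for i, layer in enumerate(student.wav2vec2.encoder.layers):
    layer.load_state_dict(teacher.wav2vec2.encoder.layers[2 * i].state_dict())
```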

Currently trained only for 10 epochs (logs: https://wandb.ai/7vasudevgupta/wav2vec2-distillation/runs/2h82mhgc?workspace=user-7vasudevgupta). I need to play around with alpha; will do these experiments today.
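For reference, `alpha` here is presumably the usual interpolation weight between the hard CTC loss and the soft teacher/student term. A minimal sketch of that knob (the names `alpha` and `temperature`, and the exact KL form, are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, ctc_loss, alpha=0.5, temperature=2.0):
    """Blend the supervised CTC loss with a softened teacher/student KL term."""
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature**2
    # alpha -> 1.0 recovers plain CTC training; alpha -> 0.0 is pure distillation.
    return alpha * ctc_loss + (1.0 - alpha) * kl

# Dummy shapes: (batch, frames, vocab) logits and a scalar CTC loss.
loss = distillation_loss(torch.randn(2, 50, 32), torch.randn(2, 50, 32), torch.tensor(3.1))
```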

### Questions
* Does the teacher model need to be trained with mixup in order to apply mixup during the knowledge-distillation stage (function matching)? (See the sketch below.)

### Major challenges
* It's hard...
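Whatever the answer on the teacher side, the mechanics of mixup during function matching look roughly like the sketch below: mix two raw waveforms and have the student match the teacher's outputs on the mixed input, with no ground-truth labels involved. The Beta parameter, tensor shapes, and the KL objective are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mixup_function_matching_step(teacher, student, wave_a, wave_b, alpha=0.2):
    """One distillation step on a mixup of two raw-audio batches of shape (batch, samples)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    mixed = lam * wave_a + (1.0 - lam) * wave_b  # mixup directly in the waveform domain

    with torch.no_grad():
        teacher_logits = teacher(mixed).logits
    student_logits = student(mixed).logits

    # Function matching: the student imitates the teacher *on the mixed input*,
    # so no label mixing (and no transcript) is needed for this term.
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
```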

> This is actually a good research question to ask. In the vision literature, we generally always start with a good teacher model. [Noisy Student](https://arxiv.org/abs/1911.04252), MEAL ([v1](https://arxiv.org/abs/1812.02425), [v2](https://arxiv.org/abs/2009.08453)), etc. all...

> I see. But when we are distilling a teacher into a student, the student needs to be trained from scratch right?

With respect to this context, I am not...

Hey @sayakpaul, sorry for keeping this project on hold earlier. I will try to put up an initial detailed plan by tomorrow, and then we can have more discussion on that (or...

### Experiment-1: Distillation of the pre-trained model
The original Wav2Vec2 was pre-trained with one head on top (note: we don't have code for that head yet), so if we want to...
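One hedged way to sidestep the missing pre-training head is to distil at the feature level instead: match the student's final hidden states to the teacher's on unlabelled audio, so no head is required at all. The checkpoint name, the 6-layer student, and the MSE objective below are assumptions for illustration, not a settled design for Experiment-1.

```python
import torch
import torch.nn.functional as F
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Headless teacher: only the feature extractor + transformer encoder are loaded.
teacher = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

# Smaller student with the same hidden size, so no projection layer is needed.
student_config = Wav2Vec2Config.from_pretrained("facebook/wav2vec2-base", num_hidden_layers=6)
student = Wav2Vec2Model(student_config)

def hidden_state_distillation_loss(input_values):
    """MSE between teacher and student final hidden states on raw (unlabelled) audio."""
    with torch.no_grad():
        teacher_hidden = teacher(input_values).last_hidden_state
    student_hidden = student(input_values).last_hidden_state
    return F.mse_loss(student_hidden, teacher_hidden)

# Dummy batch: two 1-second clips of 16 kHz audio.
loss = hidden_state_distillation_loss(torch.randn(2, 16000))
loss.backward()
```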