Results 2 issues of SARTHAK JAIN

Thanks for the great blog post. It was quite helpful for keeping track of what's happening in the paper. Currently, in the section "Discrete-time SSM: The Recurrent Representation" in the...

Hi, I was wondering if there is a way to put the dropout and layer-norm layers in BERT into eval mode during training when we set the requires_grad parameter to...
