RetroMAE
DupMAE for ModernBERT
Hello! Are there any plans to support RetroMAE/DupMAE pre-training for ModernBERT? I was able to start training ModernBERT-base by changing a couple of arguments, but grad_norm and loss are stuck at 0/nan, so a working implementation seems to need more than that. Any advice would be appreciated.