MagicianWu comments

Results: 4 comments of MagicianWu

RunTimeError Missing keys while resuming training and cannot load checkpoint

> > Was anyone able to find a solution to this problem? I'm also not able to resume from checkpoint, using deepspeed zero 3
> > Do you use DS...

Transformer keeps predicting the same token

I guess it is related to issue #80, I will update my repo and retrain on a small dataset.

Transformer keeps predicting the same token

> What reconstruction loss were you able to achieve with the auto-encoder?

Both training and validation losses are reduced to around 0.34.

> The purpose is to add a 'safety...

Transformer keeps predicting the same token

> You have commented out the attention heads and its dim size, which would default to 12 and 64

I just checked that the default value for attention heads (attn_heads) is...