MagicianWu
> > Was anyone able to find a solution to this problem? I'm also not able to resume from checkpoint, using deepspeed zero 3

> > Do you use DS...
I guess it is related to issue #80. I will update my repo and retrain on a small dataset.
> What reconstruction loss were you able to achieve with the auto-encoder?

Both training and validation losses dropped to around 0.34.

> The purpose is to add a 'safety...
> You have commented out the attention heads and its dim size, which would default to 12 and 64

I just checked that the default value for attention heads (attn_heads) is...