Gary Mulder
Probably a duplicate of #40?
1. Are you talking about eval set loss or training loss?
2. Plot both as a function of epoch, similar to #63, to see whether you are overfitting or underfitting.
Without a plot it is difficult to say for certain, but you are probably overfitting. Don't train for more than one epoch.
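If plotting is not convenient, a rough check on the logged per-epoch losses gives a similar signal. The sketch below is only a heuristic, not something from this repo: `best_epoch` and `diagnose` are hypothetical helper names, and the loss values in the usage lines are made up for illustration. It assumes you have one training loss and one eval loss per epoch.

```python
def best_epoch(eval_losses):
    """Return the 1-based epoch with the lowest eval loss."""
    return min(range(len(eval_losses)), key=eval_losses.__getitem__) + 1


def diagnose(train_losses, eval_losses, tolerance=0.0):
    """Rough heuristic on per-epoch losses (hypothetical helper, not a rigorous test).

    Returns:
      "overfitting"     if eval loss rose after its minimum while training loss kept falling
      "still improving" if eval loss was lowest at the final epoch
      "unclear"         otherwise
    """
    if not eval_losses or len(train_losses) != len(eval_losses):
        raise ValueError("need one train loss and one eval loss per epoch")

    best = best_epoch(eval_losses)
    last = len(eval_losses)
    if best == last:
        return "still improving"

    # Compare the final epoch against the epoch with the best eval loss.
    eval_rose = eval_losses[-1] > eval_losses[best - 1] + tolerance
    train_fell = train_losses[-1] < train_losses[best - 1]
    if eval_rose and train_fell:
        return "overfitting"
    return "unclear"


# Illustrative numbers only, not measurements from any real run.
train = [2.1, 1.6, 1.2, 0.9]
evals = [2.0, 1.9, 2.1, 2.4]
print(best_epoch(evals), diagnose(train, evals))
```

If the best eval loss lands at an early epoch and the result is "overfitting", stopping at that epoch is a reasonable starting point.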
@myeolinmalchi do you have a torrent for the 30B or 65B weights?