meshed-memory-transformer
Random output during the first few epochs before training starts to work
Hi @marcellacornia,
When I started training, I got random outputs for roughly the first five epochs; that is, the model generated words, but they were random. After that, it produced nothing at all, and I had to train for several more epochs before getting good results. Do you have any idea why this happens? Could it be caused by the initialization?