Training problem
@ConnorJL Thanks for the great work.
Unfortunately, I found that my training on OpenWebTextCorpus converges too slowly, even for the 117M model. The cross-entropy loss drops rapidly for the first 10k steps with a batch size of 64, but after that it plateaus around 3.0. Is this a known phenomenon, or could it be a dataset problem? I also noticed that the loss in model_fns is not shifted. Shouldn't it be loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output["logits"][:, :-1], labels=features[:, 1:])?
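For reference, here is a minimal sketch of the shifted loss I have in mind. The names follow the snippet above (output["logits"] and features); the function name and the final mean reduction are my own assumptions, not the repo's exact code:

```python
import tensorflow as tf

def shifted_lm_loss(logits, features):
    # logits: [batch, seq_len, vocab] from the model; features: [batch, seq_len] input ids.
    # Drop the last time step of the logits and the first token of the labels,
    # so that position t is trained to predict token t+1.
    loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits[:, :-1], labels=features[:, 1:])
    return tf.reduce_mean(loss_batch)
```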
Unfortunately, this is a known phenomenon, and I haven't been able to fix it. I perform the shifting of the labels in the input function (it's done in an ugly way, and I'd do it differently now, but the effect should be the same). If the labels weren't shifted at all, the model would converge to near-zero loss very rapidly, since it would just be copying its input. I'm very open to other ideas about what might be causing this. Maybe it is the dataset after all?
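To illustrate what I mean by shifting in the input function rather than in the loss, here is a hypothetical sketch (not the repo's actual input_fn; it assumes TFRecords with an int64 "tokens" feature of length seq_len + 1, and the helper names are made up). Either place works, as long as the shift happens exactly once:

```python
import tensorflow as tf

def _to_example(tokens):
    # Shift in the data pipeline: the model sees tokens[:-1] as input
    # and is trained to predict tokens[1:].
    return tokens[:-1], tokens[1:]

def input_fn(file_pattern, seq_len, batch_size):
    ds = tf.data.TFRecordDataset(tf.io.gfile.glob(file_pattern))
    ds = ds.map(lambda rec: tf.io.parse_single_example(
        rec, {"tokens": tf.io.FixedLenFeature([seq_len + 1], tf.int64)})["tokens"])
    ds = ds.map(_to_example).batch(batch_size, drop_remainder=True).repeat()
    return ds
```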