Overfit

16 comments of Overfit

> Hmm interesting.. Is this the result of 0.0.1a4 version?
> And how did you guys print out that result?

I tried using 0.0.1a4 and the result is the same.

> Probably the criterion loss function is the problem.
>
> ```python
> # shape [10, 2], not very accurate output
> out = torch.tensor([[ -8.4014, -0.0002],
>                     [-10.3151, -0.0000],
>                     ...
> ```
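
For anyone comparing criteria here: the quoted values are all ≤ 0, which looks like output that has already gone through `log_softmax`, so `NLLLoss` is the matching criterion, while `CrossEntropyLoss` expects raw logits. A minimal sketch using only the two quoted rows (the labels are made up for illustration):

```python
import torch
import torch.nn as nn

# Two rows of the quoted next-sentence output (log-probabilities, shape [2, 2]).
out = torch.tensor([[ -8.4014, -0.0002],
                    [-10.3151, -0.0000]])
target = torch.tensor([1, 1])  # hypothetical labels, for illustration only

# NLLLoss is the criterion that matches log-probability inputs.
nll_loss = nn.NLLLoss()(out, target)

# CrossEntropyLoss applies log_softmax internally, so feeding it
# already-log-softmaxed output normalizes twice; it is only correct
# when the model emits raw logits.
ce_loss = nn.CrossEntropyLoss()(out, target)

print(nll_loss.item(), ce_loss.item())
```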

@cairoHy after the modification, the model doesn't converge. Any suggestions?

@codertimo The loss just doesn't converge. ![image](https://user-images.githubusercontent.com/8109984/47472994-511c0780-d843-11e8-84e4-581ce34196dc.png)

I removed dropout in all layers and now my model converges. Maybe dropout in every layer is too strong a regularization for small datasets? Or there is something...
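
For reference, this is roughly how dropout can be turned off across all layers; a minimal sketch, assuming the model exposes standard `nn.Dropout` modules (the helper and the stand-in model below are just for illustration):

```python
import torch.nn as nn

def disable_dropout(model: nn.Module) -> nn.Module:
    """Set every nn.Dropout probability to 0, i.e. remove the regularization
    without changing the rest of the architecture."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0
    return model

# Stand-in model just to show the call; the real BERT model would be passed instead.
model = nn.Sequential(nn.Linear(16, 16), nn.Dropout(0.1), nn.ReLU())
disable_dropout(model)
```

Note this only catches `nn.Dropout` modules; any `F.dropout` calls in the forward pass would have to be edited directly.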

I set the `--seq_len` parameter to 32.

My parameter settings are as follows, and I set the next_sentence loss's weight to 5 (it should be annealed, or set to 1, I think). I only have about 10000 sentence...
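
What I mean by weighting and annealing, as a rough sketch (the function, schedule, and criteria below are illustrative assumptions, not the repository's training code):

```python
import torch.nn as nn

mask_lm_criterion = nn.NLLLoss(ignore_index=0)  # assuming padding index 0
next_sent_criterion = nn.NLLLoss()

def combined_loss(mask_lm_out, mask_lm_target, next_sent_out, next_sent_target,
                  step, anneal_steps=10000, start_weight=5.0):
    """Start with a 5x weight on the next-sentence loss and anneal it
    linearly down to 1 over `anneal_steps` training steps."""
    weight = max(1.0, start_weight - (start_weight - 1.0) * step / anneal_steps)
    # mask_lm_out: [batch, seq_len, vocab] log-probs; NLLLoss wants [batch, vocab, seq_len]
    lm_loss = mask_lm_criterion(mask_lm_out.transpose(1, 2), mask_lm_target)
    ns_loss = next_sent_criterion(next_sent_out, next_sent_target)
    return lm_loss + weight * ns_loss
```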

I've tried varying some parameters and it seems that on my dataset, these parameters don't have much impact. Only dropout is critical. But my dataset is rather small. I chose...

And this is roughly the whole training log. The accuracy finally seems to get stuck at 81%. [Uploading _gaiastack_log_stdout (3).log…]()

I think this is a bug. The problem is that on line 127 of vocab.py, `words = line.replace("\n", "").replace("\t", "").split()`, the `\t` is replaced with `""`. I think...
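
To show why deleting the tab is a problem on a tab-separated corpus (the sample line is made up, and swapping the tab for a space is just one possible fix):

```python
line = "the cat sat\ton the mat\n"  # hypothetical tab-separated sentence pair

# Current behaviour: the tab is deleted outright, so the last token of the
# first sentence and the first token of the second are fused together.
buggy = line.replace("\n", "").replace("\t", "").split()
print(buggy)  # ['the', 'cat', 'saton', 'the', 'mat']

# Replacing the tab with a space (or relying on split(), which already splits
# on any whitespace including tabs) keeps the two tokens separate.
fixed = line.replace("\n", "").replace("\t", " ").split()
print(fixed)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```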