
Training diverges

Open carlthome opened this issue 7 years ago • 3 comments

I ran the README.md example and training seems to diverge consistently. I'm sure I've got something wrong, but before I start debugging: have you managed to reproduce the PTB results of the paper with your implementation?

carlthome avatar Mar 17 '18 17:03 carlthome

I did not try to reproduce the PTB results, but I've tried it on some other problems. The model just converges as expected. You might want to look into other parts of your system.

hannw avatar Mar 17 '18 20:03 hannw

What other problems have you gotten your implementation to converge with?

I'm just running your example (Python 3.6, TensorFlow 1.6, NVIDIA TITAN X, CUDA 9 with CuDNN):

git clone git@github.com:hannw/sgrnn.git
cd sgrnn
pip install .

wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
tar xvf simple-examples.tgz
rm simple-examples.tgz

python sgrnn/main.py --model=small --data_path=simple-examples/data/ \
    --num_gpus=0 --rnn_mode=BASIC --save_path=/tmp/sgrnn

and unfortunately training diverges every time:

Epoch: 1 Learning rate: 1.000
0.001 perplexity: 3462.042 speed: 628 wps
0.101 perplexity: 8128272430211485696.000 speed: 10973 wps
0.201 perplexity: 120207729490494550265823232.000 speed: 12286 wps
0.301 perplexity: 13565869158556072395952619520000.000 speed: 12739 wps
0.401 perplexity: 56232215737478433981029280594788352.000 speed: 12937 wps
0.501 perplexity: 12288526736969419778825356413524508672.000 speed: 13170 wps
0.601 perplexity: 632620625034758550604580899158593896448.000 speed: 13319 wps
0.701 perplexity: 10080105677426370330298322327795790774272.000 speed: 13431 wps
0.801 perplexity: 115000120853598177122656739292984641585152.000 speed: 13453 wps
0.901 perplexity: 857297091568710692624085315481136757997568.000 speed: 13512 wps
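For context on why these numbers get so large: perplexity is the exponential of the mean per-word cross-entropy, so even a moderate rise in loss explodes it. A minimal sketch in plain Python (the ~115 reference figure for a converged small PTB model comes from the literature, not from this run):

```python
import math

def perplexity(avg_cross_entropy):
    """Perplexity is exp of the mean per-word cross-entropy (in nats)."""
    return math.exp(avg_cross_entropy)

# A converged small PTB model has a loss around 4.75 nats:
print(perplexity(4.75))   # ~115.6
# A loss of ~90 nats already gives an astronomical perplexity:
print(perplexity(90.0))   # > 1e38
```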

carlthome avatar Mar 21 '18 09:03 carlthome

oops, my bad, I skimmed through the email and thought this was an issue in another repo. To answer your question: just let the model train for longer. It appears to diverge at first, but actually converges after a while.
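If the early blow-up itself is a concern, a common safeguard in RNN language models is global-norm gradient clipping (the classic TensorFlow PTB setup clips at a global norm of 5). A hedged NumPy sketch of the operation, not code from this repo:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Jointly rescale gradients if their global L2 norm exceeds clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Scale is 1.0 when global_norm <= clip_norm, else clip_norm / global_norm.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm
```

Rescaling all gradients by a single factor (rather than clipping each element) preserves the update direction while bounding its magnitude.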

hannw avatar Mar 21 '18 14:03 hannw