
Aborting, cost seems to be exploding.

pannous opened this issue 9 years ago · 4 comments

training with flickr8k aborts:

253/15000 batch done in 5.037s. at epoch 0.84. loss cost = 37.447347, reg cost = 0.000001, ppl2 = 26.10 (smooth 48.09)
254/15000 batch done in 5.082s. at epoch 0.85. loss cost = 39.408169, reg cost = 0.000001, ppl2 = 29.19 (smooth 47.91)
255/15000 batch done in 4.914s. at epoch 0.85. loss cost = 140.730310, reg cost = 0.000001, ppl2 = 237360.65 (smooth 2421.03)
Aboring, cost seems to be exploding. Run gradcheck? Lower the learning rate?

pannous · Jan 08 '15 13:01
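For context, the abort above is triggered by a safety check in the training loop. Here is a generic sketch of that kind of guard (not necessarily neuraltalk's exact code; the smoothing factor and blow-up threshold are illustrative): track a moving average of the per-batch cost and stop when a new batch lands far above it.

```python
# Generic exploding-cost guard, sketched for illustration only.
def make_explosion_guard(alpha=0.01, blowup_factor=10.0):
    state = {"smooth": None}  # running average of the cost seen so far

    def check(cost):
        if state["smooth"] is None:
            state["smooth"] = cost
            return True
        if cost > blowup_factor * state["smooth"]:
            return False  # abort: cost seems to be exploding
        state["smooth"] = (1.0 - alpha) * state["smooth"] + alpha * cost
        return True

    return check

# usage inside a training loop:
# guard = make_explosion_guard()
# for batch in batches:
#     cost = train_step(batch)
#     if not guard(cost):
#         print("Aborting, cost seems to be exploding. Lower the learning rate?")
#         break
```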

With default parameters? I thought I tuned them so that this doesn't happen, sorry about that. As the message suggests, lowering the learning rate should do it. Set learning_rate to about half or a fifth of what it is now, until it doesn't explode :)

karpathy · Jan 08 '15 17:01
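A minimal sketch of that suggestion: keep shrinking the learning rate until a run finishes without the exploding-cost abort. Here train_once is a hypothetical stand-in for one full driver.py run with learning_rate overridden (matching the learning_rate entry in the parsed parameters below); it is not part of the repo.

```python
def train_once(learning_rate):
    """Hypothetical: run driver.py once with this learning rate and return the
    final smoothed ppl2, or float('inf') if the run aborted with the
    exploding-cost message."""
    raise NotImplementedError

def find_stable_learning_rate(initial_lr=1e-3, shrink=0.5, min_lr=1e-6):
    """Halve the learning rate (or take a fifth with shrink=0.2) until a run survives."""
    lr = initial_lr
    while lr >= min_lr:
        final_ppl2 = train_once(lr)
        if final_ppl2 != float('inf'):
            return lr, final_ppl2  # this learning rate did not blow up
        lr *= shrink
    raise RuntimeError("no stable learning rate found above min_lr")
```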

Here is my result on the default setting:

python driver.py
parsed parameters:
{ "grad_clip": 5, "rnn_relu_encoders": 0, "dataset": "flickr8k", "image_encoding_size": 256, "eval_max_images": -1, "drop_prob_decoder": 0.5, "word_encoding_size": 256, "max_epochs": 50, "eval_batch_size": 100, "fappend": "baseline", "generator": "lstm", "write_checkpoint_ppl_threshold": -1, "decay_rate": 0.999, "tanhC_version": 0, "hidden_size": 256, "momentum": 0.0, "worker_status_output_directory": "status/", "learning_rate": 0.001, "checkpoint_output_directory": "cv/", "do_grad_check": 0, "word_count_threshold": 5, "batch_size": 100, "regc": 1e-08, "smooth_eps": 1e-08, "solver": "rmsprop", "eval_period": 1.0, "drop_prob_encoder": 0.5 }
Initializing data provider for dataset flickr8k...
BasicDataProvider: reading data/flickr8k/dataset.json
BasicDataProvider: reading data/flickr8k/vgg_feats.mat
preprocessing word counts and creating vocab based on word count threshold 5

253/15000 batch done in 3.242s. at epoch 0.84. loss cost = 39.264201, reg cost = 0.000001, ppl2 = 29.60 (smooth 47.89)
254/15000 batch done in 3.133s. at epoch 0.85. loss cost = 39.633654, reg cost = 0.000001, ppl2 = 33.57 (smooth 47.74)
255/15000 batch done in 3.169s. at epoch 0.85. loss cost = 38.571550, reg cost = 0.000001, ppl2 = 29.56 (smooth 47.56)

... one day later ...

14999/15000 batch done in 3.492s. at epoch 50.00. loss cost = 28.621228, reg cost = 0.000004, ppl2 = 11.19 (smooth 10.80)
evaluating val performance in batches of 100
evaluated 5000 sentences and got perplexity = 17.785250
validation perplexity = 17.785250

StevenLOL · Jan 10 '15 09:01
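A note on reading those log lines: ppl2 looks like word-level perplexity in base 2 (2 raised to the mean negative log2-probability per word), and the "(smooth ...)" value looks like an exponential moving average of the per-batch ppl2. Both are assumptions about the metric rather than something stated in the thread; a tiny sketch under those assumptions:

```python
def ppl2(log2_probs):
    """Base-2 perplexity from per-word log2-probabilities (assumed definition)."""
    return 2.0 ** (-sum(log2_probs) / len(log2_probs))

def smoothed(prev, current, alpha=0.01):
    """Exponential moving average, presumably what the '(smooth ...)' column reports."""
    return (1.0 - alpha) * prev + alpha * current
```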

@StevenLOL Nice! Looking at the Model Zoo (http://cs.stanford.edu/people/karpathy/neuraltalk/), my LSTM model achieves a perplexity of about 15.7, which is slightly better. I ran it for longer and cross-validated it on our cluster, though.

karpathy · Jan 10 '15 10:01

Thanks, I will try again with a reduced learning rate.


pannous · Jan 10 '15 13:01