show-attend-and-tell

Train Loss and Results

lcuyh opened this issue 6 years ago · 2 comments

I trained the attention model using your code, but I found the results are very bad. What training loss did you reach during training? Here are my results. It seems the attention model always looks at the same image area. figure_1 figure_2

This is my training log:

Epoch [77/120]: [1999/3236], loss: 2.1223, perplexity: 8.3505
Epoch [77/120]: [2099/3236], loss: 2.2494, perplexity: 9.4824
Epoch [77/120]: [2199/3236], loss: 2.4643, perplexity: 11.7557
Epoch [77/120]: [2299/3236], loss: 2.2090, perplexity: 9.1063
Epoch [77/120]: [2399/3236], loss: 2.0539, perplexity: 7.7983
Epoch [77/120]: [2499/3236], loss: 2.1866, perplexity: 8.9053
Epoch [77/120]: [2599/3236], loss: 2.5096, perplexity: 12.3005
Epoch [77/120]: [2699/3236], loss: 2.2079, perplexity: 9.0968
Epoch [77/120]: [2799/3236], loss: 2.3442, perplexity: 10.4253
Epoch [77/120]: [2899/3236], loss: 2.4744, perplexity: 11.8745
Epoch [77/120]: [2999/3236], loss: 2.3289, perplexity: 10.2663
Epoch [77/120]: [3099/3236], loss: 2.8358, perplexity: 17.0436
Epoch [77/120]: [3199/3236], loss: 2.7584, perplexity: 15.7750
Epoch [78/120]: [99/3236], loss: 2.5769, perplexity: 13.1559
Epoch [78/120]: [199/3236], loss: 2.6376, perplexity: 13.9789
Epoch [78/120]: [299/3236], loss: 2.7185, perplexity: 15.1571
Epoch [78/120]: [399/3236], loss: 2.6247, perplexity: 13.8008
Epoch [78/120]: [499/3236], loss: 2.6202, perplexity: 13.7383
Epoch [78/120]: [599/3236], loss: 2.2878, perplexity: 9.8532
Epoch [78/120]: [699/3236], loss: 2.3625, perplexity: 10.6177
Epoch [78/120]: [799/3236], loss: 2.7058, perplexity: 14.9656
Epoch [78/120]: [899/3236], loss: 2.4509, perplexity: 11.5983
Epoch [78/120]: [999/3236], loss: 2.4193, perplexity: 11.2376
Epoch [78/120]: [1099/3236], loss: 2.5376, perplexity: 12.6494
Epoch [78/120]: [1199/3236], loss: 2.3549, perplexity: 10.5373
Epoch [78/120]: [1299/3236], loss: 2.5539, perplexity: 12.8575
Epoch [78/120]: [1399/3236], loss: 2.5338, perplexity: 12.6010
Epoch [78/120]: [1499/3236], loss: 2.5326, perplexity: 12.5857
Epoch [78/120]: [1599/3236], loss: 2.6627, perplexity: 14.3355

Can you help me check this problem?

lcuyh · May 23 '18 05:05

I think the training loss is a bit higher than expected; it should be around 1.5. Have you checked the captions generated on the training set? You might need to anneal the learning rate every certain number of epochs (e.g. lr = lr * 0.8 every 3 epochs). Training takes roughly 30~40 epochs.
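A minimal sketch of that annealing schedule, assuming a PyTorch optimizer; the model and epoch count here are placeholders, not the repo's actual code:

```python
import torch

# Toy stand-in for the decoder; the real model comes from the repo's code.
model = torch.nn.Linear(512, 10000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Multiply the learning rate by 0.8 every 3 epochs, as suggested above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.8)

num_epochs = 40  # "training takes roughly 30~40 epochs"
for epoch in range(num_epochs):
    # ... run one epoch of training here ...
    scheduler.step()
    print(f"epoch {epoch}, lr = {optimizer.param_groups[0]['lr']:.6f}")
```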

alecwangcq · May 24 '18 04:05

Same issue here. The model seems to attend to the same area even when I overfit it on a very small dataset, e.g. 100 image-caption pairs.

I actually recorded the attention weights (the alpha from _attention_layer()) at different time steps, which look like this (time steps run horizontally): [screenshot from 2018-05-24 14-03-21]. The attention weights do change over time, but the change is too small to make a difference.
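A small diagnostic sketch along these lines, assuming the recorded alphas form a (time_steps x regions) array; the array below is random placeholder data, not values from the model or from _attention_layer():

```python
import numpy as np

# Placeholder: 16 decoding steps over a 14x14 = 196-region feature map.
# In practice this would be the stack of alpha vectors recorded per step.
alphas = np.random.dirichlet(np.ones(196), size=16)  # shape: (time_steps, regions)

# If attention collapses, every row points at (almost) the same regions.
step_change = np.abs(np.diff(alphas, axis=0)).sum(axis=1)  # L1 change per step
peak_region = alphas.argmax(axis=1)                        # most-attended region per step

print("mean L1 change between consecutive steps:", step_change.mean())
print("most-attended region at each step:", peak_region)
```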

Could you share some training details so that I can reproduce the attention results you got?

daveredrum · May 24 '18 12:05