show-attend-and-tell
Train Loss and Results
I trained the attention model using your code, but I found the results are very bad. Roughly what training loss did you see during training? Here are my results. It seems the attention model always looks at the same image area.
These are my training logs:
Epoch [77/120]: [1999/3236], loss: 2.1223, perplexity: 8.3505.
Epoch [77/120]: [2099/3236], loss: 2.2494, perplexity: 9.4824.
Epoch [77/120]: [2199/3236], loss: 2.4643, perplexity: 11.7557.
Epoch [77/120]: [2299/3236], loss: 2.2090, perplexity: 9.1063.
Epoch [77/120]: [2399/3236], loss: 2.0539, perplexity: 7.7983.
Epoch [77/120]: [2499/3236], loss: 2.1866, perplexity: 8.9053.
Epoch [77/120]: [2599/3236], loss: 2.5096, perplexity: 12.3005.
Epoch [77/120]: [2699/3236], loss: 2.2079, perplexity: 9.0968.
Epoch [77/120]: [2799/3236], loss: 2.3442, perplexity: 10.4253.
Epoch [77/120]: [2899/3236], loss: 2.4744, perplexity: 11.8745.
Epoch [77/120]: [2999/3236], loss: 2.3289, perplexity: 10.2663.
Epoch [77/120]: [3099/3236], loss: 2.8358, perplexity: 17.0436.
Epoch [77/120]: [3199/3236], loss: 2.7584, perplexity: 15.7750.
Epoch [78/120]: [99/3236], loss: 2.5769, perplexity: 13.1559.
Epoch [78/120]: [199/3236], loss: 2.6376, perplexity: 13.9789.
Epoch [78/120]: [299/3236], loss: 2.7185, perplexity: 15.1571.
Epoch [78/120]: [399/3236], loss: 2.6247, perplexity: 13.8008.
Epoch [78/120]: [499/3236], loss: 2.6202, perplexity: 13.7383.
Epoch [78/120]: [599/3236], loss: 2.2878, perplexity: 9.8532.
Epoch [78/120]: [699/3236], loss: 2.3625, perplexity: 10.6177.
Epoch [78/120]: [799/3236], loss: 2.7058, perplexity: 14.9656.
Epoch [78/120]: [899/3236], loss: 2.4509, perplexity: 11.5983.
Epoch [78/120]: [999/3236], loss: 2.4193, perplexity: 11.2376.
Epoch [78/120]: [1099/3236], loss: 2.5376, perplexity: 12.6494.
Epoch [78/120]: [1199/3236], loss: 2.3549, perplexity: 10.5373.
Epoch [78/120]: [1299/3236], loss: 2.5539, perplexity: 12.8575.
Epoch [78/120]: [1399/3236], loss: 2.5338, perplexity: 12.6010.
Epoch [78/120]: [1499/3236], loss: 2.5326, perplexity: 12.5857.
Epoch [78/120]: [1599/3236], loss: 2.6627, perplexity: 14.3355.
Can you help me check this problem?
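(Note: the perplexity column in these logs is just exp(loss), so the two numbers track each other; a quick check against the first entry:)

```python
import math

# The logged perplexity is the exponential of the cross-entropy loss,
# e.g. the first entry above: exp(2.1223) ~ 8.3505.
print(math.exp(2.1223))  # ~8.3505
```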
I think the training loss is a bit higher than expected; it should be around 1.5. Have you checked the captions generated on the training set? You might need to anneal the learning rate every few epochs (e.g. lr = lr * 0.8 every 3 epochs). Training takes roughly 30~40 epochs.
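A minimal sketch of that step-decay schedule in plain Python (the starting value 0.001 and the loop bounds are placeholders, not the repo's actual settings; how you feed the value to the optimizer depends on the framework):

```python
# Step decay: multiply the learning rate by 0.8 every 3 epochs.
def annealed_lr(initial_lr, epoch, decay=0.8, every=3):
    return initial_lr * (decay ** (epoch // every))

initial_lr = 0.001  # assumed starting value
for epoch in range(12):
    lr = annealed_lr(initial_lr, epoch)
    # pass `lr` to the optimizer here, then run one epoch of training
    print(epoch, round(lr, 6))
```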
Same issue here. The model seems to always attend to the same area, even when I overfit it on a very small dataset, e.g. 100 image-caption pairs.
I actually recorded the attention weights (the alpha from _attention_layer()) at different time steps (reading time steps horizontally): the attention weights do change with time, but the change is too small to make a difference.
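In case it helps others debug the same thing, here is a rough sketch of how one could quantify that drift, assuming the per-step alphas have been stacked into a NumPy array of shape (num_timesteps, num_regions); the array name and shapes are assumptions, not variables from the repo:

```python
import numpy as np

# `alphas` holds one attention distribution per decoding step.
def attention_drift(alphas):
    peak = alphas.argmax(axis=1)          # most-attended region at each step
    moved = np.mean(peak != peak[0])      # fraction of steps whose peak region moves
    spread = alphas.std(axis=0).mean()    # average per-region variation over time
    return moved, spread

# Example with dummy weights over 16 steps and 196 (14x14) regions:
alphas = np.random.dirichlet(np.ones(196), size=16)
moved, spread = attention_drift(alphas)
print(moved, spread)
# On real alphas, `moved` near 0 and a tiny `spread` mean the decoder is
# effectively attending to the same region at every time step.
```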
Could you share some training details so that I can reproduce the attention results you got?