cvpr2018-SSAH

How much time does the training phase take?

Open jingang-cv opened this issue 6 years ago • 16 comments

I have been running the code for a whole day on a K80 GPU, but it has only reached epoch 19. So I would like to ask how long your training phase took.

jingang-cv avatar Oct 21 '18 20:10 jingang-cv

@asiandragon My GPU is a 1080 Ti, and training is quick on it. On a K80 it will be relatively slow. Besides that, I advise you to try a different training order, or, as a last resort, reload the pre-trained weights, discard the discriminator, and train only the generator. Good luck.
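
Roughly, reloading the pre-trained weights before continuing looks like this (a TF1-style sketch only, not the repo's actual code; the `checkpoint` directory name is an assumption):

```python
import tensorflow as tf

saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint('checkpoint')   # directory is an assumption
    if ckpt is not None:
        saver.restore(sess, ckpt)                     # resume from the saved weights
    else:
        sess.run(tf.global_variables_initializer())   # otherwise start from scratch
    # ...continue training only the generator networks from here
```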

zyfsa avatar Oct 22 '18 12:10 zyfsa

@zyfsa Thanks for your reply. What do you mean by a different training order? Does it mean changing the order in which ImgNet, LabNet, and TxtNet are updated? Also, roughly how long does your training take? Can you finish it within one day?

jingang-cv avatar Oct 22 '18 13:10 jingang-cv

@asiandragon For the first question, you are right. For the second, too many epochs are useless: I run 60 epochs and then stop, which takes about 12 hours. You can try it.
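
Concretely, the outer loop is roughly the following (a sketch only; the helper names `train_lab_net` etc. are illustrative, not the repo's actual functions):

```python
NUM_EPOCHS = 60  # stop at 60 epochs (about 12 hours on a 1080 Ti)

for epoch in range(NUM_EPOCHS):
    train_lab_net(epoch)   # default order: self-supervised LabNet first
    train_img_net(epoch)   # then the image hashing network
    train_txt_net(epoch)   # then the text hashing network
    # a "different training order" would, e.g., update ImgNet before LabNet:
    # train_img_net(epoch); train_lab_net(epoch); train_txt_net(epoch)
```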

zyfsa avatar Oct 22 '18 14:10 zyfsa

@zyfsa I stopped the training after 50 epochs, but the accuracy is only around 60% and 55% for i-t and t-i, respectively. Could you kindly upload your pre-trained model to the checkpoint directory? Thank you.

jingang-cv avatar Oct 22 '18 21:10 jingang-cv

@asiandragon I think you are right. So I reload the pre-trained weights from checkpoints and train only the generator. This brings some improvement. Besides, training has a certain randomness. You can try it.

zyfsa avatar Oct 23 '18 01:10 zyfsa

@asiandragon Namely, I do not train the discriminator; I only train lab_net, img_net, and text_net.
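
In code, the per-iteration update then becomes roughly this (the `train_*_op` names are my assumptions about how the graph is built, not the repo's real symbols):

```python
for batch in batches:
    feed = make_feed(batch)                      # hypothetical feed-dict helper
    sess.run(train_lab_op, feed_dict=feed)       # semantic (label) network
    sess.run(train_img_op, feed_dict=feed)       # image hashing network
    sess.run(train_txt_op, feed_dict=feed)       # text hashing network
    # sess.run(train_dis_op, feed_dict=feed)     # discriminator: intentionally skipped
```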

zyfsa avatar Oct 23 '18 01:10 zyfsa

@zyfsa Have you ever conducted this experiment on the NUS-WIDE dataset?

FrankYufeng17 avatar Oct 24 '18 01:10 FrankYufeng17

@FrankYufeng17 No; maybe that is the next work.

zyfsa avatar Oct 24 '18 02:10 zyfsa

@zyfsa So that means we can neglect the adversarial-learning loss from Sec. 3.4 of the original paper? And according to your updated results, in that case we can obtain results similar to the original paper within one epoch?
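
In other words, something like this sketch (the term names are illustrative, not the paper's symbols or the repo's variables):

```python
def generator_loss(loss_sem, loss_hash, loss_quant,
                   loss_adv=None, lambda_adv=1.0):
    """Combine the generator losses; pass loss_adv=None to drop
    the Sec. 3.4 adversarial term entirely."""
    total = loss_sem + loss_hash + loss_quant
    if loss_adv is not None:
        total += lambda_adv * loss_adv   # only if adversarial learning is kept
    return total
```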

jingang-cv avatar Oct 25 '18 05:10 jingang-cv

@asiandragon Yes, you can run it directly and get that result.

zyfsa avatar Oct 26 '18 09:10 zyfsa

@zyfsa I get the same result as you. But have you found out why the results decrease after adding the adversarial-learning part? I cannot figure it out.

jingang-cv avatar Nov 04 '18 21:11 jingang-cv

Sorry, I have not tried that experiment. In fact, I think adversarial learning is something of a black box, and some papers also argue that it brings little or no improvement. Besides, in this paper (SSAH), the self-supervised semantic network is the important part.

zyfsa avatar Nov 05 '18 01:11 zyfsa

@zyfsa Also, though I can obtain relatively high performance with only 1 epoch, when I continue training the network, the performance decreases (only 64% i-t and 55% t-i after 11 epochs). Do you encounter this problem?

jingang-cv avatar Nov 05 '18 09:11 jingang-cv

Yes, I encountered this problem too. I also consulted the authors, and they reached a similar conclusion: adversarial learning is not as important as we thought.

zyfsa avatar Nov 05 '18 09:11 zyfsa

@zyfsa How do you deal with this problem? Train for only 1 epoch and use those results? I excluded the adversarial-learning part, but the results still decrease as training proceeds (when iterating over more epochs).

jingang-cv avatar Nov 05 '18 16:11 jingang-cv

@asiandragon I am also confused, because training converges so fast. The authors show the training efficiency in Figure 5 of the paper, but they sidestep the problem we discussed. Perhaps this result demonstrates the strength of the approach, and early stopping is a good way to prevent overfitting? You could also visualize the loss with TensorBoard.
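
For example, a TF1-style logging sketch (it assumes an open `tf.Session()` as `sess` plus existing `loss_gen`/`loss_dis` tensors and a `train_op`; those names are assumptions, not the repo's actual variables):

```python
import tensorflow as tf

tf.summary.scalar('loss_gen', loss_gen)   # generator loss curve
tf.summary.scalar('loss_dis', loss_dis)   # discriminator loss curve
merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('logs', sess.graph)

for step, batch in enumerate(batches):
    _, summary = sess.run([train_op, merged], feed_dict=make_feed(batch))
    writer.add_summary(summary, global_step=step)
writer.close()
# inspect with: tensorboard --logdir logs
```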

zyfsa avatar Nov 06 '18 02:11 zyfsa