self-critical.pytorch

Embedding followed by ReLU

luo3300612 opened this issue 4 years ago • 6 comments

Thank you for your repo. I have a question about the way word embedding is done in these captioning models. Why is the word embedding layer followed by a ReLU layer? Since nn.Embedding is initialized from N(0, 1), ReLU will zero out roughly 50% of the embedding outputs.
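The point can be checked empirically. A minimal sketch (the layer sizes here are arbitrary, not the repo's actual config): with the default N(0, 1) initialization of nn.Embedding, about half of the activations come out negative and are zeroed by ReLU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# nn.Embedding weights are initialized from N(0, 1) by default
emb = nn.Embedding(1000, 512)  # hypothetical vocab and embedding sizes
tokens = torch.randint(0, 1000, (4, 20))

out = torch.relu(emb(tokens))
frac_zero = (out == 0).float().mean().item()
print(f"fraction zeroed by ReLU: {frac_zero:.3f}")  # close to 0.5
```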

luo3300612 avatar Mar 03 '20 02:03 luo3300612

You are right. I have never thought about it carefully. But I can tell you where I got it: https://github.com/jiasenlu/AdaptiveAttention/blob/master/misc/LanguageModel.lua#L40

They have two technical differences: one is this ReLU embedding, the other is the two-layer image embedding. I adopted them because they work well. But to be honest, I have never tried each of them separately.
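The word-embedding pattern under discussion, as a PyTorch sketch (the sizes and dropout rate below are hypothetical placeholders, not the repo's exact values): the embedding lookup is wrapped in a Sequential with ReLU and dropout.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
vocab_size, input_encoding_size, drop_prob = 9487, 512, 0.5

# Embedding lookup followed by ReLU and dropout -- the pattern in question
embed = nn.Sequential(
    nn.Embedding(vocab_size + 1, input_encoding_size),
    nn.ReLU(),
    nn.Dropout(drop_prob),
)

tokens = torch.randint(0, vocab_size + 1, (2, 16))
out = embed(tokens)  # (2, 16, 512); all values non-negative after ReLU
```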

ruotianluo avatar Mar 03 '20 03:03 ruotianluo

I can probably try later this week. But if you want to try, go for it.

ruotianluo avatar Mar 03 '20 03:03 ruotianluo

Thank you for your prompt reply. I have another question: could you please tell me what changes you have made to your transformer model? Last year I ran the majority of these models and found that the transformer performed about as well as the Bottom-Up Top-Down model (about 1.21 CIDEr on the Karpathy test split). To my surprise, when I ran your transformer model last week, CIDEr reached 1.278 (not selected / beam size 2 / with self-critical)...

luo3300612 avatar Mar 07 '20 05:03 luo3300612

Eh. I don't think I changed anything.... It should be the same model.

ruotianluo avatar Mar 07 '20 13:03 ruotianluo

@luo3300612 did you get 1.21 cider in transformer model for XE training? If yes, may I know your settings for that? I could only get 1.157 for XE with beam size 3 for the transformer model.

fawazsammani avatar Mar 09 '20 10:03 fawazsammani

> @luo3300612 did you get 1.21 cider in transformer model for XE training? If yes, may I know your settings for that? I could only get 1.157 for XE with beam size 3 for the transformer model.

1.157 for XE is right for the transformer. I got 1.21 after self-critical training.

luo3300612 avatar Mar 10 '20 14:03 luo3300612