
Problems with SCA-CNN

Open BrunoQin opened this issue 5 years ago • 5 comments

Hi, this is excellent work and I have read the core code of the SCA-CNN model! Thank you for providing the PyTorch code. I have a question about the code below: when I train a model with seq_max_len = 20, for example, this for loop runs 20 times. Do the iterations share the same weights and feature maps? And if they share the same weights and feature maps, won't they produce the same output? Or do the h and c of the LSTM change between iterations, which makes the outputs different? Can you give me some advice on this? Thank you very much!

for t in range(seq_max_len):
    if self.att_mode == 'cs':
        # channel attention first, then spatial attention
        beta = self.channel_attention(features, hidden)
        features = beta * features  # note: rebinds features, so later steps see the attenuated map
        alpha = self.spatial_attention(features, hidden)
        feats = alpha * features
    elif self.att_mode == 'c':
        # channel attention only
        beta = self.channel_attention(features, hidden)
        feats = beta * features
    elif self.att_mode == 's':
        # spatial attention only
        alpha = self.spatial_attention(features, hidden)
        feats = alpha * features
    else:
        # spatial attention first, then channel attention
        alpha = self.spatial_attention(features, hidden)
        features = alpha * features  # rebinds features here as well
        beta = self.channel_attention(features, hidden)
        feats = beta * features
    feats = feats.view(1, batch_size, -1)
    embed = embeddings[t]
    # concatenate the word embedding with the attended features as the LSTM input
    inputs = torch.cat([embed, feats], dim=2)
    hidden, states = self.lstm(inputs, states)
    hidden = self.dropout(hidden)
    output = self.lstm_output(hidden)
    logits.append(output)

BrunoQin avatar Mar 04 '19 13:03 BrunoQin

Yes, they share the same LSTM, so the weights are also shared. But as you mentioned, the h and c of the LSTM change as the input changes, so the output at each timestep t will be different.
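
For intuition, here is a minimal self-contained sketch (not this repo's code; the sizes are made up) showing that an LSTM with one fixed set of weights still produces different outputs at each step, because (h, c) carry over between iterations:

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=8)  # one set of weights, reused at every step

x = torch.randn(1, 1, 8)  # feed the *same* input at every timestep
states = None             # (h, c); starts at zeros when None
outputs = []
for t in range(3):
    out, states = lstm(x, states)  # weights are shared; (h, c) evolve
    outputs.append(out)

# Identical input and shared weights, yet the outputs differ,
# because h and c changed between iterations.
print(torch.allclose(outputs[0], outputs[1]))  # False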

stevehuanghe avatar Mar 07 '19 04:03 stevehuanghe

Hi,

I have a similar question. If I add attention after every convolution layer, as the original paper does, are the attention parameters shared across layers?

N-Kingsley avatar Mar 19 '19 02:03 N-Kingsley

@N-Kingsley Hi, 1) I think the parameters of the attention layers stay the same across timesteps, but the data you pass to them is different: features and hidden change as the forward pass proceeds, so alpha and beta change. Or 2) you can create a different attention layer for each convolution layer, so they won't share parameters.
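
To illustrate option 2, a minimal sketch (SimpleChannelAttention is my own hypothetical stand-in, not the module from this repo, and the channel counts are made up):

import torch
import torch.nn as nn

class SimpleChannelAttention(nn.Module):
    """Hedged sketch of channel attention over mean-pooled features.

    The weight shapes depend on num_channels, which is why modules built
    for conv layers with different channel counts cannot share parameters.
    """
    def __init__(self, num_channels, hidden_size):
        super().__init__()
        self.feat_proj = nn.Linear(num_channels, hidden_size)
        self.hid_proj = nn.Linear(hidden_size, hidden_size)
        self.score = nn.Linear(hidden_size, num_channels)

    def forward(self, features, hidden):
        # features: (batch, num_channels) after spatial mean-pooling
        # hidden:   (batch, hidden_size) from the LSTM
        e = torch.tanh(self.feat_proj(features) + self.hid_proj(hidden))
        return torch.sigmoid(self.score(e))  # beta: (batch, num_channels)

# Option 2: one independent attention module per conv layer (no sharing).
conv_channels = [256, 512, 512]  # hypothetical channel counts per layer
attn_layers = nn.ModuleList(
    SimpleChannelAttention(c, hidden_size=512) for c in conv_channels
)

beta = attn_layers[0](torch.randn(4, 256), torch.randn(4, 512))  # (4, 256)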

BrunoQin avatar Mar 19 '19 02:03 BrunoQin

If the number of channels differs between layers, can the parameters still be shared?

N-Kingsley avatar Mar 19 '19 03:03 N-Kingsley

@N-Kingsley Hi, I see what you mean. The code in this repo only adds one attention layer on top of the encoder CNN's features, and it uses the CNN, the LSTM, and the for loop to predict words. Within the for loop, the attention layer's parameters are shared across timesteps, so they stay the same. But when you add more attention layers at different convolution layers, it's a different situation, and they won't share parameters.
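
To make the constraint concrete: a channel-attention layer's weight shapes are tied to the channel count it was built for, so reusing it on a layer with a different channel count fails with a shape error. A tiny sketch with assumed sizes:

import torch
import torch.nn as nn

# Minimal channel-attention-style projection over pooled features (batch, C).
attn = nn.Linear(in_features=256, out_features=256)  # built for C = 256

pooled_256 = torch.randn(4, 256)   # features from a 256-channel layer
pooled_512 = torch.randn(4, 512)   # features from a 512-channel layer

attn(pooled_256)    # OK
# attn(pooled_512)  # RuntimeError: mat1 and mat2 shapes cannot be multiplied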

BrunoQin avatar Mar 19 '19 07:03 BrunoQin