image_captioning
Problems with SCA-CNN
Hi,
This is excellent work and I have read the core code of the SCA-CNN model! Thank you a lot for providing the code in PyTorch.
I have a question about the following code:
When I train a model with seq_max_len = 20, this for loop runs 20 times. Do all the iterations share the same weights and feature maps? If they share the same weights and feature maps, will they produce the same output? Or do the h and c of the LSTM change between iterations of the for loop, which makes the outputs different? Can you give me some advice on this?
Thank you very much!
for t in range(seq_max_len):
    # Apply channel and/or spatial attention to the CNN feature maps,
    # conditioned on the current LSTM hidden state.
    if self.att_mode == 'cs':
        beta = self.channel_attention(features, hidden)
        features = beta * features
        alpha = self.spatial_attention(features, hidden)
        feats = alpha * features
    elif self.att_mode == 'c':
        beta = self.channel_attention(features, hidden)
        feats = beta * features
    elif self.att_mode == 's':
        alpha = self.spatial_attention(features, hidden)
        feats = alpha * features
    else:
        alpha = self.spatial_attention(features, hidden)
        features = alpha * features
        beta = self.channel_attention(features, hidden)
        feats = beta * features
    # Concatenate the attended features with the word embedding for step t
    # and feed them to the (shared) LSTM; (h, c) are carried over in `states`.
    feats = feats.view(1, batch_size, -1)
    embed = embeddings[t]
    inputs = torch.cat([embed, feats], dim=2)
    hidden, states = self.lstm(inputs, states)
    hidden = self.dropout(hidden)
    output = self.lstm_output(hidden)
    logits.append(output)
Yes, they share the same LSTM, so the weights are also shared. But, as you mentioned, the h and c of the LSTM change as the input changes, so the output at each t is different.
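To see this concretely, here is a minimal standalone sketch (toy sizes, not the repo's code): the same LSTM, with the same weights, receives the same input at every step, yet the outputs differ because the (h, c) state carries over between iterations.

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=8, hidden_size=16)   # one set of weights, reused for every t
states = None                                  # (h, c) start at zeros
x = torch.ones(1, 1, 8)                        # the *same* input at every step

for t in range(3):
    out, states = lstm(x, states)              # weights never change, but (h, c) carry over
    print(t, out.squeeze()[:3])                # outputs differ across t despite identical input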
Hi,
I have a similar question. If I add attention after every convolution layer, as the original paper does, are the attention parameters of each layer shared?
@N-Kingsley Hi. 1) I think the parameters of the attention layers stay the same, but the data you pass to them is different: features and hidden change during the forward pass, so alpha and beta change too. Or 2) you can create a different attention layer for each convolution layer, so they won't share the same parameters (see the sketch below).
If the number of channels per layer is different, can the parameters be shared?
@N-Kingsley Hi, I know what you mean. The code in this repo only adds one attention layer on top of the CNN features, and it uses the CNN, the LSTM, and the for loop to predict words. Within that for loop the attention layer's parameters are shared, so they are the same. But when you add more attention layers in your own code it will be different, and they won't share parameters.