NeuralBabyTalk
paper eq issue
In your paper, I think $v_t$ should be $v_i$ in eq. 4. Is that right?
I have a similar question.
According to the paper, $v_t$ is the region feature of $r_t$, and $r_t$ is a latent variable that denotes a specific image region.
This means that $v_t$ refers to only one specific image region, but eq. 4 implies that there are several regions at time t. I think there are m regions, according to eq. 13.
The dimensions in eq. 4 also seem inconsistent.
The dimension of $W_v v_t$ is $(m \times d)(d \times 1) = m \times 1$,
but the dimension of $W_z h_t$ is $(d \times d)(d \times 1) = d \times 1$.
These two dimensions do not match, so the two terms cannot be added together.
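The dimension argument above can be checked by tracking shapes only. The following is a minimal sketch in plain Python, not code from the NeuralBabyTalk repository. The helper names (`matmul_shape`, `add_shape`, `eq4_sum_is_valid`) and the example sizes m = 36 and d = 512 are assumptions made for illustration, and addition is assumed to require exactly equal shapes.

```python
def matmul_shape(a, b):
    """Shape of a matrix product for 2-D shapes a and b; raises if inner dims differ."""
    if a[1] != b[0]:
        raise ValueError(f"cannot multiply {a} by {b}")
    return (a[0], b[1])


def add_shape(a, b):
    """Shape of an elementwise sum; assumes both operands must have the same shape."""
    if a != b:
        raise ValueError(f"cannot add {a} and {b}")
    return a


def eq4_sum_is_valid(m, d):
    """Check whether the two terms inside eq. 4, as printed, can be added."""
    first = matmul_shape((m, d), (d, 1))   # (m x d)(d x 1) -> (m, 1)
    second = matmul_shape((d, d), (d, 1))  # (d x d)(d x 1) -> (d, 1)
    try:
        add_shape(first, second)
    except ValueError:
        return False
    return True


# Hypothetical sizes: m regions, d-dimensional features.
print(eq4_sum_is_valid(36, 512))
print(eq4_sum_is_valid(512, 512))
```

With these assumed sizes the sum is only well defined when m equals d, which is the mismatch described above.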
> In your paper, I think vt should be vi in eq. 4, is it right?
I have the same question, and I agree with your suggestion.
I have the same issue, and it confused me this afternoon. Can someone fix eq. 4? Thanks in advance.