XProNet: Are the cross-modal feature and the cross-modal representation vector the same?

In your paper you write: "we concatenate the visual and textual representations to form the cross-modal features $$r\in \mathbb{R} ^{1\times D}$$", but the formula below it reads $$o_u=Concate(o_u^{i(f)},o_u^t)$$. Are $$r$$ and $$o_u$$ the same vector?

Also, in the formula $$PM(k,i)=\frac{1}{N_{k,i}^s}\sum_{j=0}^N r_j^{k,i}$$, what does $$N_{k,i}^s$$ mean? I could not find these details in the source code.

My understanding is that you first extract the visual and textual representations and concatenate them to form the cross-modal feature $$r_u=Concat(o_u^{i(f)},o^t_u)$$, then group the features into $$N_l$$ sets { $$R_k;0 \le k \le N_l$$ } according to the sample labels, and apply K-Means to each $$R_k$$ to split it into $$N^p$$ clusters. Finally, the average of the vectors within each cluster is taken as the prototype vector $$PM(k,i)$$. Is this understanding correct?

Nov 13 '24 12:11 DanyangCheng

Hi, thank you for your interest in our work. Both o and r are cross-modal features. We use two symbols because o_u is associated with a specific sample u, while r is used to index the cross-modal features after clustering.

Regarding $N^s_{k,i}$: sorry, this is a typo. It should be $N^d_{k,i}$.
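With this correction, the prototype formula from the question would read as follows. Only the one symbol is substituted; reading $N^d_{k,i}$ as the number of cross-modal features assigned to cluster $i$ of label $k$ is an assumption based on the averaging described in the question, not something stated explicitly in this thread.

$$PM(k,i)=\frac{1}{N_{k,i}^d}\sum_{j=0}^N r_j^{k,i}$$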

You are right: the prototype initialization procedure is as you summarized.
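For readers who want to see the confirmed procedure end to end, here is a minimal, dependency-free sketch. It is not the repository's code: the function names (`init_prototypes`, `kmeans`, `mean`, `sq_dist`) are hypothetical, each sample is assumed to carry a single label, and a plain Lloyd's K-Means with a fixed number of iterations stands in for whatever clustering implementation the authors actually use.

```python
import random
from collections import defaultdict


def mean(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's K-Means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, n_clusters)
    assign = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: attach each vector to its nearest center.
        assign = [
            min(range(n_clusters), key=lambda c: sq_dist(v, centers[c]))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean(members)
    return assign


def init_prototypes(visual, textual, labels, n_prototypes):
    """Return {label k: [PM(k, i) for every non-empty cluster i]}."""
    # r_u = Concat(o_u^{i(f)}, o_u^t): one cross-modal feature per sample.
    features = [list(v) + list(t) for v, t in zip(visual, textual)]

    # Group the features into sets R_k according to the sample label
    # (assumption: exactly one label per sample).
    groups = defaultdict(list)
    for r, k in zip(features, labels):
        groups[k].append(r)

    prototypes = {}
    for k, r_k in groups.items():
        # Split R_k into at most n_prototypes clusters.
        assign = kmeans(r_k, min(n_prototypes, len(r_k)))
        clusters = defaultdict(list)
        for r, i in zip(r_k, assign):
            clusters[i].append(r)
        # PM(k, i): average of the vectors inside cluster i of label k.
        prototypes[k] = [mean(clusters[i]) for i in sorted(clusters)]
    return prototypes
```

Calling `init_prototypes(visual, textual, labels, n_prototypes)` with per-sample visual and textual vectors returns a dictionary that maps each label to its list of prototype vectors. In practice a library implementation of K-Means would likely replace the hand-written loop; it is spelled out here only so the averaging step is visible.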

I hope this information helps you resolve the problem.

Best Regards, Jun

Nov 15 '24 14:11 Markin-Wang

Your reply helped me a lot, and your work is great.

Nov 17 '24 09:11 DanyangCheng