Math Typo in Paper?? (dimension of linear layers in Q, K, V)
I am reading the latent diffusion paper, which states that for cross-attention Q, K, and V are computed via linear projections W_q, W_k, W_v.
But it then says W_v \in R^{d \times d_\epsilon^i} while W_q \in R^{d \times d_\tau} and W_k \in R^{d \times d_\tau}. Shouldn't it be the other way around, i.e. W_q \in R^{d \times d_\epsilon^i} and W_v \in R^{d \times d_\tau}? Q is computed from the flattened U-Net features \varphi_i(z_t) (dimension d_\epsilon^i), while K and V are both computed from the conditioning \tau_\theta(y) (dimension d_\tau), so W_k and W_v should share the same input dimension. Looking at the code here on GitHub, W_k and W_v do indeed have the same input dimension, but not in the paper. Is this a known typo?
Paper: https://arxiv.org/abs/2112.10752
Code:
https://github.com/CompVis/stable-diffusion/blob/21f890f9da3cfbeaba8e2ac3c425ee9e998d5229/ldm/modules/attention.py#L152
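For reference, here is a minimal single-head sketch of what I mean (not the repo's exact code; the dimensions and variable names are illustrative). The query projection takes the U-Net feature dimension d_\epsilon^i as input, while the key and value projections both take the conditioning dimension d_\tau:

```python
import torch
import torch.nn as nn

d = 320            # inner attention dim (d in the paper), illustrative
query_dim = 320    # d_eps^i: channels of the flattened U-Net features phi_i(z_t)
context_dim = 768  # d_tau: width of the conditioning embedding tau_theta(y)

to_q = nn.Linear(query_dim, d, bias=False)    # W_q: R^{d x d_eps^i}
to_k = nn.Linear(context_dim, d, bias=False)  # W_k: R^{d x d_tau}
to_v = nn.Linear(context_dim, d, bias=False)  # W_v: R^{d x d_tau} -- same input dim as W_k

x = torch.randn(1, 4096, query_dim)    # flattened U-Net feature map
ctx = torch.randn(1, 77, context_dim)  # e.g. a text embedding sequence

q, k, v = to_q(x), to_k(ctx), to_v(ctx)
attn = torch.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)  # (1, 4096, 77)
out = attn @ v                                                   # (1, 4096, d)
```

If I read the linked CrossAttention module correctly, this matches its to_q/to_k/to_v layers, where to_k and to_v both take context_dim as input, which is why I suspect the paper swapped the dimensions of W_q and W_v.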