CogVideo
Consideration about 3D-RoPE
In your implementation, you apply 1D-RoPE independently to each coordinate dimension, with the three dimensions occupying 3/8, 3/8, and 2/8 of the hidden states' channels, and then concatenate the results along the last dimension. However, in the inner product of the Q and K matrices, it seems to me that the 2/8 time channels will interact with the 3/8 width and 3/8 height channels, the 3/8 width channels will interact with the 2/8 time and 3/8 height channels, and the 3/8 height channels will interact with the 3/8 width and 2/8 time channels. These cross-dimension interactions do not conform to the RoPE formula. Do you think your 3D-RoPE implementation is wrong?
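To make the question concrete, here is a minimal pure-Python sketch for checking it numerically. It is not the CogVideo code. The helper names (`rope_1d`, `rope_3d`, `split_axes`), the 16-channel head, the (time, height, width) channel order, and the interleaved-pair rotation are all my own assumptions for illustration. It compares the full Q·K score against the sum of three per-axis 1D-RoPE scores, and also tests whether the score changes when both positions are shifted by the same offset.

```python
import math
import random

def rope_1d(x, pos, base=10000.0):
    """Apply 1D RoPE to an even-length list x at position pos.
    Assumption: channels are rotated as interleaved pairs (x[i], x[i+1])."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def split_axes(x, sizes):
    """Split x into consecutive channel groups of the given sizes."""
    parts, start = [], 0
    for n in sizes:
        parts.append(x[start:start + n])
        start += n
    return parts

def rope_3d(x, coords, sizes):
    """Rotate each channel group with its own coordinate, then concatenate."""
    out = []
    for part, pos in zip(split_axes(x, sizes), coords):
        out += rope_1d(part, pos)
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Hypothetical head: 16 channels split 2/8, 3/8, 3/8 as (time, height, width).
SIZES = (4, 6, 6)
rng = random.Random(0)
q = [rng.gauss(0, 1) for _ in range(sum(SIZES))]
k = [rng.gauss(0, 1) for _ in range(sum(SIZES))]
q_pos, k_pos = (3, 5, 7), (1, 2, 4)

# Full attention score after the concatenated 3D rotation.
score = dot(rope_3d(q, q_pos, SIZES), rope_3d(k, k_pos, SIZES))

# Sum of three independent 1D-RoPE scores, one per axis.
per_axis = sum(
    dot(rope_1d(qa, m), rope_1d(ka, n))
    for qa, ka, m, n in zip(split_axes(q, SIZES), split_axes(k, SIZES), q_pos, k_pos)
)

# Same relative offsets, different absolute positions.
shift = (10, 20, 30)
q_shifted = tuple(a + b for a, b in zip(q_pos, shift))
k_shifted = tuple(a + b for a, b in zip(k_pos, shift))
score_shifted = dot(rope_3d(q, q_shifted, SIZES), rope_3d(k, k_shifted, SIZES))

print(abs(score - per_axis) < 1e-9)       # does the score decompose per axis?
print(abs(score - score_shifted) < 1e-9)  # does it depend only on relative offsets?
```

If both checks print `True`, the score is just the sum of three separate 1D-RoPE scores, because a dot product of concatenated vectors only multiplies matching channels. In that case the cross-dimension interaction I described would not actually arise, at least under the assumptions of this sketch.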