video-diffusion-pytorch
Reason for combining rotary and relative positional embedding?
Hi,
Awesome work, first of all. Is there a reason why you combine both rotary and relative positional embeddings in your Attention class? I would have assumed that either one alone is enough to incorporate the frame positions into the attention mechanism. For reference, the pattern I mean is sketched below.
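Here is a minimal, self-contained sketch of that pattern, so the question is concrete. This is not the repo's actual code; the names (`RelPosBias`, `apply_rotary`) and sizes are simplified stand-ins. The point is that rotary embeddings rotate the queries/keys *before* the dot product, while the relative position bias is a learned term added to the attention logits *after* it, so the two encode position through different mechanisms:

```python
# Simplified sketch, NOT the library's implementation: rotary embeddings
# applied to q/k, plus a learned relative position bias on the logits.
import torch
import torch.nn.functional as F
from torch import nn

def rotate_half(x):
    # split the last dim in two halves and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(freqs, t):
    # freqs: (seq, dim_head) angles; t: (..., seq, dim_head)
    return t * freqs.cos() + rotate_half(t) * freqs.sin()

class RelPosBias(nn.Module):
    # one learned scalar bias per (head, relative offset)
    def __init__(self, heads, max_dist=128):
        super().__init__()
        self.max_dist = max_dist
        self.bias = nn.Embedding(2 * max_dist - 1, heads)

    def forward(self, n, device):
        pos = torch.arange(n, device=device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_dist + 1, self.max_dist - 1)
        return self.bias(rel + self.max_dist - 1).permute(2, 0, 1)  # (heads, n, n)

class Attention(nn.Module):
    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        self.heads, self.scale = heads, dim_head ** -0.5
        inner = heads * dim_head
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.to_out = nn.Linear(inner, dim)
        self.rel_pos = RelPosBias(heads)
        # fixed rotary frequencies, one angle per pair of channels
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim_head, 2).float() / dim_head))
        self.register_buffer('inv_freq', inv_freq)

    def forward(self, x):
        b, n, _, h = *x.shape, self.heads
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(b, n, h, -1).transpose(1, 2) for t in qkv)  # (b, h, n, d)
        # 1) rotary: rotate q and k by position-dependent angles
        t = torch.arange(n, device=x.device).float()
        freqs = torch.einsum('i,j->ij', t, self.inv_freq)
        freqs = torch.cat((freqs, freqs), dim=-1)  # (n, dim_head)
        q, k = apply_rotary(freqs, q), apply_rotary(freqs, k)
        # 2) relative position bias: added directly to the attention logits
        sim = torch.einsum('bhid,bhjd->bhij', q, k) * self.scale
        sim = sim + self.rel_pos(n, x.device)
        attn = sim.softmax(dim=-1)
        out = torch.einsum('bhij,bhjd->bhid', attn, v)
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)
```

Since both steps inject positional information along the frame axis, it seems like either the rotation in step 1 or the bias in step 2 would suffice on its own.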
Same question here.