
[Modelling] ROPE and Prompt Cross-Attention

Open sanchit-gandhi opened this issue 1 month ago • 0 comments

ROPE:

  • Applied to the q/k states in the self-attention
  • Applied to the q states only in the cross-attention (not the k/v states)
  • The rationale is that the k/v states come from the T5 encoder, whose outputs already carry positional information (a sketch of this asymmetry follows the list)

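As a rough illustration, here is a minimal PyTorch sketch of the asymmetric application described above. This is not the Parler-TTS implementation and the helper names are illustrative: queries and keys are rotated in self-attention, while only the queries are rotated in cross-attention because the keys/values come from the T5 encoder.

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Swap and negate the two halves of the last dimension (standard RoPE helper)."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Apply precomputed cos/sin rotary tables to a (batch, heads, seq_len, head_dim) tensor."""
    return (x * cos) + (rotate_half(x) * sin)


def self_attention_rope(q, k, v, cos, sin):
    # Self-attention: rotate queries and keys so their dot product encodes
    # the relative position between decoder timesteps.
    return apply_rope(q, cos, sin), apply_rope(k, cos, sin), v


def cross_attention_rope(q, k, v, cos, sin):
    # Cross-attention: rotate only the decoder queries. The keys/values come
    # from the T5 encoder and already carry positional information, so they
    # are left untouched.
    return apply_rope(q, cos, sin), k, v
```
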
Cross-Attention:

  • Option to concatenate the T5 encoder hidden-states and prompt embeddings to be used as cross-attention conditioning
  • If we do this, we no longer have to concatenate the prompt embeddings to the input embeddings
  • We also apply a positional embedding to the prompt embeddings to encode positional information (see the sketch after this list)

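Under the same caveat, here is a sketch of the proposed conditioning: a learned positional embedding is added to the prompt embeddings, which are then concatenated with the T5 encoder hidden-states along the sequence axis and used as the cross-attention keys/values. The class and argument names are hypothetical, not the library's API.

```python
import torch
import torch.nn as nn


class PromptCrossAttentionConditioning(nn.Module):
    """Illustrative module: builds the states the decoder cross-attends over
    from the T5 description hidden-states and the prompt embeddings."""

    def __init__(self, hidden_size: int, max_prompt_length: int):
        super().__init__()
        # Learned positional embedding for the prompt tokens, since the prompt
        # embeddings do not pass through the T5 encoder.
        self.prompt_positions = nn.Embedding(max_prompt_length, hidden_size)

    def forward(self, encoder_hidden_states: torch.Tensor, prompt_embeds: torch.Tensor) -> torch.Tensor:
        # encoder_hidden_states: (batch, desc_len, hidden) from the T5 encoder
        # prompt_embeds:         (batch, prompt_len, hidden) from the token embedding
        positions = torch.arange(prompt_embeds.shape[1], device=prompt_embeds.device)
        prompt_embeds = prompt_embeds + self.prompt_positions(positions)

        # Concatenate along the sequence axis: the decoder cross-attends over
        # both the description and the prompt, so the prompt no longer has to
        # be concatenated with the decoder input embeddings.
        return torch.cat([encoder_hidden_states, prompt_embeds], dim=1)
```

In practice the description and prompt attention masks would be concatenated in the same way, so padded prompt positions stay masked out in the cross-attention.
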
sanchit-gandhi · May 17 '24 15:05