[Modelling] RoPE and Prompt Cross-Attention
RoPE:
- Applied to the q/k states in the self-attention (the v states are never rotated)
- Applied to the q states only in the cross-attention (not the k/v states)
- The rationale is that the k/v states come from the T5 encoder, whose outputs already carry T5's positional information (see the sketch after this list)
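A minimal PyTorch sketch of this rotation scheme; `build_rope_cache`, `apply_rotary`, and the tensor shapes are illustrative assumptions, not the parler-tts API:

```python
import torch

def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    # Standard RoPE frequencies: theta_i = base^(-2i / head_dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    freqs = torch.outer(positions, inv_freq)  # (seq_len, head_dim / 2)
    return freqs.cos(), freqs.sin()

def apply_rotary(x, cos, sin):
    # x: (batch, heads, seq_len, head_dim); rotate each consecutive channel pair
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

batch, heads, tgt_len, head_dim = 2, 4, 16, 64
cos, sin = build_rope_cache(tgt_len, head_dim)

# Self-attention: rotate q and k (v is left untouched).
q = torch.randn(batch, heads, tgt_len, head_dim)
k = torch.randn(batch, heads, tgt_len, head_dim)
q, k = apply_rotary(q, cos, sin), apply_rotary(k, cos, sin)

# Cross-attention: rotate q only; k/v come from the T5 encoder and keep
# the positional information the encoder already gave them.
q_cross = apply_rotary(torch.randn(batch, heads, tgt_len, head_dim), cos, sin)
```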
Cross-Attention:
- Option to concatenate the T5 encoder hidden states and the prompt embeddings, and use the result as the cross-attention conditioning
- If we do this, we no longer have to concatenate the prompt embeddings to the decoder input embeddings
- We also apply a positional embedding to the prompt embeddings to encode positional info, since they no longer inherit positions from the decoder input sequence (see the sketch below)
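A minimal sketch of this conditioning path, assuming a learned `nn.Embedding` for the prompt positions; the module and argument names are hypothetical, not the parler-tts implementation:

```python
import torch
import torch.nn as nn

class PromptCrossAttentionConditioning(nn.Module):
    """Build cross-attention k/v inputs by concatenating T5 encoder
    hidden states with positionally-embedded prompt embeddings."""

    def __init__(self, max_prompt_len: int, hidden_dim: int):
        super().__init__()
        # Learned positions for the prompt tokens: they are no longer part of
        # the decoder input sequence, so they need their own positional signal.
        self.prompt_pos_emb = nn.Embedding(max_prompt_len, hidden_dim)

    def forward(self, encoder_hidden_states, prompt_embeds,
                encoder_mask=None, prompt_mask=None):
        # encoder_hidden_states: (batch, enc_len, hidden) from the T5 encoder
        # prompt_embeds:         (batch, prompt_len, hidden)
        prompt_len = prompt_embeds.shape[1]
        positions = torch.arange(prompt_len, device=prompt_embeds.device)
        prompt_embeds = prompt_embeds + self.prompt_pos_emb(positions)

        # One conditioning stream along the sequence axis for cross-attention.
        cond = torch.cat([encoder_hidden_states, prompt_embeds], dim=1)
        mask = None
        if encoder_mask is not None and prompt_mask is not None:
            mask = torch.cat([encoder_mask, prompt_mask], dim=1)
        return cond, mask

# Usage with made-up shapes:
cond_module = PromptCrossAttentionConditioning(max_prompt_len=64, hidden_dim=1024)
enc = torch.randn(2, 50, 1024)   # T5 encoder hidden states
prm = torch.randn(2, 12, 1024)   # prompt embeddings
kv_states, kv_mask = cond_module(enc, prm)
```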