pytorch-stable-diffusion
Why do we need to set causal_mask=True in CLIP?
The prompt is a complete sentence, and we don't need to predict the next token in it, so is there a problem with letting each token attend to the tokens on its right? Why does the code use `x = self.attention(x, causal_mask=True)`?
This is my question as well.
I had the same query. This is the answer I found in the CLIP paper by OpenAI:
"Masked self-attention was used in the text encoder to preserve the ability to initialize with a pre-trained language model or add language modeling as an auxiliary objective, though exploration of this is left as future work."
Paper: https://arxiv.org/pdf/2103.00020
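For anyone else wondering what the flag actually does mechanically: below is a minimal sketch (not the repo's exact code; the function name `attention_scores` is just for illustration) of how a causal mask affects self-attention. Positions above the diagonal of the score matrix correspond to tokens on the right, and setting them to -inf before the softmax zeroes out their attention weights.

```python
import torch

def attention_scores(q, k, causal_mask=True):
    # Scaled dot-product attention scores, shape (seq_len, seq_len).
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5
    if causal_mask:
        # Strict upper triangle = future (right-hand) positions.
        mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    # Softmax turns the -inf entries into exactly 0 attention weight.
    return torch.softmax(scores, dim=-1)

seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)
print(attention_scores(x, x))  # row i has zeros in every column j > i
```

So with `causal_mask=True`, token i's embedding is built only from tokens 0..i, matching the GPT-style language-modeling setup the paper quote refers to, rather than a bidirectional (BERT-style) encoding of the prompt.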