
attention_mask

Open netagl opened this issue 1 year ago • 3 comments

Hi, I have an attention_mask shape mismatch problem in the cross attention.

Can you please explain this line: requires_attention_mask = "encoder_outputs" not in model_kwargs ?

Why is it followed directly by this?

    if "encoder_outputs" not in model_kwargs:
        # encoder_outputs are created and added to model_kwargs
        model_kwargs = self._prepare_text_encoder_kwargs_for_generation(
            inputs_tensor, model_kwargs, model_input_name, generation_config,
        )
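For context, between those two snippets requires_attention_mask is used roughly like this (my paraphrase, so exact arguments may differ between transformers/parler-tts versions):

    # Paraphrased sketch of how requires_attention_mask is consumed; not the exact source.
    # The attention mask only needs to be built from the input ids when the caller
    # did NOT pass precomputed encoder_outputs.
    if model_kwargs.get("attention_mask") is None and requires_attention_mask:
        model_kwargs["attention_mask"] = self._prepare_attention_mask_for_generation(
            inputs_tensor, generation_config.pad_token_id, generation_config.eos_token_id
        )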

Is the attention mask needed for the cross attention layer during generation? This mismatch problem occurs only in generation; train & eval are ok.

Thanks!

netagl • May 30 '24 11:05

I think it might be related to this:

        encoder_attention_mask = _prepare_4d_attention_mask(
            encoder_attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
        )

The comment there says:

[bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]

but _prepare_4d_attention_mask returns src_seq_len as 1. Is this what you meant? Because it does not work well with the cross attention check:

    if attention_mask is not None:
        if attention_mask.size() != (bsz, 1, tgt_len, src_len):
            raise ValueError(
                f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.size()}"
            )
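As a quick sanity check, I would expect the expanded mask to come out as [bsz, 1, tgt_len, src_len]; a minimal sketch, assuming a recent transformers where _prepare_4d_attention_mask lives in modeling_attn_mask_utils, with hypothetical sizes:

    import torch
    from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask

    bsz, src_len, tgt_len = 2, 7, 5                    # hypothetical sizes
    encoder_attention_mask = torch.ones(bsz, src_len)  # [bsz, src_seq_len]

    expanded = _prepare_4d_attention_mask(
        encoder_attention_mask, torch.float32, tgt_len=tgt_len
    )
    # expected: torch.Size([2, 1, 5, 7]), i.e. [bsz, 1, tgt_len, src_len]
    print(expanded.shape)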

I would like some help, @ylacombe.

netagl • May 30 '24 16:05

Hey @netagl, thanks for your message! I'm not sure I understand your issue; could you send a code snippet that reproduces it?

The attention mask is needed in the cross attention layer if you have a batch of samples; otherwise you don't need to pass it to the model!
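For example, when generating from a padded batch you would pass both attention masks explicitly. A minimal sketch based on the README usage (assuming the parler-tts/parler_tts_mini_v0.1 checkpoint; adapt to your setup):

    from transformers import AutoTokenizer
    from parler_tts import ParlerTTSForConditionalGeneration

    model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1")
    tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

    descriptions = [
        "A female speaker with a calm, clear voice.",
        "A male speaker talking very fast in a noisy environment.",
    ]
    prompts = ["Hello there!", "Parler-TTS generates speech from text."]

    # Padding brings every sequence in the batch to the same length, so the
    # attention masks must be forwarded to generate() so that padded positions
    # are ignored in the (cross-)attention layers.
    desc_inputs = tokenizer(descriptions, return_tensors="pt", padding=True)
    prompt_inputs = tokenizer(prompts, return_tensors="pt", padding=True)

    audio = model.generate(
        input_ids=desc_inputs.input_ids,
        attention_mask=desc_inputs.attention_mask,
        prompt_input_ids=prompt_inputs.input_ids,
        prompt_attention_mask=prompt_inputs.attention_mask,
    )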

ylacombe • Jun 07 '24 08:06

@netagl, is your audio_encoder_per_device_batch_size 1?

kdcyberdude • Jun 20 '24 00:06