
Lumina2Transformer2DModel.forward() got an unexpected keyword argument 'use_mask_in_transformer'

Open arvnoodle opened this issue 10 months ago • 11 comments

On RunPod.

This is probably related to this error: https://github.com/huggingface/diffusers/pull/10776#discussion_r1953806298, so I won't be using sample prompts for now.

Generating baseline samples before training
Error running job: Lumina2Transformer2DModel.forward() got an unexpected keyword argument 'use_mask_in_transformer'                                         

========================================
Result:
 - 0 completed jobs
 - 1 failure
========================================
Traceback (most recent call last):
  File "/workspace/ai-toolkit/run.py", line 97, in <module>
    main()
  File "/workspace/ai-toolkit/run.py", line 93, in main
    raise e
  File "/workspace/ai-toolkit/run.py", line 85, in main
    job.run()
  File "/workspace/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/workspace/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 1827, in run
    self.sample(self.step_num)
  File "/workspace/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 321, in sample
    self.sd.generate_images(gen_img_config_list, sampler=sample_config.sampler)
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/ai-toolkit/toolkit/stable_diffusion_model.py", line 1502, in generate_images
    img = pipeline(
          ^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/diffusers/pipelines/lumina2/pipeline_lumina2.py", line 703, in __call__
    noise_pred_cond = self.transformer(
                      ^^^^^^^^^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/ai-toolkit/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Lumina2Transformer2DModel.forward() got an unexpected keyword argument 'use_mask_in_transformer'

arvnoodle avatar Feb 18 '25 04:02 arvnoodle

same here

PsyChip avatar Feb 18 '25 10:02 PsyChip

Same here. Did you find a solution?

Solodev11 avatar Feb 18 '25 16:02 Solodev11

I just disabled the sample prompts, but the training isn't working as intended either: no changes in the output in my ComfyUI.

arvnoodle avatar Feb 19 '25 02:02 arvnoodle

!pip uninstall diffusers -y
!python -m pip cache purge
!pip install git+https://github.com/huggingface/diffusers.git

Solved the problem for me.

PsyChip avatar Feb 19 '25 02:02 PsyChip

Same question.

snowbedding avatar Feb 28 '25 06:02 snowbedding

!pip uninstall diffusers -y
!python -m pip cache purge
!pip install git+https://github.com/huggingface/diffusers.git

Solved the problem for me.

Tried this, no luck. Which diffusers version are you using now?

pip show diffusers

snowbedding avatar Feb 28 '25 09:02 snowbedding

!pip uninstall diffusers -y
!python -m pip cache purge
!pip install git+https://github.com/huggingface/diffusers.git

Solved the problem for me.

Not working for me either; can you share your diffusers version?

chenhh17 avatar Feb 28 '25 09:02 chenhh17

Not working for me either; can you share your diffusers version?

0.33.0.dev0

PsyChip avatar Feb 28 '25 12:02 PsyChip

Hi everyone, if you are still hitting this problem, the workaround is to disable 'sample' in the training config. The original poster stated this first, but I overlooked it.

chenhh17 avatar Mar 14 '25 02:03 chenhh17
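For reference, here is a hedged sketch of what that workaround might look like in a typical ai-toolkit YAML config. The section and key names (`sample`, `sample_every`) are assumptions based on the standard example configs shipped with ai-toolkit, so check against your own config file:

```yaml
# Hypothetical excerpt from an ai-toolkit training config.
# Since the error is raised while generating baseline samples,
# preventing sampling from ever running sidesteps it entirely.
config:
  process:
    - type: sd_trainer
      # ...
      sample:
        # Push the interval past the total step count so sampling
        # never triggers, or comment out this whole section.
        sample_every: 999999
```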

This is frankly bizarre to me. In transformer_lumina2.py, in the definition of Lumina2Transformer2DModel, the forward() function clearly has encoder_attention_mask in it:

def forward(
        self,
        hidden_states: torch.Tensor,
        timestep: torch.Tensor,
        encoder_hidden_states: torch.Tensor,
        encoder_attention_mask: torch.Tensor,
        attention_kwargs: Optional[Dict[str, Any]] = None,
        return_dict: bool = True,
    ) -> Union[torch.Tensor, Transformer2DModelOutput]:

Where could this error possibly be coming from?

stepfunction83 avatar Mar 16 '25 20:03 stepfunction83
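One way to confirm which keyword arguments the installed model's forward() actually accepts is to inspect its signature at runtime. This is a minimal sketch of the diagnostic using a stand-in function with the signature quoted above; in practice you would pass `Lumina2Transformer2DModel.forward` from your installed diffusers instead:

```python
import inspect

# Stand-in with the forward() signature quoted above from
# transformer_lumina2.py; substitute the real method to diagnose
# your own install.
def forward(hidden_states, timestep, encoder_hidden_states,
            encoder_attention_mask, attention_kwargs=None,
            return_dict=True):
    ...

params = set(inspect.signature(forward).parameters)

# The kwarg the pipeline passes vs. what forward() accepts:
print("use_mask_in_transformer" in params)  # False -> the TypeError above
print("encoder_attention_mask" in params)   # True
```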

Solved it! That was a weird issue with diffusers: the call that pipeline_lumina2.py makes has a different signature than Lumina2Transformer2DModel.forward() in transformer_lumina2.py. It's possible to get it to run by adjusting pipeline_lumina2.py (starting at line 703):

noise_pred_cond = self.transformer(
                    hidden_states=latents,
                    timestep=current_timestep,
                    encoder_hidden_states=prompt_embeds,
                    
                    ### REPLACE encoder_attention_mask WITH attention_mask ###
                    attention_mask=prompt_attention_mask,
                    # encoder_attention_mask=prompt_attention_mask,

                    return_dict=False,

                    ### REMOVE attention_kwargs ###
                    # attention_kwargs=self.attention_kwargs,
                )[0]

                # perform normalization-based guidance scale on a truncated timestep interval
                if self.do_classifier_free_guidance and not do_classifier_free_truncation:
                    noise_pred_uncond = self.transformer(
                        hidden_states=latents,
                        timestep=current_timestep,
                        encoder_hidden_states=negative_prompt_embeds,
                        
                        ### REPLACE encoder_attention_mask WITH attention_mask ###
                        attention_mask=negative_prompt_attention_mask,
                        # encoder_attention_mask=negative_prompt_attention_mask,
                        return_dict=False,

                        ### REMOVE attention_kwargs ###
                        # attention_kwargs=self.attention_kwargs,
                    )[0]

After that was updated, the samples generated correctly (well, as correctly as can be expected as far as hands go...)

[attached sample image]

stepfunction83 avatar Mar 16 '25 21:03 stepfunction83
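A more general defensive pattern, rather than hand-editing the installed pipeline, is to filter the kwargs against whatever the installed forward() actually accepts before calling it. This is a sketch of the idea only, not code from diffusers or ai-toolkit; the helper name is made up:

```python
import inspect

def call_with_supported_kwargs(fn, /, **kwargs):
    """Call fn, dropping any kwarg its signature does not accept."""
    accepted = inspect.signature(fn).parameters
    # If fn takes **kwargs itself, pass everything through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in accepted.values()):
        return fn(**kwargs)
    return fn(**{k: v for k, v in kwargs.items() if k in accepted})

# Stand-in forward() with a mismatched signature like the one in this thread:
def forward(hidden_states, timestep, attention_mask=None):
    return (hidden_states, timestep, attention_mask)

out = call_with_supported_kwargs(
    forward,
    hidden_states="latents",
    timestep=0,
    attention_mask="mask",
    use_mask_in_transformer=True,  # silently dropped instead of raising
)
print(out)  # ('latents', 0, 'mask')
```

The trade-off is that silently dropping kwargs can hide real bugs (here, a missing mask would change outputs), so it is best confined to glue code that bridges mismatched library versions.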