
Training Code: dtype Error During Model Forward Pass

Open DAVEISHAN opened this issue 1 year ago • 6 comments

I am facing a RuntimeError related to a dtype mismatch during the forward pass of the training code:

```
    down, reference_features = unet_encoder(cloth_values, timesteps, text_embeds_cloth, return_dict=False)
  File "/miniforge/envs/idm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/task_runtime/src/unet_hacked_garmnet.py", line 1052, in forward
    emb = self.time_embedding(t_emb, timestep_cond)
  File "/miniforge/envs/idm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniforge/envs/idm/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 228, in forward
    sample = self.linear_1(sample)
  File "/miniforge/envs/idm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniforge/envs/idm/lib/python3.10/site-packages/diffusers/models/lora.py", line 430, in forward
    out = super().forward(hidden_states)
  File "/miniforge/envs/idm/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype
```
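For context, this error can be reproduced outside IDM-VTON entirely: it fires whenever a float16 activation reaches a layer whose weights are float32 (or vice versa). A minimal stand-alone sketch, not the project's code:

```python
import torch
import torch.nn as nn

# Minimal stand-alone reproduction (not the IDM-VTON code): a float16
# activation fed into a float32 Linear layer triggers the same error.
linear = nn.Linear(4, 4)                         # parameters default to float32
x_half = torch.randn(2, 4, dtype=torch.float16)

err = None
try:
    linear(x_half)
except RuntimeError as e:
    err = e  # "mat1 and mat2 must have the same dtype"

# Casting the input to the layer's parameter dtype resolves the mismatch.
out = linear(x_half.to(linear.weight.dtype))
print(type(err).__name__, out.dtype)
```

The same logic explains the traceback: the fp16 VAE produces fp16 latents, which eventually hit an fp32 `linear_1` inside the garment UNet's time embedding.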

I've found that loading the VAE in regular torch.float32 resolves the issue:

```python
vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae")
```

instead of the original code:

```python
vae = AutoencoderKL.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="vae",
    torch_dtype=torch.float16,
)
```
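As an aside, an already-instantiated model can also be upcast in place with `.to(dtype=...)` rather than reloaded from disk. A sketch using a stand-in module (not the real `AutoencoderKL`):

```python
import torch
import torch.nn as nn

# Stand-in for the fp16-loaded VAE; Module.to(dtype=...) converts every
# parameter and buffer in place, so no reload from disk is needed.
vae_like = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3)).to(torch.float16)
assert next(vae_like.parameters()).dtype == torch.float16

vae_like.to(torch.float32)
print(next(vae_like.parameters()).dtype)  # torch.float32
```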

However, before standardizing this change across the codebase, I would like to confirm whether it is advisable to do so.

Thank you!

DAVEISHAN avatar Jul 30 '24 05:07 DAVEISHAN

@DAVEISHAN try setting mixed precision in the train_xl.sh file; it solved the issue for me.
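For reference, Accelerate's launcher exposes a `--mixed_precision` flag. A sketch of what the change inside `train_xl.sh` might look like; the script's actual arguments may differ:

```shell
# Hypothetical launch line; keep the script's real training arguments as-is.
accelerate launch --mixed_precision bf16 train_xl.py "$@"
```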

Alexsumt avatar Jul 30 '24 09:07 Alexsumt

Thank you @Alexsumt. However, I am now getting random NaN values as the step losses, and I'm not sure what the issue is. Any leads?
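One common cause of NaN step losses under fp16 (an assumption about the cause, not a diagnosis of this codebase) is overflow: float16 tops out around 65504, while bfloat16 keeps float32's exponent range. A quick illustration:

```python
import torch

# float16 overflows past ~65504; bfloat16 shares float32's exponent
# range, so the same value stays finite (at reduced precision).
big = torch.tensor(70000.0)
fp16 = big.to(torch.float16)
bf16 = big.to(torch.bfloat16)

print(fp16)         # inf
print(bf16)         # finite, roughly 7e4
print(fp16 - fp16)  # inf - inf = nan, which then poisons the loss
```

This is one reason BF16 mixed precision is often more stable than FP16 for training runs like this.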

DAVEISHAN avatar Jul 31 '24 07:07 DAVEISHAN

@Alexsumt which type of mixed precision did you set?

aaaqianqian avatar Aug 05 '24 16:08 aaaqianqian

@aaaqianqian BF16.

Alexsumt avatar Aug 15 '24 09:08 Alexsumt

> BF16

Hey @Alexsumt, after setting BF16, did you get good results compared to the official model?

NH1900 avatar Mar 30 '25 21:03 NH1900