OneTrainer
[Bug]: Pixart Sigma training bug if save in diffusers
What happened?
Found a strange bug when finetuning PixArt Sigma.
If saving in the diffusers format is selected instead of safetensors, training crashes with an error as soon as an epoch completes and the model is saved; it cannot start the next epoch. It also cannot resume training from a diffusers backup (same error).
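The error suggests that saving in the diffusers format leaves the transformer's weights on the CPU while the batch tensors are still on CUDA. A minimal sketch of that hypothesis, using plain PyTorch (the CPU offload and the `.to(train_device)` restore are my assumptions about the save path, not OneTrainer's actual code):

```python
import torch
import torch.nn as nn

# Pick the same device training would use (falls back to CPU if no GPU).
train_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the PixArt transformer's patch-embedding conv.
model = nn.Conv2d(4, 8, kernel_size=2).to(train_device, dtype=torch.bfloat16)

# Hypothesis: a diffusers-style save offloads the model to CPU to write it
# out, and nothing moves it back before the next training step.
model.to("cpu")
# ... save_pretrained(path) would run here ...

# Without this restore, the next forward pass with a CUDA input raises:
# RuntimeError: Input type (CUDABFloat16Type) and weight type
# (CPUBFloat16Type) should be the same
model.to(train_device)

x = torch.randn(1, 4, 16, 16, device=train_device, dtype=torch.bfloat16)
out = model(x)  # works only because the model was moved back first
print(out.shape)
```

If the hypothesis is right, restoring the training device after the diffusers save (or saving from a CPU copy instead of moving the live model) would avoid the crash.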
RuntimeError: Input type (CUDABFloat16Type) and weight type (CPUBFloat16Type) should be the same
Creating Backup workspace/run\backup\2024-06-20_16-20-05-backup-638-1-0
Saving models/model
What did you expect would happen?
Training continues successfully.
Relevant log output
sampling: 100%|████████████████████████████████████████████████████████████████████████| 20/20 [00:12<00:00, 1.54it/s]
step: 100%|███████████████████████████████████████| 638/638 [3:28:05<00:00, 19.57s/it, loss=0.0934, smooth loss=0.0804]
epoch: 25%|█████████████████▎ | 1/4 [3:28:14<10:24:44, 12494.69s/it]Saving workspace/run\save\BigLora2024-06-20_16-17-45-save-638-1-0 | 0/638 [00:00<?, ?it/s]
step: 0%| | 0/638 [02:19<?, ?it/s]
epoch: 25%|█████████████████▎ | 1/4 [3:30:34<10:31:42, 12634.30s/it]
Traceback (most recent call last):
File "D:\_____NEW_NN\OneTrainer\modules\ui\TrainUI.py", line 522, in __training_thread_function
trainer.train()
File "D:\_____NEW_NN\OneTrainer\modules\trainer\GenericTrainer.py", line 572, in train
model_output_data = self.model_setup.predict(self.model, batch, self.config, train_progress)
File "D:\_____NEW_NN\OneTrainer\modules\modelSetup\BasePixArtAlphaSetup.py", line 389, in predict
predicted_latent_noise, predicted_latent_var_values = model.transformer(
File "D:\_____NEW_NN\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\_____NEW_NN\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "D:\_____NEW_NN\OneTrainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_2d.py", line 418, in forward
hidden_states, encoder_hidden_states, timestep, embedded_timestep = self._operate_on_patched_inputs(
File "D:\_____NEW_NN\OneTrainer\venv\src\diffusers\src\diffusers\models\transformers\transformer_2d.py", line 502, in _operate_on_patched_inputs
hidden_states = self.pos_embed(hidden_states)
File "D:\_____NEW_NN\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\_____NEW_NN\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "D:\_____NEW_NN\OneTrainer\venv\src\diffusers\src\diffusers\models\embeddings.py", line 167, in forward
latent = self.proj(latent)
File "D:\_____NEW_NN\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\_____NEW_NN\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "D:\_____NEW_NN\OneTrainer\venv\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "D:\_____NEW_NN\OneTrainer\venv\lib\site-packages\torch\nn\modules\conv.py", line 453, in _conv_forward
return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
RuntimeError: Input type (CUDABFloat16Type) and weight type (CPUBFloat16Type) should be the same
Creating Backup workspace/run\backup\2024-06-20_16-20-05-backup-638-1-0
Saving models/model
Output of pip freeze
No response