
[Bug]: CUDA OOM error since the new update

Status: Open · joneschunghk opened this issue 6 months ago · 0 comments

What happened?

Traceback (most recent call last):
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\ui\TrainUI.py", line 544, in __training_thread_function
    trainer.train()
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\trainer\GenericTrainer.py", line 514, in train
    self.data_loader.get_data_set().start_next_epoch()
  File "d:\ai\stable_diffusion\onetrainer\venv\src\mgds\src\mgds\MGDS.py", line 49, in start_next_epoch
    self.loading_pipeline.start_next_epoch()
  File "d:\ai\stable_diffusion\onetrainer\venv\src\mgds\src\mgds\LoadingPipeline.py", line 97, in start_next_epoch
    module.start(self.__current_epoch)
  File "d:\ai\stable_diffusion\onetrainer\venv\src\mgds\src\mgds\pipelineModules\DiskCache.py", line 239, in start
    self.__refresh_cache(out_variation)
  File "d:\ai\stable_diffusion\onetrainer\venv\src\mgds\src\mgds\pipelineModules\DiskCache.py", line 181, in __refresh_cache
    self.before_cache_fun()
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\dataLoader\StableDiffusion3BaseDataLoader.py", line 335, in before_cache_text_fun
    model.text_encoder_3_to(self.train_device)
  File "D:\AI\Stable_Diffusion\OneTrainer\modules\model\StableDiffusion3Model.py", line 171, in text_encoder_3_to
    self.text_encoder_3.to(device=device)
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\lib\site-packages\transformers\modeling_utils.py", line 2796, in to
    return super().to(*args, **kwargs)
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1173, in to
    return self._apply(convert)
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 779, in _apply
    module._apply(fn)
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 779, in _apply
    module._apply(fn)
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 779, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 804, in _apply
    param_applied = fn(param)
  File "D:\AI\Stable_Diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1159, in convert
    return t.to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU
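The allocation that failed was only 32 MiB, and it happened while `model.text_encoder_3_to(self.train_device)` was moving the third text encoder onto the GPU for text caching. That suggests the device was presumably already close to full at that point. The sketch below is a minimal diagnostic under stated assumptions, not OneTrainer code: it assumes a PyTorch version that provides `torch.cuda.mem_get_info`. The helper names `module_nbytes` and `can_move_to_cuda`, and the 10% `headroom` margin, are hypothetical choices for illustration.

```python
import torch
from torch import nn


def module_nbytes(module: nn.Module) -> int:
    """Bytes held by a module's parameters and buffers in their current dtypes."""
    # parameters()/buffers() skip shared tensors by default, so nothing is double-counted.
    tensors = list(module.parameters()) + list(module.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)


def can_move_to_cuda(module: nn.Module, device: str = "cuda:0", headroom: float = 0.1) -> bool:
    """Return True if the device currently reports enough free memory for the module.

    `headroom` is an assumed safety margin: allocator caching and fragmentation
    mean a move that fits exactly on paper can still raise OutOfMemoryError.
    """
    if not torch.cuda.is_available():
        return False
    # mem_get_info returns (free_bytes, total_bytes) as reported by the CUDA driver.
    free_bytes, _total_bytes = torch.cuda.mem_get_info(torch.device(device))
    return module_nbytes(module) * (1.0 + headroom) <= free_bytes
```

For example, calling `can_move_to_cuda(some_text_encoder)` just before the `.to(device)` call shows whether the move can succeed at all, or whether other models left on the GPU need to be offloaded first.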

What did you expect would happen?

How can I revert to a previous version (an earlier branch or commit)?

Relevant log output

No response

Output of pip freeze

No response

joneschunghk, Aug 06 '24 04:08