ai-toolkit
Does it also support cloud?
Hello, I attempted training on RunPod with an A6000, but I'm hitting a CUDA out-of-memory error. I've occasionally seen this issue with other trainers as well. Does this mean training in the cloud might be impossible, or could something be misconfigured on my end?
Error running job: CUDA out of memory. Tried to allocate 48.00 MiB. GPU 0 has a total capacity of 47.53 GiB of which 43.94 MiB is free. Process 654925 has 5.68 GiB memory in use. Process 706098 has 7.62 GiB memory in use. Including non-PyTorch memory, this process has 34.17 GiB memory in use. Of the allocated memory 33.51 GiB is allocated by PyTorch, and 351.95 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
======================================== Result:
- 0 completed jobs
- 1 failure
========================================
Traceback (most recent call last):
  File "/root/MJ/ai-toolkit/run.py", line 90, in <module>
    main()
  File "/root/MJ/ai-toolkit/run.py", line 86, in main
    raise e
  File "/root/MJ/ai-toolkit/run.py", line 78, in main
    job.run()
  File "/root/MJ/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/root/MJ/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 1701, in run
    loss_dict = self.hook_train_loop(batch)
  File "/root/MJ/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1482, in hook_train_loop
    noise_pred = self.predict_noise(
  File "/root/MJ/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 890, in predict_noise
    return self.sd.predict_noise(
  File "/root/MJ/ai-toolkit/toolkit/stable_diffusion_model.py", line 1639, in predict_noise
    noise_pred = self.unet(
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 406, in forward
    encoder_hidden_states, hidden_states = block(
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 200, in forward
    attn_output, context_attn_output = self.attn(
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 490, in forward
    return self.processor(
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1802, in __call__
    value = attn.to_v(hidden_states)
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/MJ/ai-toolkit/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/MJ/ai-toolkit/toolkit/network_mixins.py", line 267, in forward
    lora_output = self._call_forward(lora_input)
  File "/root/MJ/ai-toolkit/toolkit/network_mixins.py", line 206, in _call_forward
    return lx * scale
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 48.00 MiB. GPU 0 has a total capacity of 47.53 GiB of which 43.94 MiB is free. Process 654925 has 5.68 GiB memory in use. Process 706098 has 7.62 GiB memory in use. Including non-PyTorch memory, this process has 34.17 GiB memory in use. Of the allocated memory 33.51 GiB is allocated by PyTorch, and 351.95 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
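Two details in the log seem worth noting. First, processes 654925 and 706098 together hold about 13 GiB on the same GPU, so the training process never has the full 48 GiB to itself. Second, the error message itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce allocator fragmentation. A minimal sketch of applying that suggestion (assuming it is set before PyTorch initializes CUDA, i.e. before the first import of torch or any CUDA call):

```python
import os

# Suggested by the OOM message: let the caching allocator grow segments
# instead of pre-sizing them, which can reduce fragmentation. The variable
# must be in the environment before CUDA is initialized, so either export
# it in the shell that launches run.py:
#   PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python run.py config.yaml
# or set it at the very top of the entry script, before importing torch:
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This addresses only the fragmentation hint, not the ~13 GiB held by the other two processes; checking `nvidia-smi` on the pod to see whether those are stale runs that can be killed may matter more.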