kohya_ss
Getting error when training Lora (torch._dynamo?)
I have spent hours on this problem and cannot figure out how to fix it. I tried all the steps listed in this issue: https://github.com/bmaltais/kohya_ss/issues/192, but it still does not work. I suspect it is related to PyTorch. I did install PyTorch, but one line of the output still says: ModuleNotFoundError: No module named 'torch._dynamo'
Can anybody please help? Thanks in advance.
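One quick way to narrow this down is to check which PyTorch the kohya_ss venv actually has. torch._dynamo ships with PyTorch 2.0 and later, so on an older build the import fails even though PyTorch itself is installed. The following is a minimal sketch; the function name check_torch_dynamo is made up for illustration, and it should be run with the venv's Python.

```python
import importlib.util


def check_torch_dynamo():
    """Report the installed torch version and whether torch._dynamo exists."""
    report = {"torch_installed": False, "torch_version": None, "has_dynamo": False}

    # find_spec returns None when the top-level package is missing.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__

    # Looking up a submodule imports its parent; a missing submodule gives None.
    try:
        report["has_dynamo"] = importlib.util.find_spec("torch._dynamo") is not None
    except ModuleNotFoundError:
        report["has_dynamo"] = False
    return report


if __name__ == "__main__":
    print(check_torch_dynamo())
```

If has_dynamo comes back False, anything that asks accelerate to use torch dynamo will hit the ModuleNotFoundError quoted above.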
max_train_steps = 6000
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith/image" --resolution=512,512 --output_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith" --logging_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith" --save_model_as=safetensors --output_name="Wraith" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="6000" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
prepare tokenizer
prepare train images.
found directory 100_Wraith contains 120 image files
12000 train images with repeating.
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████| 120/120 [00:00<00:00, 1512.90it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
load Diffusers pretrained models
text_encoder\model.safetensors not found
Fetching 19 files: 100%|███████████████████████████████████████████████████████████████████████| 19/19 [00:00<?, ?it/s]
C:\Users\Jim\kohya_ss\venv\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None
. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Replace CrossAttention.forward to use xformers
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████| 120/120 [00:08<00:00, 13.51it/s]
prepare optimizer, data loader etc.
use AdamW optimizer | {}
Traceback (most recent call last):
File "C:\Users\Jim\kohya_ss\train_db.py", line 346, in
Not sure if it will make a difference, but it looks like your folder structure isn't set up the suggested way:
--train_data_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith/image"
--output_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith" should be C:/Users/Jim/Downloads/Wraith/512/Wraith/model
--logging_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith" should be C:/Users/Jim/Downloads/Wraith/512/Wraith/log
Also try checking/unchecking the 8bitadam box. GL
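To make the suggested layout concrete, here is a small sketch that derives the three directories from one project root. The helper name suggested_dirs is hypothetical and is not part of kohya_ss; it only builds the path strings and does not create anything on disk.

```python
from pathlib import PurePosixPath


def suggested_dirs(project_root):
    """Build the image/model/log paths under a single project root."""
    root = PurePosixPath(project_root)
    return {
        "train_data_dir": str(root / "image"),  # holds folders such as 100_Wraith
        "output_dir": str(root / "model"),      # trained model files
        "logging_dir": str(root / "log"),       # training logs
    }


if __name__ == "__main__":
    for flag, path in suggested_dirs("C:/Users/Jim/Downloads/Wraith/512/Wraith").items():
        print(f'--{flag}="{path}"')
```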
Make sure that "Do you wish to optimize your script with torch dynamo?" is set to 'no' when configuring accelerate, unless you need it enabled for some reason.
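If you are not sure what was answered last time, the saved accelerate config can be inspected directly. This sketch rests on two assumptions: that the config sits at the usual default location (~/.cache/huggingface/accelerate/default_config.yaml) and that the setting is stored under a key named dynamo_backend. Both may differ between accelerate versions or if the cache directory was moved. The helper names are made up for illustration.

```python
from pathlib import Path


def find_dynamo_settings(config_text):
    """Return (key, value) pairs for every config line that mentions dynamo."""
    hits = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or ":" not in stripped:
            continue
        if "dynamo" in stripped.lower():
            key, _, value = stripped.partition(":")
            hits.append((key.strip(), value.strip().strip("'\"")))
    return hits


def dynamo_enabled(config_text):
    """True if a dynamo_backend entry (assumed key name) is set to anything but NO."""
    for key, value in find_dynamo_settings(config_text):
        if key == "dynamo_backend" and value.upper() not in ("", "NO"):
            return True
    return False


if __name__ == "__main__":
    # Assumed default location; adjust if your config lives elsewhere.
    path = Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"
    if path.exists():
        text = path.read_text(encoding="utf-8")
        print(find_dynamo_settings(text))
        print("dynamo enabled:", dynamo_enabled(text))
    else:
        print("No config found at", path, "- try running 'accelerate config' first.")
```

If it reports dynamo as enabled, re-running accelerate config and answering 'no' to the dynamo question should overwrite the setting.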
Thank you for the reply! I corrected the output and logging directories and tried training with 8-bit Adam both checked and unchecked. Sadly, the same error comes up.
Thank you for the reply! Where do I find the torch dynamo option? I don't understand what "configuring accelerate" means.
Do you wish to optimize your script with torch dynamo
After searching on the topic, I realized you were talking about configuring the accelerate package. I did that and chose No for optimizing the script with torch dynamo. That gave me some progress, but now another problem has come up. I think Torch is reserving all of my VRAM (I have 12 GB) and leaving none for the training. How can I avoid this?
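Before changing settings, it may help to confirm what PyTorch itself reports for the card. The sketch below reads the total memory plus the allocated and reserved counters from torch.cuda. Note that the allocated and reserved figures only cover the Python process that calls them, so in a fresh interpreter they will be near zero. The helper names are made up for illustration.

```python
def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


def vram_report(device_index=0):
    """Return total/allocated/reserved bytes for one GPU, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return {
        "total": torch.cuda.get_device_properties(device_index).total_memory,
        "allocated": torch.cuda.memory_allocated(device_index),  # live tensors, this process
        "reserved": torch.cuda.memory_reserved(device_index),    # caching allocator, this process
    }


if __name__ == "__main__":
    report = vram_report()
    if report is None:
        print("CUDA is not available to this Python environment.")
    else:
        for name, value in report.items():
            print(f"{name}: {format_gib(value)}")
```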
Folder 100_Wraith: 12000 steps
max_train_steps = 6000
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith/image" --resolution=512,512 --output_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith/model" --logging_dir="C:/Users/Jim/Downloads/Wraith/512/Wraith/log" --save_model_as=safetensors --output_name="Wraith" --max_data_loader_n_workers="1" --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="6000" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_data_loader_n_workers="1" --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale
prepare tokenizer
prepare train images.
found directory 100_Wraith contains 120 image files
12000 train images with repeating.
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████| 120/120 [00:00<00:00, 9999.93it/s]
prepare dataset
prepare accelerator
Using accelerator 0.15.0 or above.
load Diffusers pretrained models
text_encoder\model.safetensors not found
Fetching 19 files: 100%|████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 19001.38it/s]
C:\Users\Jim\kohya_ss\venv\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None
. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Replace CrossAttention.forward to use xformers
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████| 120/120 [00:06<00:00, 18.68it/s]
prepare optimizer, data loader etc.
use AdamW optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 12000
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 6000
num epochs / epoch数: 1
batch size per device / バッチサイズ: 2
total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 2
gradient ccumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 6000
steps: 0%| | 0/6000 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
File "C:\Users\Jim\kohya_ss\train_db.py", line 346, in
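As a side note, the step counts printed in the log above follow from the folder name and the batch size: 100_Wraith means 100 repeats of 120 images, which gives 12000 samples, or 6000 batches at batch size 2. Here is a sketch of that arithmetic. The function names are made up for illustration, and the real script may count batches differently when bucketing splits the images into groups.

```python
import math


def parse_repeat_folder(folder_name):
    """Split a '<repeats>_<name>' folder such as '100_Wraith'."""
    repeats, _, concept = folder_name.partition("_")
    return int(repeats), concept


def plan_steps(folder_name, num_images, batch_size, max_train_steps):
    """Reproduce the sample, batch, and epoch counts shown in the log."""
    repeats, _ = parse_repeat_folder(folder_name)
    samples = repeats * num_images                       # images * repeats
    batches_per_epoch = math.ceil(samples / batch_size)  # batches in one epoch
    epochs = math.ceil(max_train_steps / batches_per_epoch)
    return {
        "samples": samples,
        "batches_per_epoch": batches_per_epoch,
        "epochs": epochs,
    }


if __name__ == "__main__":
    print(plan_steps("100_Wraith", num_images=120, batch_size=2, max_train_steps=6000))
```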
I'm having the same problem. Did you figure out how to fix it?