
When I try to train a LoRA on my MacBook Pro, I keep getting an error

poodlebb opened this issue 1 year ago · 4 comments

My MacBook Pro has 16GB of unified memory, but I keep getting an error when I try to train a LoRA on it. Can anyone help?

```
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 2000
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 2000
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 2000
steps:   0%|          | 0/2000 [00:00<?, ?it/s]
epoch 1/1
Traceback (most recent call last):
  File "/Users/parkson/kohya_ss/train_network.py", line 793, in <module>
    train(args)
  File "/Users/parkson/kohya_ss/train_network.py", line 647, in train
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/Users/parkson/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1373, in clip_grad_norm_
    self.unscale_gradients()
  File "/Users/parkson/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 1336, in unscale_gradients
    self.scaler.unscale_(opt)
AttributeError: 'NoneType' object has no attribute 'unscale_'
steps:   0%|          | 0/2000 [14:54<?, ?it/s]
Traceback (most recent call last):
  File "/Users/parkson/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/parkson/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/Users/parkson/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/Users/parkson/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/parkson/kohya_ss/venv/bin/python', 'train_network.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/Users/parkson/Desktop/TEMP/JulianaChoi_Lora/Image', '--resolution=512,512', '--output_dir=/Users/parkson/Desktop/TEMP/JulianaChoi_Lora/Model', '--logging_dir=/Users/parkson/Desktop/TEMP/JulianaChoi_Lora/Log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=JulianaChoi', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=2000', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--mem_eff_attn', '--bucket_no_upscale']' returned non-zero exit status 1.
```

poodlebb avatar May 06 '23 06:05 poodlebb

I haven't tested on Mac, but AdamW8bit is not supported on Mac (it is backed by bitsandbytes, which requires CUDA). Please try another optimizer such as AdamW.
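Concretely, switching optimizers is just a flag change on the launch command. A sketch, not a verified recipe, reusing the arguments from the traceback above; the trailing `...` stands for the remaining unchanged flags, and dropping mixed precision is my own guess, not something confirmed in this thread:

```shell
# Same launch as in the failing command, with the bitsandbytes-backed 8-bit
# optimizer swapped for plain AdamW (bitsandbytes needs CUDA, which Apple
# Silicon does not provide).
accelerate launch train_network.py \
  --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
  --network_module=networks.lora \
  --optimizer_type=AdamW \
  --mixed_precision=no \
  ...   # remaining flags as before; "no" is a guess that may also avoid the
        # NoneType gradient-scaler error, since the scaler is CUDA-oriented
```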

kohya-ss avatar May 07 '23 01:05 kohya-ss

> I'm not tested in Mac, but AdamW8bit will not support Mac. Please try another optimizer like AdamW.

Hi, I was just attempting to run SDXL training and ran into issues due to autocast. I'm unsure if there's a workaround. I tried adding torch.autocast(device_type="cpu", enabled=False) directly to sdxl_train_util.py, but that didn't work. I just figured I'd add it where the error originated, but I don't know what I'm doing!

```
Disable Diffusers' xformers
enable text encoder training
train unet: True, text_encoder1: True, text_encoder2: True
number of models: 3
number of trainable parameters: 3385184004
prepare optimizer, data loader etc.
INFO use AdamW optimizer | {}                                 train_util.py:3819
override steps. steps for 100 epochs is / 指定エポックまでのステップ数: 600
Traceback (most recent call last):
  File "/Users/zack/.home/local/gitrepos/ComfyUI/input/sd-scripts-main/sdxl_train.py", line 792, in <module>
    train(args)
  File "/Users/zack/.home/local/gitrepos/ComfyUI/input/sd-scripts-main/sdxl_train.py", line 403, in train
    unet = accelerator.prepare(unet)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1263, in prepare
    result = tuple(
             ^^^^^^
  File "/Users/zack/.home/local/gitrepos/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1264, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1140, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1330, in prepare_model
    autocast_context = get_mixed_precision_context_manager(self.native_amp, self.autocast_handler)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 1745, in get_mixed_precision_context_manager
    return torch.autocast(device_type=device_type, dtype=torch.float16, **autocast_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/ComfyUI/.venv/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 241, in __init__
    raise RuntimeError(
RuntimeError: User specified an unsupported autocast device_type 'mps'
```

It didn't matter whether I set Full_Path, Full_FP16, or Full_BF16 to true, though I still had mixed_precision = "fp16". I also saw the error was triggered after the optimizer, Prodigy, was called, so I turned it off (but it was replaced by AdamW; I had intended to disable the optimizer entirely). I also set HighVRAM because, from what I was reading, it sounded like the accelerator would only trigger if it saw a necessity to. I do have the M1 Max (64GB unified memory).

I'd love to be able to use this on my Mac! But if not, I have a rig, covered in cobwebs, that'll work just fine =)
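For what it's worth, the traceback shows accelerate entering `torch.autocast(device_type="mps", ...)`, which some torch builds reject outright. The decision a launcher has to make can be sketched without touching sd-scripts at all. A hypothetical helper (the name and return shape are mine, not from sd-scripts or accelerate):

```python
def pick_precision(cuda_available: bool, mps_available: bool) -> tuple[str, str]:
    """Choose (device_type, mixed_precision) for a training launcher.

    Hypothetical helper: some torch builds raise RuntimeError for
    torch.autocast(device_type="mps"), so on Apple Silicon the safe
    fallback is to disable mixed precision rather than enter autocast.
    """
    if cuda_available:
        return "cuda", "fp16"  # autocast + GradScaler are well supported on CUDA
    if mps_available:
        return "mps", "no"     # avoid autocast: device_type 'mps' may be rejected
    return "cpu", "no"


print(pick_precision(False, True))  # ('mps', 'no')
```

The point is simply that "mixed_precision = fp16" and an MPS device are a combination the stack may refuse, so full precision is the conservative choice on a Mac.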

BuildBackBuehler avatar Mar 22 '24 03:03 BuildBackBuehler

The error seems to be raised from accelerate. Please run `accelerate config` and select MPS as the machine type.
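To spell that out: `accelerate config` is a terminal command, run in the same environment (venv) that runs the training script. A sketch; the exact prompts and the config file location can vary by accelerate version:

```shell
# From a terminal, inside the venv used for training:
source venv/bin/activate
accelerate config   # interactive; answer "This machine" and select MPS
# The answers are saved to a YAML config (typically under
# ~/.cache/huggingface/accelerate/) and reused by "accelerate launch".
```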

kohya-ss avatar Mar 25 '24 23:03 kohya-ss

How do I run `accelerate config`? In the terminal, or inside your app?

Akossimon avatar Jun 04 '24 13:06 Akossimon