
returned non-zero exit status 1.

Open Hung0523 opened this issue 2 years ago • 8 comments

raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\Engineering\Python\Python310\python.exe', 'train_network.py', '--pretrained_model_name_or_path=F:/Engineering/AI_Painting/stable-diffusion-webui/models/Stable-diffusion/pastelmix-better-vae-fp16.ckpt', '--train_data_dir=F:/Engineering/AI_Painting/LORA_training/train_data', '--resolution=512,512', '--output_dir=F:/Engineering/AI_Painting/LORA_training/output_models', '--logging_dir=', '--network_alpha=128', '--save_model_as=ckpt', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=Addams', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.

Hung0523 avatar Mar 24 '23 17:03 Hung0523

Need the traceback section. This section of the error does not provide enough information to troubleshoot.
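
For readers wondering why that part alone is not enough: `subprocess.CalledProcessError` only records the command and its exit status, while the real cause is printed by the child process. Here is a minimal sketch of that behaviour. The `fake_train.py` script is a hypothetical stand-in for train_network.py, not the actual kohya_ss code.

```python
import os
import subprocess
import sys
import tempfile

wrapper_message = ""
child_stderr = ""

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the training script: it fails with a ValueError.
    script = os.path.join(tmp, "fake_train.py")
    with open(script, "w") as f:
        f.write("raise ValueError('fp16 mixed precision requires a GPU')\n")

    try:
        subprocess.run(
            [sys.executable, script],
            check=True,           # raise CalledProcessError on a non-zero exit
            capture_output=True,  # keep the child's stderr for inspection
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        wrapper_message = str(exc)  # only the command and the exit status
        child_stderr = exc.stderr   # the child's own traceback

print("wrapper:", wrapper_message)
print("child stderr:")
print(child_stderr)
```

The wrapper message says nothing about the cause. The `ValueError` only shows up in the child's stderr, which is why the earlier part of the console output is the useful one.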

bmaltais avatar Mar 24 '23 20:03 bmaltais

Traceback (most recent call last):
  File "F:\Engineering\AI_Painting\kohya_ss\train_network.py", line 699, in <module>
    train(args)
  File "F:\Engineering\AI_Painting\kohya_ss\train_network.py", line 119, in train
    accelerator, unwrap_model = train_util.prepare_accelerator(args)
  File "F:\Engineering\AI_Painting\kohya_ss\library\train_util.py", line 2498, in prepare_accelerator
    accelerator = Accelerator(
  File "F:\Engineering\Python\Python310\lib\site-packages\accelerate\accelerator.py", line 355, in __init__
    raise ValueError(err.format(mode="fp16", requirement="a GPU"))
ValueError: fp16 mixed precision requires a GPU
Traceback (most recent call last):
  File "F:\Engineering\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "F:\Engineering\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\Engineering\Python\Python310\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "F:\Engineering\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "F:\Engineering\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "F:\Engineering\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\Engineering\Python\Python310\python.exe', 'train_network.py', '--pretrained_model_name_or_path=F:/Engineering/AI_Painting/stable-diffusion-webui/models/Stable-diffusion/AOM3A1B_orangemixs.safetensors', '--train_data_dir=F:/Engineering/AI_Painting/LORA_training/train_data', '--resolution=512,512', '--output_dir=F:/Engineering/AI_Painting/LORA_training/output_models', '--logging_dir=', '--network_alpha=128', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=flowers', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1125', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.

Hung0523 avatar Mar 25 '23 02:03 Hung0523

The ValueError says that fp16 mixed precision requires a GPU, which means no GPU was detected when the Accelerator was created. What type of NVIDIA card do you have in your system?
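
One way to check what PyTorch itself can see is a short script like the one below. It is only a sketch: `cuda_report` is a made-up helper name, and it assumes you run it with the same Python interpreter that kohya_ss uses (`F:\Engineering\Python\Python310\python.exe` in the log above).

```python
import importlib.util


def cuda_report():
    """Return a small dict describing what PyTorch can see (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this interpreter at all.
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,          # a "+cpu" suffix means a CPU-only build
        "built_with_cuda": torch.version.cuda,       # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(), # False if no usable GPU is found
    }


print(cuda_report())
```

If `cuda_available` is False on a machine that has an NVIDIA card, the problem is more likely the installed PyTorch build or the driver than the training options.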

bmaltais avatar Mar 25 '23 03:03 bmaltais

My GPU is a 3080 Ti. Could it be that my CUDA or cuDNN is not installed correctly?

Hung0523 avatar Mar 25 '23 04:03 Hung0523

2023-03-25 13:45:22.622970: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-03-25 13:45:22.623110: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-03-25 13:45:25.906128: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-03-25 13:45:25.906238: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
  PyTorch 1.13.1+cu117 with CUDA 1107 (you have 1.13.1+cpu)
  Python 3.10.9 (you have 3.10.9)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory F:\Engineering\AI_Painting\LORA_training\train_data\150_flowers contains 30 image files
4500 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: False

[Subset 0 of Dataset 0]
  image_dir: "F:\Engineering\AI_Painting\LORA_training\train_data\150_flowers"
  image_count: 30
  num_repeats: 150
  shuffle_caption: False
  keep_tokens: 0
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  is_reg: False
  class_tokens: flowers
  caption_extension: .txt

Do these messages help to solve the problem?

Hung0523 avatar Mar 25 '23 05:03 Hung0523

Same problem, need help.

powerdoom avatar Mar 25 '23 14:03 powerdoom

Same problem, need help.

Junking1992 avatar Mar 25 '23 14:03 Junking1992

The first two messages indicate that the system is trying to load the CUDA library (cudart64_110.dll) but cannot find it. This means that your TensorFlow installation is looking for a GPU but cannot access it since the required library is missing. If you don't have a GPU and are using a CPU for your computation, you can safely ignore these warnings.
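
To see whether that DLL is reachable at all, one rough check is to look for it in the directories on PATH. This is a sketch under assumptions: `find_on_path` is a hypothetical helper, and Windows also searches other locations (for example the application directory), so an empty result is a hint rather than proof.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory on PATH that contains `filename` (hypothetical helper)."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries, then test for the file in each directory.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# DLL name taken from the TensorFlow warning in the log.
print(find_on_path("cudart64_110.dll"))
```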

The third message shows that xFormers cannot load its C++/CUDA extensions because it was built for PyTorch 1.13.1+cu117, while the installed PyTorch is 1.13.1+cpu, a CPU-only build. This disables xFormers features such as memory-efficient attention, SwiGLU, and sparse operations. A CPU-only PyTorch would also explain the "fp16 mixed precision requires a GPU" error.
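
The mismatch can be read straight off the two version strings in the xFormers warning. A small sketch, where `build_tag` and `is_cpu_only` are hypothetical helper names and the two strings are copied from the log:

```python
def build_tag(version):
    """Return the part after '+' in a PyTorch version string ('' if there is none)."""
    _, _, tag = version.partition("+")
    return tag


def is_cpu_only(version):
    """True when the version string carries the 'cpu' build tag."""
    return build_tag(version) == "cpu"


expected = "1.13.1+cu117"  # what xFormers reports it was built for
installed = "1.13.1+cpu"   # what the log reports as installed

print(build_tag(expected), build_tag(installed), is_cpu_only(installed))
```

The base version (1.13.1) matches and only the build tag differs, which suggests the wrong PyTorch wheel was installed rather than the wrong release.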

Based on this, I think you should delete the whole kohya_ss folder and redo the installation from scratch.

bmaltais avatar Mar 25 '23 16:03 bmaltais