kohya_ss

"FileNotFoundError: [Errno 2] No such file or directory:" After Pressing Train Model

Open derto42 opened this issue 2 years ago • 8 comments

Using Ubuntu on RunPod

Screenshot_59

Folder 125_lilly42: 1500 steps
max_train_steps = 1500
stop_text_encoder_training = 0
lr_warmup_steps = 150
accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="lilly42 LORA/image" --resolution=512,512 --output_dir="lilly42 LORA/model" --logging_dir="lilly42 LORA/log" --save_model_as=safetensors --output_name="last" --learning_rate="1e-5" --lr_scheduler="cosine" --lr_warmup_steps="150" --train_batch_size="1" --max_train_steps="1500" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --cache_latents --bucket_reso_steps=64 --xformers --use_8bit_adam --bucket_no_upscale
Traceback (most recent call last):
  File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/gradio/routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/gradio/blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/gradio/blocks.py", line 833, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/kasm-user/Desktop/LORA/kohya_ss/dreambooth_gui.py", line 428, in train_model
    subprocess.run(run_cmd)
  File "/usr/lib/python3.8/subprocess.py", line 493, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="lilly42 LORA/image" --resolution=512,512 --output_dir="lilly42 LORA/model" --logging_dir="lilly42 LORA/log" --save_model_as=safetensors --output_name="last" --learning_rate="1e-5" --lr_scheduler="cosine" --lr_warmup_steps="150" --train_batch_size="1" --max_train_steps="1500" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --cache_latents --bucket_reso_steps=64 --xformers --use_8bit_adam --bucket_no_upscale'

derto42 avatar Feb 14 '23 08:02 derto42

I do have accelerate installed, running on python 3.8
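A quick way to double-check that claim from inside the same venv is sketched below. It only uses the standard library's shutil.which; the helper name find_on_path is made up for this illustration. It also hints at what the traceback is really complaining about: the "file" that could not be found is the entire command line, not accelerate itself.

```python
import shutil


def find_on_path(program: str):
    """Return the full path of `program` if it can be found on PATH, else None."""
    return shutil.which(program)


# Prints a path if accelerate is installed in the active environment, else None.
print(find_on_path("accelerate"))

# The whole command line is not the name of any program, so this lookup fails
# even when accelerate is installed. This mirrors the FileNotFoundError above.
print(find_on_path("accelerate launch --num_cpu_threads_per_process=2"))
```

If the first lookup prints a path and the second prints None, the installation is fine and the problem lies in how the command is handed to subprocess.run.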

derto42 avatar Feb 14 '23 19:02 derto42

The only steps I had to skip were the "cp" steps, as those are primarily meant for Windows, from my understanding. I ended up just installing bitsandbytes. Does that effectively replace those steps, or is there some other way to run them on Ubuntu?

derto42 avatar Feb 14 '23 19:02 derto42

To be 100% clear. These are the steps I skipped:

cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py

derto42 avatar Feb 14 '23 19:02 derto42

In lora_gui.py, replace '"train_network.py"' (with the inner double quotes) with "train_network.py" in run_cmd, just before the subprocess.run(run_cmd) call.

Cococyh avatar Feb 16 '23 02:02 Cococyh

I'm not sure what this means. I have tried many different configurations with 1, 2, and 3 quotation marks; none of them worked.

derto42 avatar Feb 17 '23 03:02 derto42

Also I have tried the following:

pip freeze > uninstall.txt
pip uninstall -r uninstall.txt

derto42 avatar Feb 17 '23 05:02 derto42

Sorry, I was so busy last week that I forgot to reply. I just remembered when someone mentioned this question again.

First, I changed lines 485-487 of "lora_gui.py" in the repository directory:

run_cmd = run_cmd.replace('"train_network.py"', "train_network.py")
print(run_cmd)
subprocess.run(run_cmd)

Then I ran this command directly in the terminal:

accelerate launch --num_cpu_threads_per_process=8 train_network.py --enable_bucket --pretrained_model_name_or_path="/home/root/cyh/kohya_ss-master/Basil_mix_fixed.safetensors" --train_data_dir="/home/root/cyh/kohya_ss-master/saitou_asuka/img" --resolution=512,640 --output_dir="/home/root/cyh/kohya_ss-master/saitou_asuka/output" --logging_dir="/home/root/cyh/kohya_ss-master/saitou_asuka/log" --network_alpha="128" --network_module=networks.lora --text_encoder_lr=1e-4 --unet_lr=3e-4 --network_dim=128 --output_name="saitou_asuka_v1.0" --lr_scheduler_num_cycles="1" --learning_rate="1e-4" --lr_scheduler="constant" --train_batch_size="2" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --clip_skip=2 --bucket_reso_steps=64 --xformers --use_8bit_adam --bucket_no_upscale

Maybe you need to change some of the parameters.

On my server, it works well.

Cococyh avatar Feb 24 '23 09:02 Cococyh

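The problem is the format of the popenargs argument passed to subprocess.run. On Linux, without shell=True, the command should be a list whose elements are the program and each of its arguments. A single string is treated as the name of the program to execute, which is why the whole command line appears as the missing file in the FileNotFoundError. (When debugging, popenargs turns out to be a tuple holding that one string.)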

Try changing the popenargs parameter to the following format, i.e. separate commands and arguments and put them in a list.

I changed lines 504-506 of "lora_gui.py":

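import shlex  # place with the other imports at the top of the file

run_cmd = run_cmd.replace('"train_network.py"', "train_network.py")
print(run_cmd)
# shlex.split honours the quotes, so a path containing spaces stays one argument
run_cmd_popenargs = shlex.split(run_cmd)
subprocess.run(run_cmd_popenargs)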
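To see the difference in isolation, here is a minimal sketch that is independent of kohya_ss. The helper names run_as_string and run_as_list are made up for this illustration, and sys.executable stands in for "accelerate launch" so that it runs on any Linux machine with Python. It assumes a POSIX system; on Windows a command string is handled differently.

```python
import shlex
import subprocess
import sys

# Hypothetical stand-in for the GUI's run_cmd string. The child process just
# prints the first argument it receives.
run_cmd = (
    f"{shlex.quote(sys.executable)} -c "
    '"import sys; print(sys.argv[1])" '
    '--train_data_dir="lilly42 LORA/image"'
)


def run_as_string(cmd: str):
    """Pass the whole command line as one string, as the GUI did."""
    try:
        subprocess.run(cmd)
    except FileNotFoundError as err:
        return err  # POSIX: the string is taken as the program name
    return None


def run_as_list(cmd: str) -> str:
    """Split the command into program and arguments before running it."""
    args = shlex.split(cmd)
    done = subprocess.run(args, capture_output=True, text=True, check=True)
    return done.stdout.strip()


print(repr(run_as_string(run_cmd)))
print(run_as_list(run_cmd))
```

The sketch uses shlex.split rather than str.split(" ") because the training command in this issue contains quoted paths with spaces, such as "lilly42 LORA/image". Splitting on every space would cut such a path into two arguments and leave the quote characters in place.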

Or use the parameter shell=True, so that the shell parses the command-line string:

run_cmd = run_cmd.replace('"train_network.py"', "train_network.py")
print(run_cmd)
subprocess.run(run_cmd,shell=True)
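A small sketch of the shell=True variant, again independent of kohya_ss and with sys.executable standing in for "accelerate launch". The argument values are made up for illustration. shlex.join (available from Python 3.8) builds a correctly quoted string from a list, which is one way to keep paths with spaces intact when the shell does the parsing.

```python
import shlex
import subprocess
import sys

# Hypothetical argument list; the child process prints its first argument.
args = [
    sys.executable,
    "-c",
    "import sys; print(sys.argv[1])",
    "--output_dir=lilly42 LORA/model",
]

# Quote every element so the shell reconstructs exactly the same arguments.
cmd_string = shlex.join(args)

# With shell=True the string is handed to /bin/sh, which does the splitting.
result = subprocess.run(
    cmd_string, shell=True, capture_output=True, text=True, check=True
)
print(result.stdout.strip())
```

With shell=True, anything in the string is interpreted by the shell, so this form is best kept for commands whose contents you control. The list form avoids that concern.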

It works for me.

luochenxi avatar Feb 27 '23 04:02 luochenxi