kohya_ss
"FileNotFoundError: [Errno 2] No such file or directory:" After Pressing Train Model
Using Ubuntu on RunPod
Folder 125_lilly42: 1500 steps
max_train_steps = 1500
stop_text_encoder_training = 0
lr_warmup_steps = 150
accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="lilly42 LORA/image" --resolution=512,512 --output_dir="lilly42 LORA/model" --logging_dir="lilly42 LORA/log" --save_model_as=safetensors --output_name="last" --learning_rate="1e-5" --lr_scheduler="cosine" --lr_warmup_steps="150" --train_batch_size="1" --max_train_steps="1500" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --cache_latents --bucket_reso_steps=64 --xformers --use_8bit_adam --bucket_no_upscale
Traceback (most recent call last):
File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/gradio/routes.py", line 337, in run_predict
output = await app.get_blocks().process_api(
File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/gradio/blocks.py", line 1015, in process_api
result = await self.call_function(
File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/gradio/blocks.py", line 833, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/kasm-user/Desktop/LORA/kohya_ss/venv/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/kasm-user/Desktop/LORA/kohya_ss/dreambooth_gui.py", line 428, in train_model
subprocess.run(run_cmd)
File "/usr/lib/python3.8/subprocess.py", line 493, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'accelerate launch --num_cpu_threads_per_process=2 "train_db.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="lilly42 LORA/image" --resolution=512,512 --output_dir="lilly42 LORA/model" --logging_dir="lilly42 LORA/log" --save_model_as=safetensors --output_name="last" --learning_rate="1e-5" --lr_scheduler="cosine" --lr_warmup_steps="150" --train_batch_size="1" --max_train_steps="1500" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --cache_latents --bucket_reso_steps=64 --xformers --use_8bit_adam --bucket_no_upscale'
I do have accelerate installed, and I am running Python 3.8.
The only steps I skipped are the "cp" steps, since as I understand it those are meant primarily for Windows. I just installed bitsandbytes instead. Does that effectively replace those steps, or is there some other way to run them on Ubuntu?
To be 100% clear, these are the steps I skipped:
cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
Replace '"train_network.py"' with "train_network.py" in lora_gui.py before subprocess.run(run_cmd)
I'm not sure what this means. I have tried many different configurations with one, two, and three quotation marks, and none of them worked.
Also I have tried the following:
pip freeze > uninstall.txt
pip uninstall -r uninstall.txt
Sorry, I was so busy last week that I forgot to reply. I just remembered when someone mentioned this question again.
First, I changed lines 485-487 of "lora_gui.py" in the repository directory:
run_cmd = run_cmd.replace('"train_network.py"', "train_network.py")
print(run_cmd)
subprocess.run(run_cmd)
Then I ran this command directly in the terminal:
accelerate launch --num_cpu_threads_per_process=8 train_network.py --enable_bucket --pretrained_model_name_or_path="/home/root/cyh/kohya_ss-master/Basil_mix_fixed.safetensors" --train_data_dir="/home/root/cyh/kohya_ss-master/saitou_asuka/img" --resolution=512,640 --output_dir="/home/root/cyh/kohya_ss-master/saitou_asuka/output" --logging_dir="/home/root/cyh/kohya_ss-master/saitou_asuka/log" --network_alpha="128" --network_module=networks.lora --text_encoder_lr=1e-4 --unet_lr=3e-4 --network_dim=128 --output_name="saitou_asuka_v1.0" --lr_scheduler_num_cycles="1" --learning_rate="1e-4" --lr_scheduler="constant" --train_batch_size="2" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="1234" --caption_extension=".txt" --cache_latents --clip_skip=2 --bucket_reso_steps=64 --xformers --use_8bit_adam --bucket_no_upscale
Maybe you need to change some of the parameters.
On my server, it works well.
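The problem is the format of the popenargs argument passed to subprocess.run. It should be a list in which the program and each of its arguments are separate elements. Here the whole command is passed as one string, so without shell=True the entire string is treated as the name of the executable, which is why the FileNotFoundError message quotes the full command line. (In the debugger, popenargs is a tuple holding that single string.)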
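To see the behaviour outside the GUI, here is a minimal sketch, assuming a POSIX system such as the Ubuntu pod in the report. The command is a stand-in built from `sys.executable`, not the real `accelerate launch` call, and the helper names are made up for illustration:

```python
import subprocess
import sys

# Stand-in command (hypothetical): run the current Python interpreter
# instead of the real "accelerate launch ..." call.
program = sys.executable
args = ["-c", "print('ok')"]

def run_as_single_string():
    """Pass the whole command as one string, without shell=True."""
    cmd = f"{program} -c pass"
    try:
        subprocess.run(cmd)
    except FileNotFoundError as exc:
        # On POSIX the entire string is looked up as the executable name.
        return exc
    return None

def run_as_list():
    """Pass the program and each argument as separate list elements."""
    return subprocess.run([program, *args], capture_output=True, text=True)

error = run_as_single_string()
result = run_as_list()
print(type(error).__name__ if error else "no error")
print(result.stdout.strip())
```

On Windows the single-string form is handled differently, which may be why the GUI code runs there unchanged.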
Try changing popenargs to the following format: split the command and its arguments and put them in a list.
I changed lines 504-506 of "lora_gui.py" in the repository directory:
run_cmd = run_cmd.replace('"train_network.py"', "train_network.py")
print(run_cmd)
run_cmd_popenargs = run_cmd.split(" ")
subprocess.run(run_cmd_popenargs)
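One caveat with split(" "): it also splits inside quoted values and leaves the quote characters in place, so a path containing a space, such as --train_data_dir="lilly42 LORA/image" from the original report, would be cut in two. A sketch of an alternative using the standard library's `shlex.split`, which applies POSIX shell quoting rules. The shortened command string below is illustrative only:

```python
import shlex

# Shortened, illustrative command in the same style as the GUI's run_cmd.
run_cmd = (
    'accelerate launch --num_cpu_threads_per_process=2 "train_db.py" '
    '--train_data_dir="lilly42 LORA/image" --learning_rate="1e-5"'
)

naive_args = run_cmd.split(" ")    # splits on every space, keeps the quotes
shell_args = shlex.split(run_cmd)  # honours the quotes, then removes them

print(naive_args)
print(shell_args)
```

The resulting list can be passed as subprocess.run(shell_args). Since `shlex.split` strips the quotes itself, the separate replace() step should no longer be needed in that case.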
Alternatively, pass shell=True so that the shell parses the command string:
run_cmd = run_cmd.replace('"train_network.py"', "train_network.py")
print(run_cmd)
subprocess.run(run_cmd, shell=True)
It works for me.
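For the shell=True variant, a small sketch of what the shell does with the quoted arguments, assuming a POSIX shell (/bin/sh). The command is a stand-in that runs the current interpreter and prints the arguments it receives. It is not the real training call:

```python
import shlex
import subprocess
import sys

# Stand-in for the real command: the current interpreter echoing its arguments.
# shlex.quote protects the interpreter path and the script text from the shell.
script = "import sys; print(sys.argv[1:])"
run_cmd = (
    f"{shlex.quote(sys.executable)} -c {shlex.quote(script)} "
    '--train_data_dir="lilly42 LORA/image" --seed="1234"'
)

# With shell=True the string is handed to the shell, which splits it into
# words and strips the quotes before starting the program.
completed = subprocess.run(run_cmd, shell=True, capture_output=True, text=True)
print(completed.stdout.strip())
```

Because the shell interprets the whole string, this form is best kept for command strings you build yourself. Any shell metacharacters in a path or option value would be interpreted too.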