
Can a sequential training queue be supported?

Open Reekin opened this issue 2 years ago • 23 comments

Each LoRA takes several hours to train. I start a run before going to bed, but in the morning I'm rushing out the door with no time to kick off another round, so most of the day's GPU time is wasted. It's a shame.

Reekin avatar Mar 22 '23 17:03 Reekin

Just use a remote desktop so you can start the next run from anywhere.

petercham avatar Mar 28 '23 08:03 petercham

Prepare the configurations in advance: for example, put a.ps1 and b.ps1 in a batchTrain folder, then write one more ps1 script that runs the scripts in that folder one after another.
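The runner described above only needs a few lines. A minimal sketch (the batchTrain folder name follows the comment above; adjust the path to your setup):

```powershell
# Run every .ps1 in the batchTrain folder, one after another, in name order.
# a.ps1, b.ps1, ... each contain the full training command for one LoRA.
Get-ChildItem -Path ".\batchTrain" -Filter "*.ps1" |
  Sort-Object Name |
  ForEach-Object {
    Write-Output "Running $($_.Name) ..."
    & $_.FullName   # runs the next training script; blocks until it finishes
  }
```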

DoubleCake avatar May 08 '23 03:05 DoubleCake

I made a small change to train_by_toml.ps1 so it can batch-train LoRAs from a list of configs. Copy the code below into Notepad, save it, change the extension to ps1, and put it in the root directory. Updated 2024.03.17 to match v1.8.0.fix1, please try again @cindylaii @dahuzi773

# LoRA train script by @Akegarasu

# 2024.03.17
# Batch config files: end each line with a comma, except the last one.
# If you hit errors, first run A强制更新-国内加速.bat and install-cn.ps1 separately.
# To view training charts, run tensorboard.ps1.
[string[]]$my_batch_tomls = (
  "./toml/批次1.toml",
  "./toml/批次2.toml",
  "./toml/批次3.toml"
)
# If this script still needs N hours to finish and you want to queue more work:
# export the new training TOMLs from the web UI, make a copy of this script,
# change the TOML names above, uncomment the next line,
# and change 7200 to N * 3600 seconds. The copy will wait, then start training.
#Get-Date; Start-Sleep -Seconds 7200; Get-Date


#$config_file = "./config/default.toml"          # config_file | TOML file with the training parameters
#$sample_prompts = "./config/sample_prompts.txt" # sample_prompts | sampling prompts file; leave empty to disable sampling

$sdxl = 0      # for sdxl model | SDXL training (selects the training script below)
$multi_gpu = 0 # multi gpu | multi-GPU training; only valid with >= 2 GPUs

# ============= DO NOT MODIFY CONTENTS BELOW =====================

# Activate python venv
.\venv\Scripts\activate

$Env:HF_HOME = "huggingface"
$Env:PYTHONUTF8 = 1

$ext_args = [System.Collections.ArrayList]::new()
$launch_args = [System.Collections.ArrayList]::new()

if ($multi_gpu) {
  [void]$launch_args.Add("--multi_gpu")
}

for ($i = 0; $i -lt $my_batch_tomls.Length; ++$i) {
  Write-Output "$((Get-Date).ToString("yyyy-MM-dd HH:mm:ss"))  Train start $($i + 1): $($my_batch_tomls[$i])"
  # run train; to enable sampling, add a continuation line: --sample_prompts=$sample_prompts
  $script_name = if ($sdxl) { "sdxl_train_network.py" } else { "train_network.py" }
  python -m accelerate.commands.launch $launch_args --num_cpu_threads_per_process=8 "./sd-scripts/$script_name" `
    --config_file=$($my_batch_tomls[$i]) `
    $ext_args

  Write-Output "$((Get-Date).ToString("yyyy-MM-dd HH:mm:ss"))  Train finished $($i + 1): $($my_batch_tomls[$i])"
}

Write-Output "Train finished"
Read-Host | Out-Null
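The delayed-start trick in the comments above is less error-prone if the seconds are computed from the remaining hours instead of edited by hand. A small sketch ($remaining_hours is an illustrative name, not part of the original script):

```powershell
$remaining_hours = 2                            # N: hours the current run still needs
Get-Date                                        # log when the wait began
Start-Sleep -Seconds ($remaining_hours * 3600)  # wait N hours
Get-Date                                        # log when the queued training starts
```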



wakening avatar Aug 18 '23 14:08 wakening


Is this ps1 supposed to be run directly with PowerShell? When I run it I always get an error that .\venv\Scripts\activate does not exist, and indeed there is no venv virtual environment in my root directory.

dahuzi773 avatar Nov 24 '23 08:11 dahuzi773


Please refer to the original post: "If you hit errors, first run A强制更新-国内加速.bat and install-cn.ps1." I'm not in China, so I set up the environment with install.ps1 and then ran the ps1 posted above. It runs, but there are still problems further along.

cindylaii avatar Mar 16 '24 15:03 cindylaii


Updated to work with the new version.

wakening avatar Mar 16 '24 17:03 wakening


Thanks for the quick reply. I just tried it and still got an error. On reflection I first tested the built-in script and found that even "train_by_toml.ps1" doesn't work (error below), so the problem is my local environment. The GUI works fine, though: I've trained 5 LoRAs in the past two weeks and finished another one with the GUI early this morning. I'm going to try a completely fresh install in a new folder.

2024-03-17 08:53:02 INFO loading u-net: <All keys matched successfully> model_util.py:1009
2024-03-17 08:53:03 INFO loading vae: <All keys matched successfully> model_util.py:1017
2024-03-17 08:53:04 INFO loading text encoder: <All keys matched successfully> model_util.py:1074
2024-03-17 08:53:05 INFO Enable xformers for U-Net train_util.py:2577
Traceback (most recent call last):
  File "D:\StableDiffusion\lora-scripts-gui\sd-scripts\train_network.py", line 1063, in <module>
    trainer.train(args)
  File "D:\StableDiffusion\lora-scripts-gui\sd-scripts\train_network.py", line 243, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\diffusers\models\attention_processor.py", line 273, in set_use_memory_efficient_attention_xformers
    raise e
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\diffusers\models\attention_processor.py", line 267, in set_use_memory_efficient_attention_xformers
    _ = xformers.ops.memory_efficient_attention(
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 247, in memory_efficient_attention
    return _memory_efficient_attention(
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 365, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 381, in _memory_efficient_attention_forward
    op = _dispatch_fw(inp, False)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\xformers\ops\fmha\dispatch.py", line 125, in _dispatch_fw
    return _run_priority_list(
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\xformers\ops\fmha\dispatch.py", line 65, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
    query     : shape=(1, 2, 1, 40) (torch.float32)
    key       : shape=(1, 2, 1, 40) (torch.float32)
    value     : shape=(1, 2, 1, 40) (torch.float32)
    attn_bias : <class 'NoneType'>
    p         : 0.0
decoderF is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'NoneType'>
    operator wasn't built - see python -m xformers.info for more info
[email protected] is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see python -m xformers.info for more info
cutlassF is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see python -m xformers.info for more info
smallkF is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    operator wasn't built - see python -m xformers.info for more info
    unsupported embed per head: 40
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\accelerate\commands\launch.py", line 1027, in <module>
    main()
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\accelerate\commands\launch.py", line 1023, in main
    launch_command(args)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "D:\StableDiffusion\lora-scripts-gui\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\StableDiffusion\lora-scripts-gui\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--config_file=./toml/batch_2.toml', '--sample_prompts=./config/sample_prompts.txt']' returned non-zero exit status 1.
Train finished

cindylaii avatar Mar 17 '24 00:03 cindylaii

Giving up for now... I got up early and reinstalled the environment, but it just won't work. CUDA is installed, and I'm on release 1.8.3 as others suggested, but training still fails... ugh.

create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "D:\StableDiffusion\lora-scripts-1.8.3\sd-scripts\train_network.py", line 996, in <module>
    trainer.train(args)
  File "D:\StableDiffusion\lora-scripts-1.8.3\sd-scripts\train_network.py", line 348, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "D:\StableDiffusion\lora-scripts-1.8.3\sd-scripts\library\train_util.py", line 3491, in get_optimizer
    import bitsandbytes as bnb
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\bitsandbytes\__init__.py", line 16, in <module>
    from .nn import modules
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\bitsandbytes\nn\__init__.py", line 17, in <module>
    from .triton_based_modules import (
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\bitsandbytes\nn\triton_based_modules.py", line 6, in <module>
    from bitsandbytes.triton.dequantize_rowwise import dequantize_rowwise
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\bitsandbytes\triton\dequantize_rowwise.py", line 11, in <module>
    import triton
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\triton\__init__.py", line 13, in <module>
    from . import language
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\triton\language\__init__.py", line 2, in <module>
    from . import core, extern, libdevice, random
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\triton\language\core.py", line 1141, in <module>
    def abs(x):
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\triton\runtime\jit.py", line 386, in jit
    return JITFunction(args[0], **kwargs)
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\triton\runtime\jit.py", line 315, in __init__
    self.run = self._make_launcher()
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\triton\runtime\jit.py", line 282, in _make_launcher
    scope = {"version_key": version_key(), "get_cuda_stream": get_cuda_stream,
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\triton\runtime\jit.py", line 82, in version_key
    with open(triton._C.libtriton.__file__, "rb") as f:
AttributeError: partially initialized module 'triton' has no attribute '_C' (most likely due to a circular import)
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\accelerate\commands\launch.py", line 1027, in <module>
    main()
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\accelerate\commands\launch.py", line 1023, in main
    launch_command(args)
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "D:\StableDiffusion\lora-scripts-1.8.3\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\StableDiffusion\lora-scripts-1.8.3\venv\Scripts\python.exe', './sd-scripts/train_network.py', '--config_file=./toml/batch_2.toml', '--sample_prompts=./config/sample_prompts.txt']' returned non-zero exit status 1.
Train finished

cindylaii avatar Mar 17 '24 02:03 cindylaii


Using this version as the baseline: first confirm you have run A强制更新.bat and install.ps1. The log points to an xFormers problem; in the web UI, under Expert → speed optimization options, disable xFormers. Then run it again and post the complete log from startup to the error, together with your toml.

wakening avatar Mar 17 '24 10:03 wakening


It works now, your script is great. Thank you very much for the help. From now on I can queue three training runs at once and just leave it running overnight.

I'm just not very familiar with the installation. Here is my workflow:

My environment: Windows 11, RTX 4080 Laptop

  1. Install CUDA 11.8
  2. Download the 秋葉 (Akegarasu) trainer (lora-scripts), release 1.8.3
  3. On Windows, open PowerShell with administrator rights, enter Set-ExecutionPolicy RemoteSigned, then run .\install.ps1

After installation, try the script .\train_by_toml_new_cindy_2_lora.ps1 (i.e. your script; I queue two photo sets for training).

If it doesn't work, train once with .\run_gui.ps1 (GUI training works fine for me), then go back to PowerShell and try .\train_by_toml_new_cindy_2_lora.ps1 again.

After that it works: I can queue multiple toml training jobs and everything looks normal.
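Collected into commands, the workflow above looks like this (run from the lora-scripts root; the batch script name is the one used above):

```powershell
# One-time setup: allow local scripts to run, then install the environment.
# Open PowerShell as administrator for Set-ExecutionPolicy.
Set-ExecutionPolicy RemoteSigned
.\install.ps1

# If the batch script fails, verify training works once via the GUI:
.\run_gui.ps1

# Then run the batch queue script:
.\train_by_toml_new_cindy_2_lora.ps1
```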

cindylaii avatar Mar 18 '24 15:03 cindylaii

I tried again today and there are still problems... same as before: the GUI works but the script doesn't. Sorry, my comment above may not be much help.

cindylaii avatar Mar 24 '24 15:03 cindylaii

A big discovery just now: train.ps1 works for me, but train_by_toml.ps1 doesn't. If I add one line to train_by_toml.ps1, $Env:XFORMERS_FORCE_DISABLE_TRITON = "1", it works. I don't know what side effects this has.
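For anyone hitting the same triton error, the workaround above goes next to the script's other environment variables, before the training loop. A sketch based on that fix:

```powershell
# Environment for the training runs
$Env:HF_HOME = "huggingface"
$Env:XFORMERS_FORCE_DISABLE_TRITON = "1"  # skip triton in xFormers; avoids the circular-import AttributeError
```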

cindylaii avatar Mar 24 '24 16:03 cindylaii