ValueError: Default process group has not been initialized, please make sure to call init_process_group.
This is my first time using this tool. I ran the beginner ("novice") training mode: I selected the image folder and the base model, left every other parameter at its default, and started training. It failed with: ValueError: Default process group has not been initialized, please make sure to call init_process_group.
How should I fix this?
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\scripts\stable\train_network.py", line 1115, in <module>
trainer.train(args)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\scripts\stable\train_network.py", line 226, in train
accelerator = train_util.prepare_accelerator(args)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\scripts\stable\library\train_util.py", line 4307, in prepare_accelerator
accelerator = Accelerator(
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\accelerate\accelerator.py", line 383, in __init__
self.state = AcceleratorState(
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\accelerate\state.py", line 846, in __init__
PartialState(cpu, **kwargs)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\accelerate\state.py", line 270, in __init__
self.num_processes = torch.distributed.get_world_size()
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1832, in get_world_size
return _get_group_size(group)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 864, in _get_group_size
default_pg = _get_default_group()
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1025, in _get_default_group
raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
The above is the full error output.
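For reference, the bottom of the traceback shows `Accelerator()` ending up in `torch.distributed.get_world_size()`, which raises exactly this error whenever the default process group was never created. A minimal sketch of the same failure, independent of lora-scripts (on this PyTorch build it is a `ValueError`; older releases raise `RuntimeError`):

```python
import torch.distributed as dist

# Same failure as in the traceback above: get_world_size() needs the default
# process group, which only exists after a successful init_process_group().
try:
    dist.get_world_size()
except ValueError as e:
    print(e)  # "Default process group has not been initialized, ..."
```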
After adding the code you mentioned and training again, I still get an error. Here is the error output:
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\anyio\streams\memory.py", line 94, in receive
return self.receive_nowait()
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\anyio\streams\memory.py", line 89, in receive_nowait
raise WouldBlock
anyio.WouldBlock
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\middleware\base.py", line 78, in call_next
message = await recv_stream.receive()
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\anyio\streams\memory.py", line 114, in receive
raise EndOfStream
anyio.EndOfStream
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 428, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\fastapi\applications.py", line 276, in __call__
await super().__call__(scope, receive, send)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\middleware\errors.py", line 184, in __call__
raise exc
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\middleware\errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\middleware\base.py", line 108, in __call__
response = await self.dispatch_func(request, call_next)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\mikazuki\app\application.py", line 74, in add_cache_control_header
response = await call_next(request)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\middleware\base.py", line 84, in call_next
raise app_exc
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\middleware\base.py", line 70, in coro
await self.app(scope, receive_or_disconnect, send_no_error)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\middleware\exceptions.py", line 79, in __call__
raise exc
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\middleware\exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 21, in __call__
raise e
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\fastapi\middleware\asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\routing.py", line 276, in handle
await self.app(scope, receive, send)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\starlette\routing.py", line 66, in app
response = await func(request)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\fastapi\routing.py", line 237, in app
raw_response = await run_endpoint_function(
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\fastapi\routing.py", line 163, in run_endpoint_function
return await dependant.call(**values)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\mikazuki\app\api.py", line 122, in create_toml_file
dist.init_process_group(backend='gloo')
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\c10d_logger.py", line 79, in wrapper
return func(*args, **kwargs)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\c10d_logger.py", line 93, in wrapper
func_return = func(*args, **kwargs)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1361, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\rendezvous.py", line 246, in _env_rendezvous_handler
rank = int(_get_env_or_raise("RANK"))
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\rendezvous.py", line 231, in _get_env_or_raise
raise _env_error(env_var)
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
2024-12-25 10:19:56 INFO Loading settings from D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\config\autosave\20241225-101943.toml... train_util.py:3745
2024-12-25 10:20:14 INFO D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\config\autosave\20241225-101943 train_util.py:3764
2024-12-25 10:20:14 INFO prepare tokenizer train_util.py:4228
2024-12-25 10:20:15 INFO update token length: 255 train_util.py:4245
INFO Using DreamBooth method. train_network.py:172
INFO prepare images. train_util.py:1573
INFO found directory D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\train\10_people\1_zkz contains 169 image files train_util.py:1520
INFO 169 train images with repeating. train_util.py:1614
INFO 0 reg images. train_util.py:1617
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1622
INFO [Dataset 0] config_util.py:565
batch_size: 1
resolution: (512, 512)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False
[Subset 0 of Dataset 0]
image_dir: "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\train\10_people\1_zkz"
image_count: 169
num_repeats: 1
shuffle_caption: True
keep_tokens: 0
keep_tokens_separator:
secondary_separator: None
enable_wildcard: False
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: zkz
caption_extension: .txt
INFO [Dataset 0] config_util.py:571
INFO loading image sizes. train_util.py:854
100%|█████████████████████████████████████████████████████████████████████████████| 169/169 [00:00<00:00, 11267.30it/s]
INFO make buckets train_util.py:860
INFO number of images (including repeats) / train_util.py:906
各bucketの画像枚数(繰り返し回数を含む)
INFO bucket 0: resolution (320, 704), count: 8 train_util.py:911
INFO bucket 1: resolution (320, 768), count: 1 train_util.py:911
INFO bucket 2: resolution (384, 640), count: 71 train_util.py:911
INFO bucket 3: resolution (448, 576), count: 68 train_util.py:911
INFO bucket 4: resolution (512, 512), count: 17 train_util.py:911
INFO bucket 5: resolution (576, 448), count: 2 train_util.py:911
INFO bucket 6: resolution (640, 384), count: 2 train_util.py:911
INFO mean ar error (without repeats): 0.043133719759432164 train_util.py:916
INFO preparing accelerator train_network.py:225
Traceback (most recent call last):
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\scripts\stable\train_network.py", line 1115, in <module>
trainer.train(args)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\scripts\stable\train_network.py", line 226, in train
accelerator = train_util.prepare_accelerator(args)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\scripts\stable\library\train_util.py", line 4307, in prepare_accelerator
accelerator = Accelerator(
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\accelerate\accelerator.py", line 383, in __init__
self.state = AcceleratorState(
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\accelerate\state.py", line 846, in __init__
PartialState(cpu, **kwargs)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\accelerate\state.py", line 270, in __init__
self.num_processes = torch.distributed.get_world_size()
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1832, in get_world_size
return _get_group_size(group)
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 864, in _get_group_size
default_pg = _get_default_group()
File "D:\webui\lora-scripts-v1.10.0\lora-scripts-v1.10.0\python\lib\site-packages\torch\distributed\distributed_c10d.py", line 1025, in _get_default_group
raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
How can I fix this one?
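The new traceback explains why the manual `dist.init_process_group(backend='gloo')` added in mikazuki/app/api.py does not help: with no `init_method` argument, PyTorch falls back to the env:// rendezvous, which expects `RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT` to have been set by a launcher. A minimal sketch of what that call needs on its own (the values are placeholders for a single-process run, shown only to explain the message; the actual fix in this thread turned out to be the GPU selection described below):

```python
import os
import torch.distributed as dist

# The env:// rendezvous reads these variables; if RANK is missing it raises
# "environment variable RANK expected, but not set". The values below describe
# a one-process "world" and are for illustration only.
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="gloo")
print(dist.get_world_size())  # -> 1
dist.destroy_process_group()
```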
I think I've found the cause: I hadn't selected the GPU to train on. After selecting one, training runs normally. However, I have four GPUs, and if I select all of them it still errors. In other words, I can't select all 4 GPUs at once, and I don't know why. For now I can only train on a single GPU.
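A guess at why all four cards fail while one works: with a single GPU the trainer can run as one ordinary process and never needs a process group, whereas a real multi-GPU run needs one process per GPU, each with its own rank, which is normally arranged by an external launcher such as `torchrun` or `accelerate launch` rather than inside the script. A rough sketch of that mechanism (assumptions: gloo backend and 4 local GPUs; this is not lora-scripts' own launch code):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    # Roughly what torchrun / accelerate launch arranges for every GPU:
    # a shared rendezvous address plus a unique rank per process.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)  # pin this process to GPU number `rank`
    # ... the actual training step would run here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4  # number of GPUs selected
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```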
I'm running into the same problem. Where do I select the GPU?
At the very bottom of the expert training page there is a dropdown for selecting the GPU (it only shows up if you have multiple GPUs). I later set up a single-GPU server and the dropdown did not appear there.
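The thread does not show what the dropdown actually does under the hood; a common mechanism (an assumption here, not confirmed for lora-scripts) is to set CUDA_VISIBLE_DEVICES for the spawned training process so that only the selected card is visible to PyTorch:

```python
import os

# Assumption for illustration: expose only the first GPU to this process.
# CUDA_VISIBLE_DEVICES must be set before CUDA is initialized, i.e. before
# the first torch import / first CUDA call.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # -> 1, no matter how many GPUs are installed
```

With only one visible device the trainer can take the plain single-process path and never touches `torch.distributed`, which would be consistent with single-GPU selection working while selecting all four cards does not.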