lora-scripts

Unable to train a LoRA for flux-fill-dev

Open walterFeng opened this issue 9 months ago • 2 comments

  • System environment: Python 3.10.9, script version v1.12.0, Ubuntu 22.04

  • Problem description: Training a LoRA with flux.1-dev works fine, but when I switch the model file to models/unet/flux1-fill-dev.safetensors, training fails with the following error:

Traceback (most recent call last):
  File "/data/lora-scripts/./scripts/dev/flux_train_network.py", line 559, in <module>
    trainer.train(args)
  File "/data/lora-scripts/scripts/dev/train_network.py", line 571, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "/data/lora-scripts/./scripts/dev/flux_train_network.py", line 98, in load_target_model
    self.is_schnell, model = flux_utils.load_flow_model(
  File "/data/lora-scripts/scripts/dev/library/flux_utils.py", line 136, in load_flow_model
    info = model.load_state_dict(sd, strict=False, assign=True)
  File "/data/lora-scripts/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Flux:
	size mismatch for img_in.weight: copying a param with shape torch.Size([3072, 384]) from checkpoint, the shape in current model is torch.Size([3072, 64]).

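After some research, I found that fixing this error requires creating a meta.jsonl config file and setting conditioning_image_column = "conditioning_image". However, I could not find anywhere in the GUI to enter the path to meta.jsonl or to add the conditioning_image_column option. How should this GUI be configured to train against the fill model?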
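
The size mismatch in the traceback (384 input columns in the checkpoint versus 64 in the model being built) can be confirmed before starting a run by reading only the header of the .safetensors file. The following is a minimal sketch using the standard library. The helper names `read_safetensors_shapes` and `img_in_channels` are hypothetical and not part of lora-scripts, and the key `img_in.weight` is taken from the traceback; a checkpoint that stores its tensors under a prefix would need a different key.

```python
import json
import struct


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian unsigned integer giving
        # the header length, followed by that many bytes of UTF-8 JSON.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return {
        name: tuple(entry["shape"])
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    }


def img_in_channels(shapes, key="img_in.weight"):
    """Return the input width of the image projection, or None if the key is absent.

    The default key is an assumption based on the traceback in this issue.
    """
    shape = shapes.get(key)
    if shape is None or len(shape) != 2:
        return None
    return shape[1]
```

Calling `img_in_channels(read_safetensors_shapes("models/unet/flux1-fill-dev.safetensors"))` should, going by the traceback, report 384, while the flux.1-dev checkpoint that trains correctly should report 64. This only diagnoses the mismatch; it does not make the training script accept the fill checkpoint.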

Apr 06 '25 18:04 walterFeng

Have you run into this error during training? With multiple GPUs I get:

ConnectionError: Tried to launch distributed communication on port 29500, but another process is utilizing it. Please specify a different port (such as using the --main_process_port flag or specifying a different main_process_port in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to 0.

After switching to a single GPU I get:

NotImplementedError: Using RTX 4000 series doesn't support faster communication broadband via P2P or IB. Please set NCCL_P2P_DISABLE="1" and NCCL_IB_DISABLE="1" or use `accelerate launch` which will do this automatically.
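
Both messages name their own workaround: a different `main_process_port` for the first, and the `NCCL_P2P_DISABLE` / `NCCL_IB_DISABLE` environment variables for the second. A minimal sketch of preparing those values from Python follows. The helper names `find_free_port` and `launch_env` are hypothetical, and whether the lora-scripts GUI exposes these settings is not confirmed in this thread.

```python
import os
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # binding to port 0 lets the OS choose a free port
        return s.getsockname()[1]


def launch_env(base_env=None):
    """Return a copy of the environment with the two NCCL variables from the error set."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_P2P_DISABLE"] = "1"
    env["NCCL_IB_DISABLE"] = "1"
    return env
```

The port returned by `find_free_port()` can be passed as the `--main_process_port` value mentioned in the error, and the dictionary from `launch_env()` can be supplied as the `env` argument of `subprocess.run` when starting the training process. The port is only known to be free at the moment of the check, so another process could still claim it before the launch.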

Apr 19 '25 12:04 ZitengXue

Has this problem been solved?

Jun 01 '25 08:06 ack1234