
Is there a guide for training a Flux Fill LoRA, or for fine-tuning Flux Fill?

Open CaiNiaoCYY opened this issue 10 months ago • 6 comments

Is there a guide for training a Flux Fill LoRA, or for fine-tuning Flux Fill?

CaiNiaoCYY avatar Jan 21 '25 07:01 CaiNiaoCYY

It's not exactly an answer to your question, but I've noticed that when I use Flux Fill in ComfyUI, I can apply the LoRA I trained with sd-scripts to it, and it worked okay. I'm guessing Flux Fill is similar enough to Flux Dev that LoRAs generally work for both of them.

araleza avatar Jan 21 '25 13:01 araleza

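It's not exactly an answer to your question, but I've noticed that when I use Flux Fill in ComfyUI, I can apply the LoRA I trained with sd-scripts to it, and it worked okay. I'm guessing Flux Fill is similar enough to Flux Dev that LoRAs generally work for both of them.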

Thanks. I want to train a LoRA specifically to improve text removal.

CaiNiaoCYY avatar Jan 23 '25 09:01 CaiNiaoCYY

It's not exactly an answer to your question, but I've noticed that when I use Flux Fill in ComfyUI, I can apply the LoRA I trained with sd-scripts to it, and it worked okay. I'm guessing Flux Fill is similar enough to Flux Dev that LoRAs generally work for both of them.

Are you sure? What kind of LoRA are you using: character, style, or something else? Flux Fill is a full fine-tune of Dev and has a different input embedding (input linear layer). It's possible that a Dev LoRA somehow works for Fill as well, but it would be better if we developed a way to train LoRAs on Fill directly.

flankechen avatar Feb 15 '25 08:02 flankechen

I have the same question. Did you solve it? @CaiNiaoCYY

hnsywangxin avatar Jul 16 '25 12:07 hnsywangxin

I have the same question. Did you solve it? @CaiNiaoCYY

You can try Flux Kontext. I think it is better than Fill.

CaiNiaoCYY avatar Jul 21 '25 03:07 CaiNiaoCYY

I saw this issue and also wanted to understand the differences between the Flux, Flux Fill, and Flux Kontext models, so I dug into the diffusers code.

The short answer is that Flux, Flux Fill, and Flux Kontext all have the same model structure, but their denoising processes differ, which means the training code should differ as well.

Flux, Flux Fill, and Flux Kontext all use FluxTransformer2DModel as the transformer: https://github.com/huggingface/diffusers/blob/3d2f8ae99b88c50a340dd879ec7f44981ffd12fe/src/diffusers/pipelines/flux/pipeline_flux.py#L159 https://github.com/huggingface/diffusers/blob/3d2f8ae99b88c50a340dd879ec7f44981ffd12fe/src/diffusers/pipelines/flux/pipeline_flux_fill.py#L179 https://github.com/huggingface/diffusers/blob/3d2f8ae99b88c50a340dd879ec7f44981ffd12fe/src/diffusers/pipelines/flux/pipeline_flux_kontext.py#L190

At inference time, the Fill pipeline concatenates masked_image_latents into the transformer input. Similarly, the Kontext pipeline concatenates image_latents with the latents. https://github.com/huggingface/diffusers/blob/3d2f8ae99b88c50a340dd879ec7f44981ffd12fe/src/diffusers/pipelines/flux/pipeline_flux.py#L919 https://github.com/huggingface/diffusers/blob/3d2f8ae99b88c50a340dd879ec7f44981ffd12fe/src/diffusers/pipelines/flux/pipeline_flux_fill.py#L984 https://github.com/huggingface/diffusers/blob/3d2f8ae99b88c50a340dd879ec7f44981ffd12fe/src/diffusers/pipelines/flux/pipeline_flux_kontext.py#L1060
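To make the difference concrete, here is a rough, shape-only sketch of what the transformer sees in each case. It does not call diffusers at all. The sizes (4096 packed tokens with 64 channels for a 1024x1024 image, and 320 conditioning channels for Fill) and the concatenation axes (feature axis for Fill, sequence axis for Kontext) are my reading of the linked pipeline code, so treat them as assumptions and check them against the exact commit. `concat_shape` is a made-up helper, not a diffusers function.

```python
# Shape-only sketch: no real tensors, just (batch, sequence, channels) bookkeeping.
# All concrete sizes below are assumptions for illustration.

def concat_shape(a, b, dim):
    """Return the shape produced by concatenating shapes a and b along dim."""
    if len(a) != len(b):
        raise ValueError("rank mismatch")
    for i, (x, y) in enumerate(zip(a, b)):
        if i != dim and x != y:
            raise ValueError(f"size mismatch on dim {i}: {x} vs {y}")
    out = list(a)
    out[dim] = a[dim] + b[dim]
    return tuple(out)

# Assumed: a 1024x1024 image packs into 4096 tokens of 64 channels each.
batch, seq_len, channels = 1, 4096, 64
latents = (batch, seq_len, channels)

# Flux Dev: the noisy latents go into the transformer unchanged.
dev_input = latents

# Flux Fill (assumed): masked image latents plus the packed mask are appended
# per token, so the token count stays the same and each token gets wider.
masked_image_latents = (batch, seq_len, 320)
fill_input = concat_shape(latents, masked_image_latents, dim=2)

# Flux Kontext (assumed): reference image latents are appended as extra tokens,
# so the sequence gets longer and the channel width stays the same.
image_latents = (batch, seq_len, channels)
kontext_input = concat_shape(latents, image_latents, dim=1)

print("dev    ", dev_input)
print("fill   ", fill_input)
print("kontext", kontext_input)
```

If these shapes are right, a training script for Fill would have to build the masked-image conditioning for every sample, and one for Kontext would have to feed the reference image tokens and compute the loss only on the tokens that belong to the noisy latents. Neither step exists in a plain Flux Dev training loop.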

CHR-ray avatar Jul 28 '25 03:07 CHR-ray