
[Feat]: FLUX-1 Support

Tophness opened this issue 1 year ago • 31 comments

Describe your use-case.

FLUX-1 Support

What would you like to see as a solution?

FLUX-1 Support

Have you considered alternatives? List them here.

I could go fuck myself I guess

Tophness avatar Aug 04 '24 05:08 Tophness

Amazing alternative lol.

silverace71 avatar Aug 05 '24 21:08 silverace71

My biggest hope for OneTrainer is low-VRAM FLUX-1 training.

SDXL currently trains with the lowest VRAM on OneTrainer, so many people are waiting atm.

FurkanGozukara avatar Aug 05 '24 21:08 FurkanGozukara

I guess full fine-tuning will be harder to implement, but LoRA support would already be amazing.

etha302 avatar Aug 05 '24 21:08 etha302

It would be nice to have Flux in OT. FLUX seems to be the best open-source model, well beyond what we have now.

djp3k05 avatar Aug 06 '24 09:08 djp3k05

LoRA training for Windows users would be amazing, even if it isn't low-VRAM (fp8-quanto at 23.5 GB is also good enough). SimpleTuner doesn't work even on WSL, only in a rented Linux environment.

Desm0nt avatar Aug 06 '24 10:08 Desm0nt

I think that OneTrainer might finally need to implement multi-GPU support for full Flux support, because anything over a rank-16 LoRA will probably be untrainable within the 24 GB that consumer GPUs typically have. That is probably a good thing, actually. Then again, with how great a model Flux is, we might be pushing rank-4 LoRAs for simple stuff, because you don't need to fine-tune much; it's probably already good at what you want.

yggdrasil75 avatar Aug 06 '24 16:08 yggdrasil75

anything over a rank-16 LoRA will probably be untrainable within the 24 GB that consumer GPUs typically have.

On fp8, I can either train rank 2 at batch 1 or rank 1 at batch 2. On quanto-int4, I can manage rank 2 at batch 2, I think (though, AFAIK, Flux quantized to int4 is not available for inference in Comfy, only in Python code). Flux LoRA training capability on a 24 GB card is rather limited, and I've yet to have any successful results after doing an LR sweep.

ejektaflex avatar Aug 06 '24 17:08 ejektaflex

Flux LoRA training capability on a 24 GB card is rather limited

I ran rank-16, batch 1, gradient accumulation 3 on a single 4090 at 1024x1024 in fp8-quanto in an attempt to train a DoRA. So it can run rank 16, but the results are... highly questionable. It took 15 hours on 700 images (on which I had previously successfully trained style LoRAs for SD1.5, SDXL and PixArt Sigma), and with a high LR of 2e-5 it is both very undertrained and overcooked. With a lower LR it will take even longer to train; with a higher LR it will forget everything it knew before.

Desm0nt avatar Aug 07 '24 04:08 Desm0nt

It took 15 hours on 700 images.

I had 11 images and it took me 45-75 minutes. Geez!

VeteranXT avatar Aug 11 '24 15:08 VeteranXT

I like to fine-tune models on a handful of datasets (100-3,000 images each). It would be really nice to have SDXL-like support for that in OT (because it works so well!).

stealurfaces avatar Aug 11 '24 21:08 stealurfaces

My question is: how did you get 1,000 images? Let alone 3,000?

VeteranXT avatar Aug 11 '24 22:08 VeteranXT

Supposedly NF4 gets up to a 4x speedup and uses less VRAM for even higher precision, and it is now the recommended format. Seems like this might actually be doable now?

https://civitai.com/models/638572/nf4-flux1 https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981

Tophness avatar Aug 12 '24 09:08 Tophness

Supposedly NF4 gets up to a 4x speedup and uses less VRAM for even higher precision, and it is now the recommended format. Seems like this might actually be doable now?

https://civitai.com/models/638572/nf4-flux1 lllyasviel/stable-diffusion-webui-forge#981

NF4 might be good for inference, but it may not be the optimal format for training. In general, training should happen under a more precise data type so that the model can learn well.

ultraman-blazar avatar Aug 15 '24 02:08 ultraman-blazar

There's a guide here by the SimpleTuner devs for creating FLUX LoRAs. Edit: they recommend 512px training over 1024px.

dathide avatar Aug 15 '24 02:08 dathide

is Flux training coming to OneTrainer? It seems to work great in SimpleTuner

driqeks avatar Aug 15 '24 23:08 driqeks

It would be great if it could train with 12 GB of VRAM.

Black911X avatar Aug 16 '24 00:08 Black911X

There's a guide here by the SimpleTuner devs for creating FLUX LoRAs. Edit: they recommend 512px training over 1024px.

It would be great if it could train with 12 GB of VRAM.

They have this running great on 12 GB and 16 GB in kohya's scripts. SimpleTuner recommends A100s, and people seem to be saying its implementation is not very efficient. https://github.com/kohya-ss/sd-scripts/issues/1445#issuecomment-2291817718

Tophness avatar Aug 16 '24 04:08 Tophness

My question is: how did you get 1,000 images? Let alone 3,000?

A Flux LoRA dataset, like any LoRA dataset, is whatever you want. You can pull all the images from a subreddit you like, download an artist's entire ArtStation, rip every frame from a video, whatever. Then you just use something like taggui to generate captions or tags for the images (useful if you didn't pull from a booru), and you have 3,000+ images. Currently I have ~3,400 images from roughly 100 different artists for 40 different characters from 2 different games. When I downloaded them from a booru I got 11k images and deleted the majority because they were poorly drawn, not accurate to the character (gender swaps, race swaps, and even species swaps, like "cat-like girl as an actual cat"), simply not something I am interested in training on (comics, mostly), or not something I am interested in at all.

If you want a tool to help, look at gallery-dl. If you are pulling from a booru, I would recommend putting something in your conf to separate the description from the tags.
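
As an alternative to doing it all in the conf, a small post-processing script can turn gallery-dl's per-file metadata into caption text files next to the images. A rough sketch only, assuming the metadata was written with gallery-dl's --write-metadata option and that the site exposes a danbooru-style "tag_string" field (both assumptions to check against your source):

    # Sketch: convert gallery-dl JSON metadata (image.jpg + image.jpg.json)
    # into caption text files (image.txt) that LoRA trainers can read.
    # The "tag_string" key is typical of danbooru-style APIs; other sites
    # may name their tag field differently.
    import json
    from pathlib import Path

    DATASET_DIR = Path("dataset")  # illustrative folder of images + metadata

    for meta_path in DATASET_DIR.glob("*.json"):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        tags = meta.get("tag_string", "").split()
        # Keep tags comma-separated so they are easy to tell apart from any
        # free-text description you add later (e.g. from taggui).
        caption = ", ".join(tags)
        # image.jpg.json -> image.txt (adjust if your trainer expects image.jpg.txt)
        caption_path = meta_path.with_suffix("").with_suffix(".txt")
        caption_path.write_text(caption, encoding="utf-8")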

yggdrasil75 avatar Aug 24 '24 14:08 yggdrasil75

Any news on this? Just read about the Flux branch and checked it, but found no other info. Any way one can help / provide input via testing?

gilga2024 avatar Aug 25 '24 17:08 gilga2024

News will be provided when things are ready.

mx avatar Aug 25 '24 20:08 mx

It seems like the Flux feature is now available in master, at least for LoRA and DoRA. May I kindly ask if there is any documentation / update on how to use it? I have not found any information on this so far.

gilga2024 avatar Sep 03 '24 18:09 gilga2024

Trained a test LoRA and it appears that the key names in the produced safetensors do not match what is being used for most LoRAs that can be loaded into WebUI Forge.

For example, OneTrainer: lora_transformer_single_transformer_blocks_10_attn_to_k.lora_up.weight

And other models (e.g. via SimpleTrainer): transformer.single_transformer_blocks.10.attn.to_k.lora_A.weight

master131 avatar Sep 04 '24 09:09 master131

Sounds like it would need some renaming:

https://huggingface.co/comfyanonymous/flux_RealismLora_converted_comfyui/blob/main/convert.py
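
Roughly, such a conversion has to map the flattened kohya-style module names back onto the dotted diffusers names and swap the lora_up / lora_down suffixes for lora_B / lora_A. A minimal sketch of that idea (not the linked convert.py; it assumes the list of diffusers module names for the Flux transformer is available from the model):

    # Sketch only: rename kohya/OneTrainer-style Flux LoRA keys, e.g.
    #   lora_transformer_single_transformer_blocks_10_attn_to_k.lora_up.weight
    # to diffusers/SimpleTuner-style keys, e.g.
    #   transformer.single_transformer_blocks.10.attn.to_k.lora_B.weight
    # Module names themselves contain underscores, so a plain "_" -> "." swap
    # is unsafe; instead, build the lookup from the known diffusers names,
    # which is the same trick the ComfyUI/Forge loaders use.
    SUFFIX_MAP = {"lora_up": "lora_B", "lora_down": "lora_A"}

    def build_key_map(diffusers_module_names):
        # "single_transformer_blocks.10.attn.to_k" ->
        # "lora_transformer_single_transformer_blocks_10_attn_to_k"
        return {
            "lora_transformer_" + name.replace(".", "_"): "transformer." + name
            for name in diffusers_module_names
        }

    def convert_state_dict(onetrainer_sd, key_map):
        converted = {}
        for key, tensor in onetrainer_sd.items():
            parts = key.split(".")
            if len(parts) == 3 and parts[0] in key_map and parts[1] in SUFFIX_MAP:
                converted[f"{key_map[parts[0]]}.{SUFFIX_MAP[parts[1]]}.{parts[2]}"] = tensor
            else:
                converted[key] = tensor  # e.g. alpha tensors, left as-is here
        return converted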

MNeMoNiCuZ avatar Sep 04 '24 09:09 MNeMoNiCuZ

That one sounds like a SimpleTuner issue. The OneTrainer names are the same for all models; they follow the naming convention set by kohya during the SD1.5 days. Changing that doesn't make any sense.

Nerogar avatar Sep 04 '24 09:09 Nerogar

I just checked; it looks like the key renaming is implemented in the latest ComfyUI, but not backported to Forge yet.

ComfyUI implementation: https://github.com/comfyanonymous/ComfyUI/blob/f067ad15d139d6e07e44801759f7ccdd9985c636/comfy/lora.py#L327

Forge implementation: https://github.com/lllyasviel/stable-diffusion-webui-forge/blob/668e87f920be30001bb87214d9001bf59f2da275/packages_3rdparty/comfyui_lora_collection/lora.py#L318

I manually patched this file in Forge as shown below and it's all working now, happy days:

            if k.endswith(".weight"):
                to = diffusers_keys[k]
                key_map["transformer.{}".format(k[:-len(".weight")])] = to #simpletrainer and probably regular diffusers flux lora format
                key_map["lycoris_{}".format(k[:-len(".weight")].replace(".", "_"))] = to #simpletrainer lycoris
                key_map["lora_transformer_{}".format(k[:-len(".weight")].replace(".", "_"))] = to #onetrainer

master131 avatar Sep 04 '24 10:09 master131

It seems like the Flux feature is now available in master, at least for LoRA and DoRA. May I kindly ask if there is any documentation / update on how to use it? I have not found any information on this so far.

I do not need an answer on this anymore; someone posted a guide on Reddit.

gilga2024 avatar Sep 05 '24 18:09 gilga2024

I think that OneTrainer might finally need to implement multi-GPU support for full Flux support, because anything over a rank-16 LoRA will probably be untrainable within the 24 GB that consumer GPUs typically have. That is probably a good thing, actually. Then again, with how great a model Flux is, we might be pushing rank-4 LoRAs for simple stuff, because you don't need to fine-tune much; it's probably already good at what you want.

Up to rank 64 is possible at 1024x1024 with ai-toolkit.

protector131090 avatar Sep 09 '24 05:09 protector131090

I think that OneTrainer might finally need to implement multi-GPU support for full Flux support, because anything over a rank-16 LoRA will probably be untrainable within the 24 GB that consumer GPUs typically have. That is probably a good thing, actually. Then again, with how great a model Flux is, we might be pushing rank-4 LoRAs for simple stuff, because you don't need to fine-tune much; it's probably already good at what you want.

Up to rank 64 is possible at 1024x1024 with ai-toolkit.

Rank 128 was possible on my RTX 4080 16 GB, although I heard 16-32 is enough for Flux and better for details like skin, and from my testing so far that does seem to be the case, though it's hard to tell. I suspect FurkanGozukara will have more conclusive results. I've never been able to fit it entirely in my VRAM with kohya, so I stopped trying. The best and fastest results so far are adamw8bit / rank 16 / train_t5xxl / split_qkv / loraplus_unet_lr_ratio=4, which is designed for 24 GB only. 8 GB is spilling into my shared VRAM, but it has already learned in 2 days what took 3 weeks to reach on the recommended Adafactor settings for 16 GB cards, so I think it's going to converge by tomorrow on what previously took a month.

Tophness avatar Sep 09 '24 06:09 Tophness

Any status on Flux support? Will there be some kind of announcement or update to the main page when Flux support is "stable"?

Cheers

MNeMoNiCuZ avatar Sep 23 '24 20:09 MNeMoNiCuZ

Flux support is now available on the master branch. You can utilise RAM offloading (we advise 64 GB) to reduce VRAM requirements further. There are, however, still a few limitations:

  • Only LoRA training works; fine-tuning is not supported yet.
  • The only quantization scheme available is NF4, though FP8 training for LoRAs is in beta on the fp8 branch.
  • Only the diffusers format is supported as a base model (see the sketch below).
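
"Diffusers format" here means the multi-folder Hugging Face repository layout (transformer/, text_encoder/, vae/, ...) rather than a single checkpoint file. A minimal sketch for fetching such a snapshot to a local folder (repo id and target path are illustrative; FLUX.1-dev is gated, so an accepted license and a Hugging Face token may be required):

    # Sketch: download the diffusers-format FLUX.1-dev repository so a local
    # folder can be used as the base model. Requires `pip install huggingface_hub`
    # and, for this gated repo, an accepted license plus an HF access token.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id="black-forest-labs/FLUX.1-dev",
        local_dir="models/FLUX.1-dev",  # illustrative target folder
    )
    print(local_path)  # point the trainer's base-model field at this folder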

O-J1 avatar Nov 10 '24 01:11 O-J1