
Support FLUX series models

ddpasa opened this issue 1 year ago • 70 comments

These models have just been released and appear to be amazing. Links below:

Blog from fal.ai: https://blog.fal.ai/flux-the-largest-open-sourced-text2img-model-now-available-on-fal/

Huggingface: https://huggingface.co/black-forest-labs

There is a schnell version and a dev version.

ddpasa avatar Aug 02 '24 07:08 ddpasa

Strongly agree!

Oliverkuien avatar Aug 02 '24 15:08 Oliverkuien

Is it possible to fine-tune the model on a 3090, or do we have to use a LoRA due to the size?

LazyCat420 avatar Aug 02 '24 19:08 LazyCat420

I'm wondering if image gen models would benefit from the sophisticated quantization methods that are popular in the LLM space, like GGUF. Any ongoing research in this area?

Apparently some folks have trained LoRAs on quantized LLMs to good effect, e.g. https://old.reddit.com/r/LocalLLaMA/comments/13q8zjc/how_much_why_does_quantization_negatively_affect/

ThereforeGames avatar Aug 03 '24 01:08 ThereforeGames
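The combination these comments gesture at, a LoRA adapter trained on top of a frozen quantized base model, is the core idea behind QLoRA. A minimal NumPy sketch of the arithmetic (purely illustrative; the names here are made up and this is not any library's real API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight, stored with symmetric int8 absmax quantization.
W = rng.normal(size=(64, 64)).astype(np.float32)
scale = np.abs(W).max() / 127.0          # one scale for the whole tensor, for simplicity
W_q = np.round(W / scale).astype(np.int8)

# Trainable LoRA factors kept in full precision, rank r much smaller than 64.
r, alpha = 4, 8.0
A = rng.normal(scale=0.01, size=(r, 64)).astype(np.float32)
B = np.zeros((64, r), dtype=np.float32)  # B starts at zero, so the LoRA is a no-op initially

def forward(x):
    W_deq = W_q.astype(np.float32) * scale              # dequantize on the fly
    return x @ W_deq.T + (alpha / r) * (x @ A.T @ B.T)  # frozen path + low-rank path

x = rng.normal(size=(2, 64)).astype(np.float32)
y = forward(x)
print(y.shape)  # (2, 64)

# The base-weight quantization error is bounded by half a quantization step.
err = np.abs(W - W_q.astype(np.float32) * scale).max()
print(bool(err <= scale / 2 + 1e-6))  # True
```

Only A and B would receive gradients during training; the quantized base stays frozen, which is what makes the memory savings possible.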

I totally agree. Since SD3 may not be able to fit even a slightly larger dataset due to problems with the model itself (regardless of whether the script is SimpleTuner, sd-scripts, or OneTrainer), I recommend stopping development of the SD3 training scripts. I did a simple test on Flux-dev, and its capabilities are completely superior to SD3. [Example images attached.] It's worth pointing out that this is the first model I've seen that can correctly draw the position of an umbrella's handle relative to its canopy. The text prompted for the road sign was "iiilllllbddbwW"; although the AI didn't draw it correctly, I haven't seen any other model draw it correctly either.

leonary avatar Aug 03 '24 10:08 leonary

> I totally agree. Since SD3 may not be able to fit a slightly larger dataset due to model problems (scripts include SimpleTuner, SD-scripts, OneTrainer), it is recommended to stop developing SD3 training scripts. I did a simple test on Flux-dev, and its capabilities are completely superior to SD3. Here are some examples:

I strongly disagree. While the SD3 Medium model has certain drawbacks, it possesses a crucial advantage that FLUX lacks: its weights are publicly available. In contrast, FLUX only provides access to the base model's weights through an API, with no indication or information suggesting they plan to make it open-source. The models that are publicly accessible are derived through distillation of the base model; they are truncated, incomplete, and practically unsuitable for further training. It only makes sense to train the model we weren't given, as fine-tuning the distilled models would require roughly the same effort as training from scratch, if not more. Even the SDXL model was superior in this regard.

Calling it open-source is akin to labeling GPT-4o as open-source simply because we were given GPT-3 weights and the ability to fine-tune it. I'm concerned that we'll be wasting time that could be better spent studying SD3, debugging and optimizing its training script. SD3 has more potential, and Stability AI has promised to eventually release all models, including their weights, as open-source. This makes SD3 a more promising avenue for our efforts.

dill-shower avatar Aug 04 '24 20:08 dill-shower

> I strongly disagree. While the SD3 Medium model has certain drawbacks, it possesses a crucial advantage that FLUX lacks: its weights are publicly available. In contrast, FLUX only provides access to the base model's weights through an API, with no indication or information suggesting they plan to make it open-source. The models that are publicly accessible are derived through distillation of the base model; they are truncated, incomplete, and practically unsuitable for further training. It only makes sense to train the model we weren't given, as fine-tuning the distilled models would require roughly the same effort as training from scratch, if not more. Even the SDXL model was superior in this regard.

Hello, the weights for the Flux series models have been released, including the dev version and the schnell version. The weights for the Pro version have not been released and can only be accessed via API, but the performance gap between the dev and Pro versions is not significant, and both should have surpassed SD3. You can find the weights here: flux_dev, flux_schnell.

Diffusers has initial support for LoRA training with Flux (diffusers). SimpleTuner has initial compatibility with Flux LoRA training in its scripts (SimpleTuner). ComfyUI now supports Flux and its initial LoRAs (ComfyUI).

leonary avatar Aug 05 '24 05:08 leonary

> Hello, the weights for the Flux series models have been released, including the dev version and the schnell version.

Please read this: https://blog.fal.ai/flux-the-largest-open-sourced-text2img-model-now-available-on-fal/ Dev and schnell were obtained by distillation from Pro's weights. It is possible to create LoRAs for them, and they will work. But full model training is practically impossible because of this.

dill-shower avatar Aug 05 '24 11:08 dill-shower
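For readers unfamiliar with the term: distillation here loosely means training a student model to reproduce a teacher's outputs rather than fitting the original data objective directly. A toy NumPy sketch of that idea, using a linear "teacher" and "student" (purely illustrative, and nothing to do with FLUX's actual, unpublished recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Teacher": a fixed linear map. "Student": a trainable map of the same shape.
teacher_W = rng.normal(size=(8, 8))
student_W = rng.normal(size=(8, 8)) * 0.1

lr = 0.05
for step in range(500):
    x = rng.normal(size=(16, 8))
    target = x @ teacher_W.T                      # supervision comes from the teacher,
    pred = x @ student_W.T                        # not from any ground-truth dataset
    grad = 2.0 * (pred - target).T @ x / len(x)   # gradient of the mean squared error
    student_W -= lr * grad

# The student converges to the teacher it was distilled from.
print(bool(np.abs(student_W - teacher_W).max() < 1e-4))  # True
```

The released checkpoint is the endpoint of such a process, which is the basis of the argument above that further training it behaves differently from training the undistilled base.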

> > Hello, the weights for the Flux series models have been released, including the dev version and the schnell version.

> Please read this: https://blog.fal.ai/flux-the-largest-open-sourced-text2img-model-now-available-on-fal/ Dev and schnell were obtained by distillation from Pro's weights. It is possible to create LoRAs for them, and they will work. But full model training is practically impossible because of this.

It should be possible to fine-tune distilled models.

ddpasa avatar Aug 05 '24 11:08 ddpasa

> It should be possible to fine-tune distilled models.

Why should it? I just did a quick search for information about training SDXL Turbo, and it turns out it was also obtained through distillation from the base model. There are tons of such models on Civitai, but they're all created by merging SDXL Turbo with something else. I couldn't find a single one obtained through fine-tuning. The only relevant post I came across was a complaint on Reddit about how training SDXL Turbo produces very poor results. As I expected. https://www.reddit.com/r/StableDiffusion/comments/18l2qp0/sdxl_turbo_fine_tunemerging/

dill-shower avatar Aug 05 '24 11:08 dill-shower

> > It should be possible to fine-tune distilled models.

> Why should it? I just did a quick search for information about training SDXL Turbo, and it turns out it was also obtained through distillation from the base model. There are tons of such models on Civitai, but they're all created by merging SDXL Turbo with something else. I couldn't find a single one obtained through fine-tuning. The only relevant post I came across was a complaint on Reddit about how training SDXL Turbo produces very poor results. As I expected. https://www.reddit.com/r/StableDiffusion/comments/18l2qp0/sdxl_turbo_fine_tunemerging/

That is because the training code for Turbo was never released and nobody wrote one. It's not fundamentally impossible.

ddpasa avatar Aug 05 '24 11:08 ddpasa

Even training schnell with a LoRA or a full tune is fine. They're just big models and require using LoRA with quantised base weights, but Kohya should probably wait for the bugs to be worked out in Quanto before trying to integrate it; it makes a mess of the model state dict keys.

bghira avatar Aug 07 '24 07:08 bghira

@kohya-ss Training scripts released : https://github.com/XLabs-AI/x-flux

BenDes21 avatar Aug 07 '24 16:08 BenDes21

Those are pretty minimal and, e.g., don't implement cosmap/logit-norm or any of the SD3 training details; it's just about the same as the cloneofsimo/minRF implementation, in fact basically identical. The interesting thing there is probably their ControlNet training implementation details.

bghira avatar Aug 07 '24 17:08 bghira
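For reference, the "logit-norm" mentioned above refers to the logit-normal timestep density from the SD3 paper: draw a normal sample and squash it through a sigmoid, which concentrates training on mid-range timesteps. A minimal sketch (parameter defaults here are illustrative, not SD3's exact settings):

```python
import numpy as np

def logit_normal_timesteps(n, mean=0.0, std=1.0, rng=None):
    """Sample timesteps t in (0, 1) from a logit-normal density."""
    rng = rng or np.random.default_rng()
    u = rng.normal(loc=mean, scale=std, size=n)
    return 1.0 / (1.0 + np.exp(-u))   # sigmoid squashes the normal into (0, 1)

rng = np.random.default_rng(0)
t = logit_normal_timesteps(100_000, rng=rng)

# Mass concentrates on mid-range timesteps; the extremes are rarely sampled.
mid = np.mean((t > 0.25) & (t < 0.75))
ext = np.mean((t < 0.05) | (t > 0.95))
print(bool(mid > ext))  # True
```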

Diffusers scripts have arrived:

https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md

FurkanGozukara avatar Aug 09 '24 13:08 FurkanGozukara

> diffusers scripts arrived

> https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md

@FurkanGozukara , you are amazing as usual!

ddpasa avatar Aug 09 '24 13:08 ddpasa

@ddpasa thanks

A pull request has already arrived :D

https://github.com/kohya-ss/sd-scripts/pull/1374/files/da4d0fe0165b3e0143c237de8cf307d53a9de45a..36b2e6fc288c57f496a061e4d638f5641c32c9ea

FurkanGozukara avatar Aug 09 '24 14:08 FurkanGozukara

> > It should be possible to fine-tune distilled models.

> Why should it? I just did a quick search for information about training SDXL Turbo, and it turns out it was also obtained through distillation from the base model. There are tons of such models on Civitai, but they're all created by merging SDXL Turbo with something else. I couldn't find a single one obtained through fine-tuning. The only relevant post I came across was a complaint on Reddit about how training SDXL Turbo produces very poor results. As I expected. https://www.reddit.com/r/StableDiffusion/comments/18l2qp0/sdxl_turbo_fine_tunemerging/

Do you guys also love it when someone is so confidently incorrect?

My flux finetune is coming along very nicely. A huge upgrade compared to SDXL and Pony, and also far more trainable than SD3 Medium. It's practically impossible to add NSFW to SD3 Medium because of the complete lack of NSFW content in its training data. No finetuner is going to finish SAI's pathetic job. Nobody is ever going to create any kind of content for SD3 when you can create better results for the same money with flux. So yeah, RIP.

Flux seems to have seen plenty of NSFW images that were merely filtered out via captioning, so the context and knowledge already exist in the latent space; it only needs to... well, get finetuned.

So, yeah f*ck SD3. Pyro's NSFW model goes FLUX.

cyan2k avatar Aug 10 '24 22:08 cyan2k

> > It should be possible to fine-tune distilled models.

> > Why should it? I just did a quick search for information about training SDXL Turbo, and it turns out it was also obtained through distillation from the base model. There are tons of such models on Civitai, but they're all created by merging SDXL Turbo with something else. I couldn't find a single one obtained through fine-tuning. The only relevant post I came across was a complaint on Reddit about how training SDXL Turbo produces very poor results. As I expected. https://www.reddit.com/r/StableDiffusion/comments/18l2qp0/sdxl_turbo_fine_tunemerging/

> Do you guys also love it when someone is so confidently incorrect?

> My flux finetune is coming in very nicely. Huge upgrade compared to SDXL and Pony, also way more trainable than SD3 Medium. It's literally impossible to add NSFW to SD3 medium because of the complete lack of NSFW content in its training data. No finetuner is going to finish SAI's pathetic job. Nobody is ever going to create any kind of content for SD3 when you can create better results for the same money with flux. so yeah, rip.

> Flux seems to have seen plenty of NSFW images, and it's just filtered and dropped out via captioning. So the context and knowledge already exists in the latent space, and it only needs to... well get finetuned.

> So, yeah f*ck SD3. Pyro's NSFW model goes FLUX.

What are you talking about? I trained 3.0 for 30 minutes and it can generate NSFW just fine. NSFW link: https://imgur.com/a/sd-30-test-G7G7G6u

protector131090 avatar Aug 11 '24 09:08 protector131090

> > > It should be possible to fine-tune distilled models.

> > > Why should it? I just did a quick search for information about training SDXL Turbo, and it turns out it was also obtained through distillation from the base model. There are tons of such models on Civitai, but they're all created by merging SDXL Turbo with something else. I couldn't find a single one obtained through fine-tuning. The only relevant post I came across was a complaint on Reddit about how training SDXL Turbo produces very poor results. As I expected. https://www.reddit.com/r/StableDiffusion/comments/18l2qp0/sdxl_turbo_fine_tunemerging/

> > Do you guys also love it when someone is so confidently incorrect? My flux finetune is coming in very nicely. Huge upgrade compared to SDXL and Pony, also way more trainable than SD3 Medium. It's literally impossible to add NSFW to SD3 medium because of the complete lack of NSFW content in its training data. No finetuner is going to finish SAI's pathetic job. Nobody is ever going to create any kind of content for SD3 when you can create better results for the same money with flux. so yeah, rip. Flux seems to have seen plenty of NSFW images, and it's just filtered and dropped out via captioning. So the context and knowledge already exists in the latent space, and it only needs to... well get finetuned. So, yeah f*ck SD3. Pyro's NSFW model goes FLUX.

> what are you talking about? i trained 3.0 for 30 minutes and it can generate nsfw just fine. NSFW link https://imgur.com/a/sd-30-test-G7G7G6u

Someone just reads too much Reddit and similar places, where everyone is convinced that if a model wasn't trained on NSFW, it will never be able to create such things. How people managed to create models for anime, furry, and the rest for SDXL, nobody knows. Lost technology.

In all seriousness, there's nothing stopping SD3 from learning to create any NSFW content, or even worse. Due to the more efficient architecture, training doesn't require as much GPU overhead as SDXL.

I don't understand why everyone is so crazy about this FLUX, and why people downvote my comment saying that it has no weights and access is only via API.

dill-shower avatar Aug 11 '24 15:08 dill-shower

> Someone just reads too much reddit and similar places where everyone is convinced that if a model wasn't trained on nsfw then they will never be able to create such things.

We (a group of SDXL finetuners) spent around 5k bucks trying to make NSFW work in SD3, but a model that can't even render women lying in grass is so lobotomized that re-introducing NSFW takes immense resources, in the ballpark of SAI's own training infrastructure. No hobby finetuner is going to pay for that. Nobody is going to pay for that when they can get far better results for a fraction of the cost with FLUX.

It's not hard to understand. It took 20 bucks to teach FLUX NSFW concepts. $5k vs $20 is pretty clear cut.

> How they used to create models for anime, furry and the rest for sdxl no one knows. Lost technology

Well, it seems that you don't know the basics of how training such models works, or how self-organisation of embeddings in the latent space works. LAION, the data corpus of SDXL, is full of furry and anime shit. The SD3 data corpus has exactly zero NSFW images in it. And you honestly have difficulty understanding why one is trainable and the latter isn't? You're on the wrong board, then.

Please stop talking about things you don't have a clue about.

Also, FLUX is runnable locally and the weights are public, so I don't even know what "it has no weights and access is only via API" even means.

cyan2k avatar Aug 11 '24 20:08 cyan2k

> We (group of SDXL finetuners) spend like 5k bucks making NSFW in SD3 work, but a model that can't even render women lying in grass is so lobotomized

Stability AI has promised to release the 3.1 model soon, and they promised to fix this problem in it. You jumped the gun.

> Well it seems that you don't know the basic of how training such models work and how self-organisation of embeddings in the latent space works. LAION, the data corpus of SDXL, is full of furry and anime shit

When SDXL came out, Reddit wrote the same things about it that they are now writing about SD3: that it didn't use NSFW content in training, so NSFW training is impossible; "it's a terrible model, Stability AI killed their reputation by refusing to train on NSFW content, we can't use it, we'll stay on SD1.5"... Just like they wrote about SD2... Let's wait a year and find out that there was NSFW in the SD3 dataset but it was removed from the SD4 dataset, so we stay on SD3 and boycott the new model...

> Also FLUX is runnable locally and the weights are public

Please give me a link to download the Flux-pro model.

dill-shower avatar Aug 11 '24 23:08 dill-shower

People are using SimpleTuner for Flux LoRA creation. Unfortunately, it has no Windows support. Waiting for kohya-ss :) Flux dev is so much better than SD3 💯

D3voz avatar Aug 12 '24 05:08 D3voz

> People are using simple tuner for flux lora creation. Unfortunately it has no windows support. Waiting for kohya ss :) . Flux dev is so much better than sd3 💯

The sd3 branch now supports FLUX.1 dev LoRA training experimentally :) https://github.com/kohya-ss/sd-scripts/tree/sd3

kohya-ss avatar Aug 12 '24 08:08 kohya-ss

> Stabilityai promised to release the 3.1 model soon. They promised to fix this problem in it. You've been too quick to educate yourself

If SD3.1 could achieve the performance of Flux Dev while allowing training and sharing, and if the machine costs required for fine-tuning are lower than those of Flux Dev, I would be very willing to use SD3.1. However, given the performance of SD3 8b and the licensing of the SD3 series, I am pessimistic about this possibility.

leonary avatar Aug 12 '24 09:08 leonary

> Now sd3 branch supports FLUX.1 dev LoRA training experimentally :) https://github.com/kohya-ss/sd-scripts/tree/sd3

Thank you for your excellent work. Fine-tuning Flux with sd-scripts has completely met my expectations, and its performance is on par with SimpleTuner.

Additionally, are there any plans to support Flux in some of the LoRA processing scripts? Those scripts could help the community develop models like "detail enhancers" more quickly.

leonary avatar Aug 12 '24 09:08 leonary

> > People are using simple tuner for flux lora creation. Unfortunately it has no windows support. Waiting for kohya ss :) . Flux dev is so much better than sd3 💯

> Now sd3 branch supports FLUX.1 dev LoRA training experimentally :) https://github.com/kohya-ss/sd-scripts/tree/sd3

Will this work with the NF4 model that was released yesterday? Up to 4x speedups, reduced VRAM, increased quality.

https://civitai.com/models/638572/nf4-flux1 https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/981

Tophness avatar Aug 12 '24 18:08 Tophness
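Background on NF4, since it keeps coming up: per the QLoRA paper, NF4 stores each weight as a 4-bit index into a fixed 16-level codebook built from normal-distribution quantiles, with one higher-precision absmax scale per small block. A toy NumPy sketch of the mechanism (the codebook values below are made up for illustration; they are not the real NF4 table):

```python
import numpy as np

# Illustrative 16-level codebook: denser near zero, spanning [-1, 1].
# The real NF4 levels are derived from normal-distribution quantiles;
# these values are hypothetical stand-ins chosen only to show the mechanism.
CODEBOOK = np.array([-1.0, -0.7, -0.53, -0.39, -0.28, -0.18, -0.09, 0.0,
                     0.08, 0.16, 0.25, 0.34, 0.44, 0.56, 0.72, 1.0])

def quantize(w, block_size=64):
    """Blockwise absmax scaling, then snap each value to its nearest level."""
    flat = w.reshape(-1, block_size)
    scales = np.abs(flat).max(axis=1, keepdims=True)   # one fp scale per block
    normed = flat / scales                             # now within [-1, 1]
    idx = np.abs(normed[..., None] - CODEBOOK).argmin(axis=-1).astype(np.uint8)
    return idx, scales                                 # idx fits in 4 bits per weight

def dequantize(idx, scales, shape):
    return (CODEBOOK[idx] * scales).reshape(shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 64)).astype(np.float32)
idx, scales = quantize(W)
W_hat = dequantize(idx, scales, W.shape)

# Worst-case error per element: half the widest codebook gap, times the block scale.
max_gap = np.diff(CODEBOOK).max()
err = np.abs(W - W_hat).reshape(-1, 64)
print(bool(np.all(err <= scales * (max_gap / 2) + 1e-6)))  # True
```

Storage cost is roughly 4 bits per weight plus one scale per block, which is where the VRAM savings discussed in this thread come from.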

You don't need an A100 for flux. IMO Kohya should release sooner rather than keep trying to add a million features; you can train on 16 GB of VRAM without any quantisation at all.

bghira avatar Aug 12 '24 18:08 bghira

> you don't need an A100 for flux. imo kohya should release sooner than keep trying to add the million features. you can train on 16G VRAM without any quantisation at all.

It did in other trainers, such as your own, but yeah, apparently not anymore. [Screenshot attached.]

The NF4 model is far superior, though, and more accessible for inference. FP8 used to be virtually unusable on my 4080 because it'd take about 5-10 minutes for one over-quantized generation, since it overloads my shared memory; now it's under a minute for outputs that look on par with Pro. I don't really want to waste a week training an FP8 model that's already obsolete and can't be used by most people.

Tophness avatar Aug 12 '24 18:08 Tophness

It's not like that at all, though. FP8 is fine, especially in PyTorch 2.4; you can read back through the comments in this issue to see.

bghira avatar Aug 12 '24 18:08 bghira

Also, NF4 is definitely not "on par with Pro" 🤪

bghira avatar Aug 12 '24 18:08 bghira