CLEAR Synthetic Dataset

Hi, the synthetic 10K dataset is very suitable for Flux distillation. Would you mind sharing how the text prompts in the dataset were selected or obtained? Thanks.

Dec 25 '24 02:12 A-BigBao

Hi,

We directly use the data and text prompts released here for training. According to the owner, the text prompts are generated by GPT-4o.

We did not specifically study how to select text prompts properly. If you find a better solution, you are welcome to open an issue or a PR. :)

Dec 25 '24 05:12 Huage001

These synthetic pairs are very useful for the community. It would be even better if the prompt used with GPT-4o could also be released. Besides, my guess (I am not sure) for why synthetic data gives better results is this: Flux.1-dev is a guidance-distilled model, but for real data we cannot get the guidance strength for each image.
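To make that guess concrete, here is a minimal sketch of the difference between the two kinds of training pairs. It is not taken from the CLEAR codebase: `TrainingSample`, `guidance_for_training`, the file paths, and the value 3.5 are all hypothetical, and it assumes the guidance strength is passed to the model as an explicit conditioning input during training.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TrainingSample:
    """Hypothetical record for one (prompt, image) training pair."""
    prompt: str
    image_path: str
    # Guidance strength used when the image was generated.
    # None for real images, where no such value exists.
    guidance: Optional[float] = None


def guidance_for_training(sample: TrainingSample,
                          fallback: float = 3.5) -> Tuple[float, bool]:
    """Return (guidance value to condition on, whether that value is known)."""
    if sample.guidance is not None:
        # Synthetic pair: the value recorded at generation time is exact.
        return sample.guidance, True
    # Real image: there is no ground truth, so the fallback is only a guess.
    return fallback, False


# Assumed example data; 3.5 is an illustrative value, not a confirmed setting.
synthetic = TrainingSample("a red fox in snow", "synthetic/0001.png", guidance=3.5)
real = TrainingSample("a red fox in snow", "real/0001.jpg")

print(guidance_for_training(synthetic))  # (3.5, True)
print(guidance_for_training(real))       # (3.5, False)
```

Under this assumption, every synthetic pair comes with a conditioning value that matches how the image was produced, while for a real image any chosen value may not match the image at all.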

Jan 03 '25 11:01 A-BigBao

> These synthetic pairs are very useful for the community. It would be even better if the prompt used with GPT-4o could also be released. Besides, my guess (I am not sure) for why synthetic data gives better results is this: Flux.1-dev is a guidance-distilled model, but for real data we cannot get the guidance strength for each image.

Does this mean that the distribution of images generated by the dev model matches the guidance strength set during dev distillation more closely, so it is easier for CLEAR to learn?

Feb 13 '25 08:02 Feynman1999