Synthetic Dataset

Open A-BigBao opened this issue 1 year ago • 3 comments

Hi, the synthetic dataset-10K is very well suited to FLUX distillation. Would you mind sharing how the text prompts in the dataset were selected or obtained? Thanks.

A-BigBao, Dec 25 '24 02:12

Hi,

We directly use the data and text prompts released here for training. According to the owner, the text prompts are generated by GPT-4o.

We did not specifically study how best to select text prompts. If you find a better approach, feel free to open an issue or a PR. :)

Huage001, Dec 25 '24 05:12

These synthetic pairs are very useful for the community. It would be even better if the GPT-4o prompt could also be released. Also, my guess (I am not sure) as to why synthetic data gives better results is this: FLUX.1-dev is a guidance-distilled model, but for real data we cannot know the guidance strength for each image.

A-BigBao, Jan 03 '25 11:01

> These synthetic pairs are very useful for the community. It would be even better if the GPT-4o prompt could also be released. Also, my guess (I am not sure) as to why synthetic data gives better results is this: FLUX.1-dev is a guidance-distilled model, but for real data we cannot know the guidance strength for each image.

Does that mean the distribution of images generated by the dev model better matches the guidance strength used during dev's distillation, so CLEAR is easier to learn?

Feynman1999, Feb 13 '25 08:02