
Synthetic data generation for model training

Open zepmck opened this issue 2 years ago • 7 comments

Hi all, congratulations on your work. I have been playing with diffusion models for image-only generation and the results are great. But what if I used the same model to generate image annotations too? My goal is to use stable diffusion to generate a complete training data set (images + annotations). Does anyone have any suggestions about this? Thanks!

zepmck · Oct 03 '22 14:10

Hi zepmck, curious, what do the annotations look like?

taoisu · Oct 03 '22 18:10

> Hi zepmck, curious, what do the annotations look like?

I mean labels, bounding boxes, and semantic masks.

zepmck · Oct 04 '22 08:10

What's your use case? To train something smaller or leaner, like a CNN? You will have to run the generated data through one.

dagelf · Oct 06 '22 19:10

I think some of the recent semantic segmentation work might be similar to what you're trying to achieve. Basically, you frame the problem as an image-to-image translation problem: raw image in, image with labels out.
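To make that framing concrete, here is a minimal sketch in plain PyTorch (random tensors stand in for a real dataset, and the tiny network and class count are just illustrative assumptions): a fully convolutional net takes the raw image and predicts a per-pixel label map, i.e., the "image with labels out".

```python
import torch
import torch.nn as nn

NUM_CLASSES = 5  # assumption: number of semantic classes in your dataset

# Tiny fully convolutional net: RGB image in, per-pixel class logits out.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, NUM_CLASSES, kernel_size=1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: 4 RGB images and their integer label maps.
images = torch.rand(4, 3, 64, 64)
masks = torch.randint(0, NUM_CLASSES, (4, 64, 64))

optimizer.zero_grad()
logits = model(images)            # (4, NUM_CLASSES, 64, 64)
loss = loss_fn(logits, masks)     # per-pixel cross-entropy
loss.backward()
optimizer.step()

pred = logits.argmax(dim=1)       # predicted label map: the "labels out"
```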

But if you're thinking of generating both the image and the labels from noise, then it's a different story.

taoisu · Oct 06 '22 19:10

> I think some of the recent semantic segmentation work might be similar to what you're trying to achieve. Basically, you frame the problem as an image-to-image translation problem: raw image in, image with labels out.

How would you extract the labels from the generated images, then?

zepmck · Oct 07 '22 12:10

Just throwing out some ideas. For segmentation masks you can use colors; for bboxes you could instruct the model to draw each box in a specific color and use a similar mechanism. Another option is to formulate the generation output as tokens, like Diffusion-LM; you may want to check that out.
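For example, if the model is instructed to render each class as a fixed flat color, decoding the generated annotation image back into labels and boxes is straightforward. A sketch with numpy (the color-to-class palette and tolerance are assumptions to adapt to whatever you condition on):

```python
import numpy as np

# Assumed mapping from the flat colors the model is told to use to class names.
COLOR_TO_CLASS = {
    (255, 0, 0): "person",
    (0, 255, 0): "car",
}
TOLERANCE = 30  # generated colors are never exact, so match loosely

def decode_annotation(rgb):
    """rgb: (H, W, 3) uint8 annotation image generated alongside the photo."""
    results = {}
    for color, name in COLOR_TO_CLASS.items():
        # Pixels close to this class color form the binary mask for the class.
        dist = np.abs(rgb.astype(np.int32) - np.array(color)).sum(axis=-1)
        mask = dist < TOLERANCE
        if not mask.any():
            continue
        ys, xs = np.nonzero(mask)
        bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # x0, y0, x1, y1
        results[name] = {"mask": mask, "bbox": bbox}
    return results
```

A per-pixel nearest-palette-color assignment would be another way to handle the fact that diffusion outputs never hit the requested colors exactly.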

taoisu · Oct 07 '22 23:10

@zepmck - I'm also interested in using Stable Diffusion to generate training data, so I'm curious how far you've gotten down this path. I work for an environmental non-profit with camera trap data, which suffers from a long-tailed distribution (i.e., the animals you most want to identify are the rarest, and thus the ones you have the fewest images of). Classifiers trained on camera trap data also tend to generalize poorly, because the cameras are fixed and the models learn too much about the specific backgrounds of the images they're trained on.

All that is to say, I'm wondering if Stable Diffusion could help generate images with the look & feel of camera trap images, perhaps using backgrounds from real camera trap locations, with animals in them for which we have few real-world examples. The automated label generation would be a bonus but not 100% necessary.
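Roughly what I'm picturing is inpainting an animal into an empty frame from a real camera trap location. A sketch of that idea using the Hugging Face diffusers inpainting pipeline rather than this repo's scripts (the checkpoint name, file paths, and prompt are placeholders; I haven't tried this):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Placeholder inputs: an empty frame from a real camera trap location and a
# hand-drawn mask whose white pixels mark where the animal should appear.
background = Image.open("camera_trap_empty_frame.jpg").convert("RGB").resize((512, 512))
mask = Image.open("animal_region_mask.png").convert("RGB").resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumption: any SD inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

result = pipe(
    prompt="a pangolin walking at night, infrared camera trap photo, grainy",
    image=background,
    mask_image=mask,
    num_inference_steps=50,
).images[0]

result.save("synthetic_camera_trap_sample.jpg")
```

Since only the masked region gets synthesized, the real background pixels are preserved, and the mask itself would give a rough bounding box for free.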

Do you have any thoughts on how I could get started with this?

nathanielrindlaub · Oct 14 '22 20:10

I am thinking along similar lines: taking multiple text descriptions as input and outputting an image with bboxes/masks.

A paper that discusses the potential of generated data for training: "Is Synthetic Data from Generative Models Ready for Image Recognition?"

zilunzhang · Nov 01 '22 08:11

This could be relevant here: https://github.com/castorini/daam

filipeferreiradsr · Nov 08 '22 16:11