
Synthetic data generation for model training

Open zepmck opened this issue 2 years ago • 7 comments

Hi all, congratulations on your work. I have been playing with diffusion models for image-only generation and the results are great. But what if I used the same model to generate image annotations too? My goal is to use stable diffusion to generate a complete training data set (images + annotations). Does anyone have any suggestions about this? Thanks!

zepmck · Oct 03 '22 14:10

Hi zepmck, curious, what do the annotations look like?

taoisu · Oct 03 '22 18:10

> Hi zepmck, curious, what do the annotations look like?

I mean labels, bounding boxes, and semantic masks.

zepmck · Oct 04 '22 08:10

What's your use case? To train something smaller or leaner, like a CNN? You will have to run the generated data through one.

dagelf · Oct 06 '22 19:10

I think some of the recent semantic segmentation work might be similar to what you're trying to achieve. Basically, you frame the problem as an image-to-image translation problem: raw image in, image with labels out.
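To make that framing concrete, here is a minimal sketch in plain PyTorch (random tensors stand in for a real dataset, and the tiny network and class count are just illustrative assumptions): a fully convolutional net takes the raw image and predicts a per-pixel label map, i.e., the "image with labels out".

```python
import torch
import torch.nn as nn

NUM_CLASSES = 5  # assumption: number of semantic classes in your dataset

# Tiny fully convolutional net: RGB image in, per-pixel class logits out.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, NUM_CLASSES, kernel_size=1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: 4 RGB images and their integer label maps.
images = torch.rand(4, 3, 64, 64)
masks = torch.randint(0, NUM_CLASSES, (4, 64, 64))

optimizer.zero_grad()
logits = model(images)            # (4, NUM_CLASSES, 64, 64)
loss = loss_fn(logits, masks)     # per-pixel cross-entropy
loss.backward()
optimizer.step()

pred = logits.argmax(dim=1)       # predicted label map: the "labels out"
```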

But if you're thinking of generating both the image and the labels from noise, then it's a different story.

taoisu · Oct 06 '22 19:10

> I think some of the recent semantic segmentation work might be similar to what you're trying to achieve. Basically, you frame the problem as an image-to-image translation problem: raw image in, image with labels out.

How would you extract the labels from the generated images, then?

zepmck · Oct 07 '22 12:10

Just throwing out some ideas. For segmentation masks you can use colors; for bboxes you could instruct the model to draw each box in a specific color and use a similar mechanism. Another option is to formulate the generation output as tokens, like Diffusion-LM; you may want to check that out.
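For example, if the model is instructed to render each class as a fixed flat color, decoding the generated annotation image back into labels and boxes is straightforward. A sketch with numpy (the color-to-class palette and tolerance are assumptions to adapt to whatever you condition on):

```python
import numpy as np

# Assumed mapping from the flat colors the model is told to use to class names.
COLOR_TO_CLASS = {
    (255, 0, 0): "person",
    (0, 255, 0): "car",
}
TOLERANCE = 30  # generated colors are never exact, so match loosely

def decode_annotation(rgb):
    """rgb: (H, W, 3) uint8 annotation image generated alongside the photo."""
    results = {}
    for color, name in COLOR_TO_CLASS.items():
        # Pixels close to this class color form the binary mask for the class.
        dist = np.abs(rgb.astype(np.int32) - np.array(color)).sum(axis=-1)
        mask = dist < TOLERANCE
        if not mask.any():
            continue
        ys, xs = np.nonzero(mask)
        bbox = (xs.min(), ys.min(), xs.max(), ys.max())  # x0, y0, x1, y1
        results[name] = {"mask": mask, "bbox": bbox}
    return results
```

A per-pixel nearest-palette-color assignment would be another way to handle the fact that diffusion outputs never hit the requested colors exactly.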

taoisu · Oct 07 '22 23:10

@zepmck - I'm also interested in using Stable Diffusion to generate training data, so I'm curious how far you've gotten down this path. I work for an environmental non-profit with camera trap data, which suffers from a long-tailed distribution (i.e., the animals you most want to identify are the rarest, and thus the ones you have the fewest images of). Classifiers trained on camera trap data also tend to generalize poorly, because the cameras are fixed and the models learn too much about the specific backgrounds of the images they're trained on.

All that is to say, I'm wondering if Stable Diffusion could help generate images with the look & feel of camera trap images, perhaps using backgrounds from real camera trap locations, with animals in them for which we have few real-world examples. The automated label generation would be a bonus but not 100% necessary.
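Roughly what I'm picturing is inpainting an animal into an empty frame from a real camera trap location. A sketch of that idea using the Hugging Face diffusers inpainting pipeline rather than this repo's scripts (the checkpoint name, file paths, and prompt are placeholders; I haven't tried this):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Placeholder inputs: an empty frame from a real camera trap location and a
# hand-drawn mask whose white pixels mark where the animal should appear.
background = Image.open("camera_trap_empty_frame.jpg").convert("RGB").resize((512, 512))
mask = Image.open("animal_region_mask.png").convert("RGB").resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumption: any SD inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

result = pipe(
    prompt="a pangolin walking at night, infrared camera trap photo, grainy",
    image=background,
    mask_image=mask,
    num_inference_steps=50,
).images[0]

result.save("synthetic_camera_trap_sample.jpg")
```

Since only the masked region gets synthesized, the real background pixels are preserved, and the mask itself would give a rough bounding box for free.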

Do you have any thoughts on how I could get started with this?

nathanielrindlaub · Oct 14 '22 20:10

I am thinking along similar lines: taking multiple text descriptions as input and outputting an image with bboxes/masks.

A paper that discusses the potential of generated data for training: "Is Synthetic Data from Generative Models Ready for Image Recognition?"

zilunzhang · Nov 01 '22 08:11

This could be relevant here: https://github.com/castorini/daam

filipeferreiradsr · Nov 08 '22 16:11