LLM-groundedDiffusion

How to get an image of "a man rides a horse"?

Open hujunchao opened this issue 1 year ago • 4 comments

I tried this project. It's amazing and interesting. But I've run into a problem: it's hard for me to get a good image from the text "a man rides a horse". Can you give me some advice? Thank you!

hujunchao avatar Jun 19 '23 13:06 hujunchao

Some initial attempts (you can improve on these by trying more options and seeds):

image

image

image

You may wonder why the man's face looks weird. This is a known artifact of Stable Diffusion on small objects, and fixing it is out of our scope. Generating a man whose face takes up a larger proportion of the image may help.

TonyLianLong avatar Jun 19 '23 19:06 TonyLianLong

Thank you for your reply!

hujunchao avatar Jun 20 '23 13:06 hujunchao

When two objects do not interact, it is easy to use a layout to get a perfect image. But when two objects interact, it may be hard to get a good image from a layout alone. How can the action between objects be expressed? For example, "a man and a horse" may be easy, "a man rides a horse" may be difficult, and "a man is chasing a horse" may be even more difficult.

hujunchao avatar Jun 20 '23 13:06 hujunchao

Good question! This is why the space allows specifying a prompt for the overall generation. Without it, a default prompt is used and the object interaction is not specified: SD will try to guess the interaction, so it may well generate a man standing close to a horse at the specified locations. With it, you get the object interaction: for example, with "a man riding a horse", SD knows the man is supposed to ride the horse, as shown in the generation above.

image
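To make the distinction concrete, here is a minimal sketch of a layout with an optional overall prompt. It is not this project's actual code: `LayoutBox`, `LayoutSpec`, `generation_prompt`, the box coordinates, the "a grassy field" background, and the fallback wording are all hypothetical and only illustrate the idea that the per-object phrases alone carry no interaction.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayoutBox:
    phrase: str                      # per-object prompt, e.g. "a man"
    box: Tuple[int, int, int, int]   # (x, y, width, height); convention assumed


@dataclass
class LayoutSpec:
    boxes: List[LayoutBox]
    background: str = ""
    overall_prompt: str = ""         # optional prompt for the overall generation

    def generation_prompt(self) -> str:
        """Return the prompt used for the overall generation."""
        if self.overall_prompt:
            # The user states the interaction explicitly.
            return self.overall_prompt
        # Hypothetical fallback: it only lists the objects, so the
        # interaction between them is left for the model to guess.
        objects = " and ".join(b.phrase for b in self.boxes)
        return f"{objects} in {self.background}" if self.background else objects


boxes = [
    LayoutBox("a man", (200, 80, 120, 200)),
    LayoutBox("a horse", (120, 200, 300, 250)),
]
no_interaction = LayoutSpec(boxes, background="a grassy field")
with_interaction = LayoutSpec(
    boxes, background="a grassy field", overall_prompt="A man riding a horse"
)

print(no_interaction.generation_prompt())    # a man and a horse in a grassy field
print(with_interaction.generation_prompt())  # A man riding a horse
```

Both specs share the same boxes; only the prompt for the overall generation differs, which mirrors the "same config, different overall prompt" comparison in this thread.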

However, finer-grained control over object interactions is a very useful future direction. This paper lays out the idea of "text -> intermediate representation -> image". You are encouraged to extend it to more representations (e.g., a scene graph or LLM-generated SVG that captures more information).
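As one possible reading of the scene-graph suggestion, the sketch below stores objects with boxes plus explicit (subject, predicate, target) relations and turns the relations into an overall prompt. This is an assumption-laden illustration, not something the project provides: `SceneGraph`, `Relation`, `add_relation`, `to_overall_prompt`, and the box values are all made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); convention assumed


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    target: str


@dataclass
class SceneGraph:
    objects: Dict[str, Box]
    relations: List[Relation] = field(default_factory=list)

    def add_relation(self, subject: str, predicate: str, target: str) -> None:
        # Relations may only refer to objects that have a box in the layout.
        for name in (subject, target):
            if name not in self.objects:
                raise ValueError(f"unknown object: {name!r}")
        self.relations.append(Relation(subject, predicate, target))

    def to_overall_prompt(self) -> str:
        # With no relations, fall back to listing the objects.
        if not self.relations:
            return " and ".join(self.objects)
        return ", ".join(
            f"{r.subject} {r.predicate} {r.target}" for r in self.relations
        )


graph = SceneGraph({"a man": (200, 80, 120, 200), "a horse": (120, 200, 300, 250)})
graph.add_relation("a man", "riding", "a horse")
print(graph.to_overall_prompt())  # a man riding a horse
```

A representation like this keeps the interaction as structured data, so it could later drive something richer than a single prompt string.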

Examples:

Same config, overall prompt: A man standing nearby a horse (I didn't play around with the hyperparameters)

image

Same config, overall prompt: A man riding a horse

image

TonyLianLong avatar Jun 20 '23 14:06 TonyLianLong