feed_forward_vqgan_clip

How to get more variation in the null image

kchodorow opened this issue · 0 comments

I've been generating images using this model, which is delightfully fast, but I've noticed that it produces images that are all alike. I tried generating the "null" image by doing:

# CLIP text embedding of the prompt
H = perceptor.encode_text(toks.to(device)).float()
# zero the embedding and run it through the feed-forward generator
z = net(0 * H)

This resulted in:

[image: base image]

And indeed, everything I generated kind of matched that: you can see the fleshy protrusion on the left in "gold coin":

[image: gold-coin--0 0]

The object and matching mini-object in "tent":

[image: tent-0 5]

And it always seems to try to caption the image with nonsense lettering ("lion"):

[image: lion--0 0]

So I'm wondering if there's a way to "prime" the model so it uses a different zero image for each run. Is there a variable I can set, or is this deeply ingrained in the training data?
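For example, I was hoping something like the sketch below might work (noise_scale is just a name I'm making up for illustration, and I don't know whether the generator was trained to handle perturbed embeddings):

import torch

# same perceptor / net / toks / device as in the snippet above
H = perceptor.encode_text(toks.to(device)).float()

# perturb the zeroed embedding with a little random noise so that
# each run starts from a different "base" image
noise_scale = 0.1  # made-up knob, purely illustrative
z = net(0 * H + noise_scale * torch.randn_like(H))

Does something along these lines make sense, or does it just push the output off the manifold the model was trained on?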

Any advice would be appreciated, thank you!

(Apologies if this is the same as #8, but it sounded like #8 was solved by using priors, which doesn't seem to help with this.)

kchodorow · Sep 06 '22 22:09