
[Stable Diffusion] Longer prompt and rectangular image

Open DrJimFan opened this issue 3 years ago • 8 comments

Thanks for open-sourcing this great repo! I have been using this code with Stable Diffusion checkpoints. There are two key features that https://github.com/CompVis/stable-diffusion supports but that seem to be missing here:

  • Generate rectangular images, which people showcase a lot on the internet.
  • Currently the max token length is 77. Is there any way to make this much longer?

Thanks again!

DrJimFan avatar Aug 13 '22 17:08 DrJimFan

Hey @LinxiFan,

By "Generate rectangular images, which people showcase a lot on the internet", do you just mean functionality to display the images? Note that it's pretty trivial to do this with the just-merged StableDiffusionPipeline (available on master, but we'll make a release tomorrow). You could do the following to display an image (either save it locally or show it directly in a Google Colab):

# pip install git+https://github.com/huggingface/diffusers.git

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1-diffusers", use_auth_token=True)  # make sure you're logged in with `huggingface-cli login`
image = pipe("eiffel tower on the moon", guidance_scale=8)["sample"][0]  # image here is in [PIL format](https://pillow.readthedocs.io/en/stable/)

# To view the image, either save it to disk:
image.save("eiffel_tower_moon.png")

# or, in a Google Colab notebook, display it directly:
display(image)

Does that answer the first question?

Regarding the second question,

You're totally right, there is no reason to be limited by a random number like 77 - corrected it here: https://github.com/huggingface/diffusers/pull/168/#discussion_r945284249

patrickvonplaten avatar Aug 14 '22 12:08 patrickvonplaten

Please let me know if this didn't fully answer your questions @LinxiFan !

patrickvonplaten avatar Aug 14 '22 12:08 patrickvonplaten

Note: 77 was not a random number. CLIP has max_position_embeddings=77, and I don't think it can handle more, since it uses absolute position embeddings.
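
To make the point about absolute position embeddings concrete, here is a minimal sketch, not CLIP's actual code. It assumes a learned table with one row per position; MAX_POSITIONS mirrors the max_position_embeddings=77 value mentioned above, while EMBED_DIM, position_table, and lookup_positions are hypothetical names with placeholder values.

```python
# Toy illustration of a learned absolute position-embedding table.
# MAX_POSITIONS mirrors max_position_embeddings=77 from the thread;
# EMBED_DIM and the zero-filled rows are placeholders, not real weights.
MAX_POSITIONS = 77
EMBED_DIM = 4

position_table = [[0.0] * EMBED_DIM for _ in range(MAX_POSITIONS)]


def lookup_positions(sequence_length):
    """Return one embedding row per position, or raise if the table is too short."""
    if sequence_length > len(position_table):
        raise ValueError(
            f"sequence length {sequence_length} exceeds the "
            f"{len(position_table)} learned positions"
        )
    return [position_table[i] for i in range(sequence_length)]


rows = lookup_positions(77)  # works: every position has a learned row
try:
    lookup_positions(78)  # fails: there is no row for position index 77
except ValueError as err:
    print(err)
```

Because each position needs its own trained row, a longer prompt would require positions the model never learned, which is why the limit cannot simply be raised at inference time.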

patil-suraj avatar Aug 14 '22 12:08 patil-suraj

Sorry, @patil-suraj is 100% correct. I was a bit thrown off by a rather unusual (not a power of 2) number like 77, but it is indeed the maximum length, and sadly it's not possible to input longer prompts with the current model, @LinxiFan.

Here is the proof: https://huggingface.co/openai/clip-vit-base-patch16/blob/main/config.json#L45

patrickvonplaten avatar Aug 14 '22 13:08 patrickvonplaten

Hi, thanks for getting back to me!

Regarding rectangular images: no, it isn't about display. The original CompVis repo (https://github.com/CompVis/stable-diffusion) script can take H and W as generation parameters, so the algorithm can produce a non-square image (such as https://www.reddit.com/r/StableDiffusion/comments/wo3wm3/jazz_robots/).

Regarding the prompt length: the above CompVis repo seems to support much longer prompts. At least the Discord bot they deployed was able to accept a long paragraph. Thanks again for your help!

DrJimFan avatar Aug 14 '22 20:08 DrJimFan

Hey @LinxiFan,

It seems like the original CompVis repo simply truncates longer inputs to the max length, which is 77 (check here). This is the same behavior we've now implemented in the Stable Diffusion pipeline, see here
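
As a rough sketch of what truncating to the max length means in practice (an illustration, not the code from either repo): the token sequence is cut so that, together with the start and end markers, it fits in 77 slots. The function name truncate_tokens and the default bos_id and eos_id values are assumptions made for this example, and padding of short prompts is left out.

```python
def truncate_tokens(token_ids, max_length=77, bos_id=49406, eos_id=49407):
    """Wrap token_ids in start/end markers, dropping tokens that do not fit.

    bos_id and eos_id are illustrative defaults; check the tokenizer you
    actually use for its real special-token ids.
    """
    # Reserve two slots for the start and end markers.
    body = list(token_ids)[: max_length - 2]
    return [bos_id] + body + [eos_id]


long_prompt = list(range(100))   # stand-in for 100 prompt token ids
encoded = truncate_tokens(long_prompt)
print(len(encoded))              # 77
```

Everything after the first 75 prompt tokens is silently dropped, which would explain why a long paragraph is accepted without error but only its beginning influences the image.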

patrickvonplaten avatar Aug 14 '22 20:08 patrickvonplaten

Regarding the rectangular image - sorry I misunderstood! This is indeed a functionality we have not yet implemented in the StableDiffusionPipeline!

@anton-l @patil-suraj - let's implement it as well? (see: https://github.com/CompVis/stable-diffusion/blob/3fbbf28a8a66d6f0ffcaca43a76521b6d9b5bfb3/scripts/txt2img.py#L221)

patrickvonplaten avatar Aug 14 '22 20:08 patrickvonplaten

Already on it, it seems :heart_eyes: https://github.com/huggingface/diffusers/pull/179

patrickvonplaten avatar Aug 14 '22 20:08 patrickvonplaten

#179 is merged, you can now pass the height and width options to StableDiffusionPipeline.


from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1-diffusers", use_auth_token=True)  # make sure you're logged in with `huggingface-cli login`
image = pipe("eiffel tower on the moon", height=512, width=768, guidance_scale=8)["sample"][0]
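
For choosing height and width values, the following sketch shows how the requested image size maps to the latent size the model denoises. It assumes the Stable Diffusion v1 layout (an autoencoder that downsamples by 8 in each spatial dimension, with 4 latent channels); neither number is stated in this thread, and latent_shape is a hypothetical helper, not part of diffusers.

```python
def latent_shape(height, width, batch_size=1, latent_channels=4, downscale=8):
    """Return the (batch, channels, h, w) latent shape for a requested image size.

    latent_channels=4 and downscale=8 are assumptions based on the
    Stable Diffusion v1 autoencoder.
    """
    if height % downscale != 0 or width % downscale != 0:
        raise ValueError(
            f"height and width must be divisible by {downscale}, "
            f"got {height}x{width}"
        )
    return (batch_size, latent_channels, height // downscale, width // downscale)


print(latent_shape(512, 768))  # (1, 4, 64, 96)
```

Under these assumptions, sizes that are not multiples of 8 have no matching latent grid, so rounding height and width to a multiple of 8 before calling the pipeline is the safe choice.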

patil-suraj avatar Aug 15 '22 06:08 patil-suraj

This code throws an error on this line:

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1-diffusers", use_auth_token=True) # make sure you're logged in with huggingface-cli login

I am guessing it is related to the authorization to access CompVis/stable-diffusion-v1-1-diffusers. What do you mean by "make sure you're logged in with `huggingface-cli login`"?

jfdelgad avatar Aug 15 '22 17:08 jfdelgad

[image]

After logging in, I got this error. How can I solve it?

taki0112 avatar Aug 17 '22 05:08 taki0112

Hi @jfdelgad and @taki0112! To log in from a Colab notebook, it's enough to run !huggingface-cli login and paste your API token from https://huggingface.co/settings/tokens into the field that appears below: [image]

A more notebook-friendly login interface is also supported:

from huggingface_hub import notebook_login
notebook_login()

Then you should be able to download the model with StableDiffusionPipeline.from_pretrained(...) if you've been given access to the CompVis org, that is, if you've filled out the model request form and can access this page from your browser: https://huggingface.co/CompVis/stable-diffusion-v1-3-diffusers

P.S. Feel free to open another issue to continue this thread, so that we don't get lost :)

anton-l avatar Aug 17 '22 12:08 anton-l

I think we can close this issue now since #179 is merged.

patrickvonplaten avatar Aug 23 '22 12:08 patrickvonplaten