diffusers
[Stable Diffusion] Longer prompt and rectangular image
Thanks for open-sourcing this great repo! I have been using this code with stable diffusion checkpoints. There are 2 key features that https://github.com/CompVis/stable-diffusion supports, but seem to be missing here:
- Generate rectangular images, which people showcase a lot on the internet.
- Accept longer prompts. Currently the max token length is 77; is there any way we can make this much longer?
Thanks again!
Hey @LinxiFan,
By "Generate rectangular images, which people showcase a lot on the internet", do you just mean functionality to display the images?
Note that it's pretty trivial to do this with the just-merged StableDiffusionPipeline (available on master, but we'll make a release tomorrow). You could do the following to generate an image and then either save it locally or display it directly in a Google Colab:
# pip install git+https://github.com/huggingface/diffusers.git
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1-diffusers", use_auth_token=True) # make sure you're logged in with `huggingface-cli login`
image = pipe("eiffel tower on the moon", guidance_scale=8)["sample"][0] # image here is in [PIL format](https://pillow.readthedocs.io/en/stable/)
# To view the image, you can either save it to disk:
image.save("eiffel_tower_moon.png")
# or, if you're in a Google Colab, display it directly with
display(image)
Does that answer the first question?
Regarding the second question,
You're totally right, there is no reason to be limited by a random number like 77 - corrected it here: https://github.com/huggingface/diffusers/pull/168/#discussion_r945284249
Please let me know if this didn't fully answer your questions @LinxiFan !
Note: 77 was not a random number, CLIP has max_position_embeddings=77, don't think it can handle more, since it uses absolute pos embeddings.
Sorry, @patil-suraj is 100% correct, I was a bit thrown off by a rather unusual (not power of 2) number, like 77 -> but it's indeed the maximum length and it's not possible with the current model to input larger prompts sadly @LinxiFan
Here's the proof: https://huggingface.co/openai/clip-vit-base-patch16/blob/main/config.json#L45
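To illustrate why absolute position embeddings cap the prompt length, here is a minimal pure-Python sketch. It is a toy lookup table, not the actual CLIP implementation: `HIDDEN_SIZE` and the helper `position_embeddings` are made up for illustration, and only the value 77 comes from the config linked above.

```python
import random

MAX_POSITIONS = 77  # max_position_embeddings from the linked CLIP config
HIDDEN_SIZE = 4     # toy value, chosen only to keep the example small

random.seed(0)
# One vector per position. With absolute position embeddings the table has
# exactly MAX_POSITIONS rows, so there is nothing to look up for position 77+.
position_table = [
    [random.random() for _ in range(HIDDEN_SIZE)] for _ in range(MAX_POSITIONS)
]


def position_embeddings(num_tokens):
    """Return one position vector per token (hypothetical helper)."""
    if num_tokens > len(position_table):
        raise ValueError(
            f"{num_tokens} tokens requested, but only "
            f"{len(position_table)} positions exist"
        )
    return [position_table[i] for i in range(num_tokens)]
```

Calling `position_embeddings(77)` works, while `position_embeddings(78)` raises, which mirrors why a longer prompt cannot simply be fed to the existing text encoder.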
Hi, thanks for getting back to me!
Regarding the rectangular image: no, it isn't about display. The script in the original CompVis repo (https://github.com/CompVis/stable-diffusion) can take H and W as generation parameters, so the algorithm can produce a non-square image (such as https://www.reddit.com/r/StableDiffusion/comments/wo3wm3/jazz_robots/).
Regarding the prompt length: the CompVis repo above seems to support much longer prompts. At least the Discord bot they deployed was able to accept a long paragraph. Thanks again for your help!
Hey @LinxiFan,
It seems like the original CompVis repo simply truncates longer inputs to the max length, which is 77 (check here). This is the same behavior we've now implemented in the Stable Diffusion pipeline, see here.
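As a rough sketch of what this truncation means in practice (an illustration, not the code from either repo): the tokenized prompt is clipped so that the start token, the prompt tokens, and the end token together fit in 77 positions. The helper name and the `bos_id`/`eos_id` values below are placeholders, not CLIP's real special-token ids.

```python
def truncate_prompt_ids(token_ids, max_length=77, bos_id=0, eos_id=1):
    """Clip a tokenized prompt so BOS + tokens + EOS fits in max_length.

    bos_id and eos_id are placeholder values for illustration only.
    """
    kept = token_ids[: max_length - 2]  # leave room for the two special tokens
    return [bos_id] + kept + [eos_id]


long_prompt = list(range(100, 300))  # 200 fake token ids
clipped = truncate_prompt_ids(long_prompt)
print(len(clipped))
```

Everything past the cut-off is silently dropped, so a long paragraph is accepted but only its beginning influences the image.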
Regarding the rectangular image - sorry I misunderstood! This is indeed a functionality we have not yet implemented in the StableDiffusionPipeline!
@anton-l @patil-suraj - let's implement it as well? (see: https://github.com/CompVis/stable-diffusion/blob/3fbbf28a8a66d6f0ffcaca43a76521b6d9b5bfb3/scripts/txt2img.py#L221)
Already on it, it seems :heart_eyes: https://github.com/huggingface/diffusers/pull/179
#179 is merged, you can now pass the height and width options to StableDiffusionPipeline.
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1-diffusers", use_auth_token=True) # make sure you're logged in with `huggingface-cli login`
image = pipe("eiffel tower on the moon", height=512, width=768, guidance_scale=8)["sample"][0]
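For choosing valid sizes, a small sketch may help. It assumes the Stable Diffusion v1 setup, where the autoencoder downsamples by a factor of 8 and the latents have 4 channels, so `height` and `width` should be multiples of 8. The helper `latent_shape` is hypothetical and not part of diffusers.

```python
def latent_shape(height, width, batch_size=1, latent_channels=4, scale_factor=8):
    """Compute the latent tensor shape for a requested image size.

    Assumes an 8x downsampling autoencoder with 4 latent channels
    (Stable Diffusion v1); this helper is illustrative only.
    """
    if height % scale_factor != 0 or width % scale_factor != 0:
        raise ValueError(
            f"height and width must be divisible by {scale_factor}, "
            f"got {height}x{width}"
        )
    return (batch_size, latent_channels, height // scale_factor, width // scale_factor)


print(latent_shape(512, 768))
```

Under these assumptions, a 512x768 request corresponds to a 64x96 latent grid, which is why non-square sizes work as long as both dimensions are multiples of 8.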
This code throws an error on this line:
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1-diffusers", use_auth_token=True) # make sure you're logged in with huggingface-cli login
I am guessing it is related to authorization to access CompVis/stable-diffusion-v1-1-diffusers. What do you mean by "make sure you're logged in with `huggingface-cli login`"?
After logging in, I got this error. How can I solve it?
Hi @jfdelgad and @taki0112! To log in from a colab notebook, it's enough to simply run !huggingface-cli login and paste your API token from https://huggingface.co/settings/tokens into the field that appears below:

A more notebook-friendly login interface is also supported:
from huggingface_hub import notebook_login
notebook_login()
Then you should be able to download the model with StableDiffusionPipeline.from_pretrained(...) if you've been given access to the CompVis org. I.e. if you've filled out the model request form and can access this page from your browser: https://huggingface.co/CompVis/stable-diffusion-v1-3-diffusers
P.S. Feel free to open another issue to continue this thread, so that we don't get lost :)
I think we can close this issue now since #179 is merged.