
Stable Diffusion: providing a list of positive prompts and a list of negative prompts does not work as expected


Describe the bug

See this forum post: https://discuss.huggingface.co/t/stable-diffusion-bs-1-uses-negative-as-prompt/24130.

In short:

_ = pipe(["frog"]*2, negative_prompt=["bird"]*2)

Reaches this condition: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L247. As expected, negative_prompt is a list with the same cardinality as prompt (otherwise an exception would have been raised), so the uncond embeddings computed from uncond_input.input_ids have shape (2, 77, 768). But then the repeat a few lines below that condition makes them become (4, 77, 768), because batch_size is 2, and the concatenation on the next line creates text_embeddings with shape (6, 77, 768). The first 4 entries correspond to the negative prompts, and those are what gets passed to the model, because the latents are correctly computed with shape (2, 4, 64, 64) (later expanded to batch size 4 during classifier-free guidance).
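
To make the shape arithmetic concrete, here is a minimal sketch with dummy tensors (no real model involved; the variable names simply mirror the pipeline's, and num_images_per_prompt is assumed to be 1):

import torch

batch_size, seq_len, hidden = 2, 77, 768

# One embedding row per positive prompt and one per negative prompt
text_embeddings = torch.randn(batch_size, seq_len, hidden)    # (2, 77, 768)
uncond_embeddings = torch.randn(batch_size, seq_len, hidden)  # (2, 77, 768)

# The pipeline repeats the uncond embeddings by batch_size even though they
# already contain one row per prompt:
uncond_embeddings = uncond_embeddings.repeat(batch_size, 1, 1)       # (4, 77, 768)

# The classifier-free guidance concatenation then yields 6 rows instead of 4:
text_embeddings = torch.cat([uncond_embeddings, text_embeddings])    # (6, 77, 768)

# The latent model input only has 2 * batch_size = 4 rows, so the UNet only
# ever sees the first 4 rows, i.e. the negative-prompt embeddings.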

Reproduction

I could reproduce as explained above.

Logs

No response

System Info

diffusers @ main (f3983d16eed57e46742d217363d8913bef7f748d)

pcuenca avatar Oct 08 '22 12:10 pcuenca

Any ideas on how we can solve it?

patrickvonplaten avatar Oct 10 '22 12:10 patrickvonplaten

Running into the same issue. It seems the batch dimension gets applied in two places, which is what causes the problem.

The code expects the negative prompt list to be the same length as the positive prompt list, i.e.

_ = pipe([positive_prompt]*bs, negative_prompt=[negative_prompt])

throws an error:

`negative_prompt`: [negative_prompt] has batch size 1, but `prompt`: [positive_prompt, positive_prompt, ...] has batch size bs
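
(For context, the check that raises this error looks roughly like the following; this is a paraphrase, not the exact code in pipeline_stable_diffusion.py.)

if isinstance(negative_prompt, list) and len(negative_prompt) != batch_size:
    raise ValueError(
        f"`negative_prompt`: {negative_prompt} has batch size {len(negative_prompt)}, "
        f"but `prompt`: {prompt} has batch size {batch_size}."
    )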

but if you instead run

_ = pipe([positive_prompt]*bs, negative_prompt=[negative_prompt]*bs)

Then the following section repeats the embeddings along the batch dimension a second time, which breaks the shapes during the view:

uncond_input = self.tokenizer(
    uncond_tokens,
    padding="max_length",
    max_length=max_length,
    truncation=True,
    return_tensors="pt",
)
uncond_embeddings = self.text_encoder(uncond_input.input_ids.to(self.device))[0]

seq_len = uncond_embeddings.shape[1]
uncond_embeddings = uncond_embeddings.repeat(batch_size, num_images_per_prompt, 1)
uncond_embeddings = uncond_embeddings.view(batch_size * num_images_per_prompt, seq_len, -1)

When I run this, my uncond_embeddings end up with shape [bs, sl, 768*bs], which causes an error when they are concatenated with the positive embeddings.
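
For what it's worth, here is a minimal sketch of how the repeat could avoid duplicating the batch dimension a second time, assuming uncond_embeddings already has one row per prompt (just an illustration, not necessarily the fix that eventually landed):

# uncond_embeddings already has shape (batch_size, seq_len, hidden), so only
# repeat it for num_images_per_prompt, not for batch_size again
seq_len = uncond_embeddings.shape[1]
uncond_embeddings = uncond_embeddings.repeat(1, num_images_per_prompt, 1)
uncond_embeddings = uncond_embeddings.view(batch_size * num_images_per_prompt, seq_len, -1)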

kheyer avatar Oct 21 '22 22:10 kheyer

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 18 '22 15:11 github-actions[bot]

I think this was resolved in #1120.

pcuenca avatar Nov 18 '22 15:11 pcuenca