Allow image resolutions multiple of 8 instead of 64 in SVD pipeline

Open mlfarinha opened this issue 1 year ago • 7 comments

Hello! This is my first time making a PR to diffusers so I apologise if I have missed something!

This PR brings the behaviour already implemented in unet_2d_condition.py and unet_3d_condition.py to unet_spatio_temporal_condition.py, so that the SVD pipeline can generate images whose height and width are multiples of 8 but not of 64. To achieve this, I changed the forward methods of the two up blocks used by UNetSpatioTemporalConditionModel, UpBlockSpatioTemporal and CrossAttnUpBlockSpatioTemporal, in unet_3d_blocks.py. This PR is related to the closed issue https://github.com/huggingface/diffusers/issues/255


Code to reproduce the error:

import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((640, 480))

generator = torch.manual_seed(42)
frames = pipe(image, height=480, width=640, decode_chunk_size=8, generator=generator).frames[0]

Error:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 15 for tensor number 1 in the list.
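
For readers wondering where the 16 and the 15 come from, the arithmetic can be sketched as follows. This is a simplified model, not the diffusers code: it assumes the VAE shrinks each side by a factor of 8, that the UNet applies three stride-2 convolutions (kernel 3, padding 1) on the way down, and that each up block doubles the spatial size. The helper names `conv_out` and `find_mismatches` are hypothetical.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length of a strided convolution along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1


def find_mismatches(pixels, vae_factor=8, num_downsamples=3):
    """Return (upsampled, skip) size pairs that would fail to concatenate.

    Assumptions (not taken from the diffusers source): the latent side is
    pixels // vae_factor, there are num_downsamples stride-2 convolutions,
    and every upsampling step doubles the size.
    """
    size = pixels // vae_factor
    skips = [size]
    for _ in range(num_downsamples):
        size = conv_out(size)
        skips.append(size)

    mismatches = []
    current = skips.pop()  # size at the bottom of the UNet
    while skips:
        skip = skips.pop()
        doubled = current * 2
        if doubled != skip:
            mismatches.append((doubled, skip))
        # Continue as if the upsampler had targeted the skip size instead.
        current = skip
    return mismatches


print(find_mismatches(480))
print(find_mismatches(640))
```

With a height of 480 the latent height goes 60, 30, 15, 8, and doubling 8 gives 16 against a skip connection of 15, which matches the sizes in the error. A width of 640 goes 80, 40, 20, 10 and doubles back cleanly. As I understand it, the approach in unet_2d_condition.py is to pass the skip connection's size to the upsampler rather than always doubling.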


@patrickvonplaten and @sayakpaul

mlfarinha avatar Jan 20 '24 01:01 mlfarinha

@DN6 @patil-suraj can you check here?

patrickvonplaten avatar Jan 23 '24 11:01 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Feb 19 '24 15:02 github-actions[bot]

@DN6 could you give this a look?

sayakpaul avatar Feb 19 '24 15:02 sayakpaul

Hi @mlfarinha, sorry I missed this. The PR looks good to me. Could we resolve the conflicts so that we can merge?

DN6 avatar Mar 28 '24 05:03 DN6

Gentle ping @mlfarinha

yiyixuxu avatar Sep 17 '24 21:09 yiyixuxu
