
Issue #20 still not working.

Open htoyryla opened this issue 2 years ago • 0 comments

Still does not work. See the context in the original issue.

ResizeRight expects either a numpy array or a torch tensor, but here it gets a PIL image, which has no shape attribute.
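A minimal stand-in (no PIL or torch needed, the classes below are just illustrative fakes) sketching why the call fails: ResizeRight reads the input's shape attribute, which PIL images do not have (they expose size instead).

```python
# Minimal stand-ins (not real PIL/torch objects) showing the attribute mismatch:
class FakePILImage:
    # PIL images expose .size as (width, height); there is no .shape
    size = (1024, 1024)

class FakeTensor:
    # torch tensors and numpy arrays expose .shape, which ResizeRight inspects
    shape = (3, 1024, 1024)

assert not hasattr(FakePILImage(), "shape")  # why resize() fails on a PIL image
assert hasattr(FakeTensor(), "shape")        # why converting to a tensor first works
```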

https://github.com/afiaka87/clip-guided-diffusion/blob/a631a06b51ac5c6636136fab27833c68862eaa24/cgd/clip_util.py#L57-L62

This is what I tried; at least it runs without an error:

    # assumed imports -- adjust to however ResizeRight and lanczos3 are imported in clip_util.py
    import torchvision.transforms.functional as tvf
    import resize_right
    from interp_methods import lanczos3

    # convert the PIL image to a tensor before resizing, then make cutouts
    t_img = tvf.to_tensor(pil_img)
    t_img = resize_right.resize(t_img, out_shape=(smallest_side, smallest_side),
                                interp_method=lanczos3, support_sz=None,
                                antialiasing=True, by_convs=False, scale_tolerance=None)
    batch = make_cutouts(t_img.unsqueeze(0).to(device))

I am not sure what output shape was intended here. As it was, the code produced a 1024x512 image from a 1024x1024 original with image_size 512; this version produces 512x512.
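To make the ambiguity concrete, here is a stdlib-only sketch of two possible out_shape conventions (the helper names are hypothetical; the square one matches what the snippet above passes to resize):

```python
def square_out_shape(smallest_side):
    # what the snippet above passes as out_shape: always a square
    return (smallest_side, smallest_side)

def aspect_preserving_out_shape(h, w, smallest_side):
    # hypothetical alternative: scale so the smaller side equals smallest_side
    scale = smallest_side / min(h, w)
    return (round(h * scale), round(w * scale))

# for a 1024x1024 original and image_size 512, both agree:
assert square_out_shape(512) == (512, 512)
assert aspect_preserving_out_shape(1024, 1024, 512) == (512, 512)
# a non-square original keeps its aspect ratio only in the second convention:
assert aspect_preserving_out_shape(1024, 2048, 512) == (512, 1024)
```

Neither convention explains getting 1024x512 from a square original, which suggests the old code was scaling only one dimension.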

I am not using offsets, BTW.

As for the images produced, I can't see much happening when using image prompts, but I guess that is another story. In my experience, guidance by comparing CLIP-encoded images is not very useful as such, so I'll probably go my own way and add other forms of image-based guidance. This might depend on the kind of images I work with and how: more visuality than semantics.

PS. I see now that the init image actually means using perceptual losses as guidance, rather than initialising something (as one can do with VQGAN latents, for instance). So that's more like what I am after.

Originally posted by @htoyryla in https://github.com/afiaka87/clip-guided-diffusion/issues/20#issuecomment-1045961800

htoyryla · Feb 19 '22 09:02