prompt-to-prompt
code for user-defined mask
Hello! I am trying to use your code for "Null-text Inversion for Editing Real Images using Guided Diffusion Models". In particular, since I have an inpainting mask, I am trying to generate an image using a user-defined mask (as shown in fig. 8 or fig. 14 of "Prompt-To-Prompt Image Editing With Cross-Attention Control"). The code for using a user-defined mask is missing, so I tried to implement it myself. Did you simply apply the given mask in LocalBlend instead of the one computed from the prompt? Could the following code represent what you did (resizing the mask to 64x64, repeating it over the batch dimension so there is one copy per prompt, and applying it to the latents)?
```python
import numpy as np
import torch
from PIL import Image

class LocalBlend:
    ...
    def __init__(...):
        ...
        # resize the binary mask to the latent resolution (64x64)
        mask = np.array(Image.fromarray(mask).resize((64, 64), Image.NEAREST))
        # add batch and channel dims: (1, 1, 64, 64)
        mask = mask[None, None, :, :]
        # one copy of the mask per prompt in the batch: (2, 1, 64, 64)
        mask = mask.repeat(2, axis=0)
        self.mask = torch.from_numpy(mask).cuda()

    def __call__(...):
        ...
        mask = self.mask.float()
        # outside the mask keep the source latent x_t[:1];
        # inside the mask keep each prompt's own latent
        x_t = x_t[:1] + mask * (x_t - x_t[:1])
```
The code I wrote works (example in the image); I am just wondering whether it follows the approach you intended.
As you can see, with the given mask the code above edits just the pie on the left instead of all the pies:
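For reference, here is a self-contained version of the snippet above that can be run on its own. The class name `UserMaskBlend`, the standalone `__call__` signature, and the file name `pie_mask.png` are my own placeholders, not from the repo:

```python
import numpy as np
import torch
from PIL import Image


class UserMaskBlend:
    """Minimal stand-in for LocalBlend that blends latents with a fixed,
    user-provided mask instead of an attention-derived one."""

    def __init__(self, mask, device="cuda"):
        # mask: (H, W) uint8 array, 1 inside the region to edit
        mask = np.array(Image.fromarray(mask).resize((64, 64), Image.NEAREST))
        mask = mask[None, None, :, :]  # (1, 1, 64, 64)
        mask = mask.repeat(2, axis=0)  # (2, 1, 64, 64), one per prompt
        self.mask = torch.from_numpy(mask).float().to(device)

    def __call__(self, x_t):
        # x_t: (2, 4, 64, 64) latents; row 0 belongs to the source prompt.
        # Copy the source latent everywhere outside the mask; inside it,
        # keep each row's own (edited) latent.
        return x_t[:1] + self.mask * (x_t - x_t[:1])


# hypothetical usage; "pie_mask.png" stands in for your own mask file
mask = (np.array(Image.open("pie_mask.png").convert("L")) > 127).astype(np.uint8)
blend = UserMaskBlend(mask)
# inside the DDIM loop, call after each denoising step: x_t = blend(x_t)
```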
Thanks for bringing this up. I have a similar question about replacing the estimated mask with a user-provided one. Could you share the code to reproduce the results shown in the example above? I noticed that the rolling pin on the right is distorted even with the mask applied.
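Not one of the authors, but one way to debug the rolling-pin distortion is to check whether the two latent branches actually agree outside the mask at the end of sampling. A quick diagnostic (my own helper, not from the repo) could look like this:

```python
import torch


def latent_diff_map(x_t: torch.Tensor) -> torch.Tensor:
    # x_t: (2, 4, 64, 64), source latent in row 0, edited latent in row 1.
    # Returns a (64, 64) map of per-pixel L2 differences; if the blend is
    # applied at every step, this should be ~0 everywhere outside the mask.
    return (x_t[1] - x_t[0]).pow(2).sum(dim=0).sqrt()
```

If the map is nonzero under the rolling pin, the blend is probably not active at every step; if I remember the repo's LocalBlend correctly, it only starts blending after a `start_blend` fraction of the steps, so the early steps can already drift outside the mask.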
@fabrizioguillaro What if the mask didn't match the position of the pie, e.g. if the mask were on the right? Would it still give reasonable results?
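A quick way to probe that with the sketch above (reusing the hypothetical `mask` and `UserMaskBlend` names) would be to displace the mask before building the blend:

```python
import numpy as np

# Roll the mask half the image width to the right so it no longer
# covers the pie, then blend with the misplaced mask.
shifted = np.roll(mask, mask.shape[1] // 2, axis=1)
blend_shifted = UserMaskBlend(shifted)
```

Since the blend copies the source latent everywhere outside the mask, my guess is that the pie would stay unchanged and the edit prompt would only act on whatever falls inside the shifted region, but that is speculation, not something I have verified.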