
Issues with Conditional Sampling

aashishrai3799 opened this issue on Feb 20, 2024 · 3 comments

Hi,

Consider the following lines of code:

```python
cond1 = model.encode(batch)
xT = model.encode_stochastic(batch, cond1, T=50)
pred = model.render(noise=xT, cond=cond1, T=20)

# xT_rand = torch.rand(xT.shape, device=device)
# pred_rand = model.render(noise=xT_rand, cond=cond1, T=20)
```

The autoencoding above works exactly as expected. However, if I use xT_rand with the same cond1 instead of xT, I get nothing but noise in the predicted image. Could you please help me understand why that happens? As mentioned in the paper, most of the semantic information is captured in z_sem, so why does it fail in this case?

Your response will be greatly appreciated.

Thank you!

aashishrai3799 · Feb 20, 2024

torch.rand samples from a uniform distribution, which is not what the diffusion model was trained on. Please use torch.randn, which samples from a standard Gaussian.
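For reference, a minimal sketch of the corrected call, reusing the `model`, `cond1`, `xT`, and `device` variables from the snippet above:

```python
import torch

# The reverse diffusion process expects x_T ~ N(0, I), so the starting
# noise must be drawn from a standard Gaussian, not a uniform distribution.
xT_rand = torch.randn(xT.shape, device=device)

# Decode the Gaussian noise under the same semantic code cond1 (z_sem).
pred_rand = model.render(noise=xT_rand, cond=cond1, T=20)
```

Equivalently, `torch.randn_like(xT)` draws Gaussian noise with xT's shape, dtype, and device in a single call.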

phizaz · Feb 21, 2024

Hi, thank you for your quick response. Even with torch.randn, I still get distorted output. Here's an example:

[image: input / noise / prediction]

And this happens for all the examples I tested, not just this one. Do you have any insights into why this is happening?

Thanks again!

aashishrai3799 · Feb 22, 2024

I'm not sure what the use case is here. Can you tell me the big picture? This doesn't seem like the use case described in the paper.

phizaz · Feb 22, 2024