How to use the autoencoder and LDM model?

huangyehui opened this issue 1 year ago • 7 comments

In the 'Model Training' section, the autoencoder and LDM model can be trained, but I cannot find the script that uses these models. Please explain how to use them; it is confusing.

huangyehui avatar May 06 '23 12:05 huangyehui

Hello, I opened an issue regarding a similar problem trying to train an autoencoder from scratch: #270. My way of training an AE is described there (it works for training the AE, but doesn't lead to good results in combination with the UNet).
For your issue, you can instantiate a model via any of the config files in models/ldm/ with pretrained weights for the autoencoder by adding a 'ckpt_path' line in the first_stage_config section of the config file. When I do that, I get good results training the UNet, both for super-resolution and for unconditional image generation. My problems arise when trying to train the AE from scratch.
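
For illustration, a minimal sketch of that change done programmatically with OmegaConf instead of editing the YAML by hand (the config and checkpoint paths below are placeholders, not from this thread; the equivalent YAML edit is a 'ckpt_path' entry under model.params.first_stage_config.params):

from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Placeholder paths -- replace with the config under models/ldm/ and the
# first-stage checkpoint you actually want to reuse.
config = OmegaConf.load("models/ldm/lsun_churches256/config.yaml")
config.model.params.first_stage_config.params.ckpt_path = "models/first_stage_models/kl-f8/model.ckpt"

# The autoencoder weights are restored when the model is instantiated,
# so only the UNet still needs to be trained.
model = instantiate_from_config(config.model)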

simon-donike avatar May 08 '23 13:05 simon-donike

Thanks a lot for the help. I will try

huangyehui avatar May 16 '23 09:05 huangyehui

@simon-donike Hi Simon, do you know whether the pre-trained FFHQ model provides both the diffusion and AE checkpoints for unconditional generation with the sample_diffusion.py script?

forever208 avatar Aug 04 '23 20:08 forever208

@forever208 thanks, I do. For my application I had to retrain though since I'm using more channels.

simon-donike avatar Aug 05 '23 10:08 simon-donike

Hi~ I also tried to train the AE on my own dataset, but I got some weird results, as mentioned in #309. I simply used the default config file, but the loss becomes negative. Could you give some suggestions? Thanks!

OwalnutO avatar Aug 28 '23 06:08 OwalnutO

Hi, would you mind sharing what your code looks like for this approach? I've been trying to do it for inference with the pretrained LDM for super-resolution (I've added the vq-f4 autoencoder ckpt path to the YAML config file), but when I do, the result I get is just the loss value instead of the image:

import numpy as np
import nibabel as nib
import torch
from torchvision import transforms
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

# Load the model using the configuration file and checkpoint
config_path = "ldm_config.yaml"
ckpt_path = "sr_bsr.ckpt"
config = OmegaConf.load(config_path)
model = instantiate_from_config(config.model)

print(f"Loading model from {ckpt_path}")
pl_sd = torch.load(ckpt_path, map_location="cpu")
global_step = pl_sd["global_step"]
sd = pl_sd["state_dict"]
m, u = model.load_state_dict(sd, strict=False)

# Set the model in evaluation mode
model.eval()

# Load and preprocess the image (NIfTI volume -> [batch, 3, H, W] tensor)
image_path = '../../../../test_image.nii.gz'
image = nib.load(image_path).get_fdata()
image = np.expand_dims(image, axis=0)   # add batch dimension
image = np.expand_dims(image, axis=1)   # add channel dimension
image = np.repeat(image, 3, axis=1)     # replicate to 3 channels

lr_image = torch.tensor(image, dtype=torch.float32)  # convert to a PyTorch tensor (model weights are float32)
normalize = transforms.Normalize(mean=[0.5], std=[0.5])
lr_image = normalize(lr_image)  # normalize the pixel values

dummy_conditioning = torch.zeros(1, 3, 256, 256, dtype=torch.float32)  # modify the shape as needed
print(np.unique(dummy_conditioning))

# Pass the low-resolution image through the LDM model.
# NOTE: calling the model directly invokes LatentDiffusion.forward(), which
# returns the training loss, not a generated image.
with torch.no_grad():
    result, _ = model(lr_image, c=dummy_conditioning)

# Print the result
print("Image result:", result)

pandas351 avatar Aug 31 '23 23:08 pandas351

@pandas351 That's because you're calling the .forward() function. Check out the other functions, like make_convolutional_sample, etc.
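
For reference, a rough sketch of sampling with the DDIM sampler instead of the forward pass (the step count, latent shape, and conditioning handling below are assumptions; how the conditioning is built depends on the config, and the make_convolutional_sample helper in scripts/sample_diffusion.py covers the unconditional case):

import torch
from ldm.models.diffusion.ddim import DDIMSampler

sampler = DDIMSampler(model)  # 'model' is the LatentDiffusion instance loaded above

with torch.no_grad():
    # How the conditioning is built depends on cond_stage_config / conditioning_key;
    # for the super-resolution model it is derived from the low-resolution image.
    c = model.get_learned_conditioning(lr_image)

    # Assumed latent shape [channels, height, width]; it must match the
    # first-stage autoencoder's z-channels and downsampling factor.
    shape = [3, 64, 64]

    samples, _ = sampler.sample(S=100, batch_size=1, shape=shape,
                                conditioning=c, verbose=False, eta=1.0)

    # Decode the latents back to image space with the first-stage autoencoder.
    x_samples = model.decode_first_stage(samples)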

simon-donike avatar Oct 02 '23 14:10 simon-donike