
How do I train the model on my own dataset? My dataset has 8 classes. Does anyone have a training script?

Open dreamlychina opened this issue 1 year ago • 14 comments

dreamlychina avatar Apr 03 '23 08:04 dreamlychina

Hi, to train a latent diffusion model on a personalized dataset you have to create a dataset class. In the folder ldm/data/ you will find the existing dataset classes; I recommend copying the LSUNBase class and modifying it to load your data. It works by reading a text file that holds the file paths, but you can create a CSV and read that without problem. The class returns a dictionary from __getitem__, and this dictionary must contain the key "image", because the training loop loads the image from that key. The dictionary can also hold other keys, such as "class_label". Finally, to train the model you need a config file; you will find many of them in the folder configs/latent-diffusion/, and in the .yaml you have to add a "data" key like this:

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 32
    wrap: True
    train:
      target: ldm.data.personal.PersonalizeTrain0
      params:
        size: 256
        degradation: pil_nearest
    validation:
      target: ldm.data.personal.PersonalizeVal0
      params:
        size: 256
        degradation: pil_nearest

Where PersonalizeTrain0 and PersonalizeVal0 are the dataset classes you created earlier. If you need my example dataset class, tell me!

SantiUsma avatar Apr 15 '23 18:04 SantiUsma

It would be great if you could share your dataset class. Thanks in advance.

CanberkSaglam avatar Apr 18 '23 08:04 CanberkSaglam

import numpy as np
import pandas as pd
import PIL
from PIL import Image
import nibabel as nib
from torch.utils.data import Dataset
from torchvision import transforms

# Placeholder CSV locations expected by the subclasses below (hypothetical paths):
csv_path_train = "data/train.csv"
csv_path_val = "data/val.csv"


class Personalize0(Dataset):
    def __init__(self,
                 txt_file,
                 size=128,
                 interpolation="bicubic",
                 flip_p=0.5,
                 is_mri=True,
                 val=False
                 ):
        self.data_paths = txt_file
        self.is_mri = is_mri
        # The CSV is expected to have an "Image_path" column (and "MRI_path" if is_mri).
        self.image_paths = pd.read_csv(self.data_paths)["Image_path"]
        if is_mri:
            self.mri_paths = pd.read_csv(self.data_paths)["MRI_path"]
        if val:
            # Cap the validation split at 1024 samples.
            self.mri_paths = self.mri_paths[:1024]
            self.image_paths = self.image_paths[:1024]
        self.labels = {
            "relative_file_path_": [l for l in self.image_paths],
            "file_path_": [l for l in self.image_paths],
        }
        self._length = len(self.image_paths)
        self.size = size
        # On Pillow >= 10 these constants live in PIL.Image.Resampling instead.
        self.interpolation = {"linear": PIL.Image.LINEAR,
                              "bilinear": PIL.Image.BILINEAR,
                              "bicubic": PIL.Image.BICUBIC,
                              "lanczos": PIL.Image.LANCZOS,
                              }[interpolation]
        self.flip = transforms.RandomHorizontalFlip(p=flip_p)

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        example = dict((k, self.labels[k][i]) for k in self.labels)
        image = Image.open(self.image_paths[i])
        if self.is_mri:
            # Dataset-specific: load a 4D MRI volume and scale by a fixed constant.
            mri = nib.load(self.mri_paths[i]).get_fdata() / 2000
            x, y, z, t = mri.shape
            # Pad/crop each volume into a fixed (t, 72, 88, 72) block (assumes t == 5).
            MRI = np.zeros((5, 72, 88, 72))
            MRI[:, :min(x, 72), :min(y, 88), :min(z, 72)] = \
                mri[:min(x, 72), :min(y, 88), :min(z, 72), :].transpose(3, 0, 1, 2)
            example["mri"] = MRI.astype(np.float32)
        if not image.mode == "RGB":
            image = image.convert("RGB")

        # default to score-sde preprocessing: center crop to a square ...
        img = np.array(image).astype(np.uint8)
        crop = min(img.shape[0], img.shape[1])
        h, w = img.shape[0], img.shape[1]
        img = img[(h - crop) // 2:(h + crop) // 2,
                  (w - crop) // 2:(w + crop) // 2]

        # ... then resize, random-flip, and rescale pixels to [-1, 1].
        image = Image.fromarray(img)
        if self.size is not None:
            image = image.resize((self.size, self.size), resample=self.interpolation)

        image = self.flip(image)
        image = np.array(image).astype(np.uint8)
        example["image"] = (image / 127.5 - 1.0).astype(np.float32)
        return example


class PersonalizeTrain0(Personalize0):
    def __init__(self, **kwargs):
        # Note: **kwargs from the config (size, degradation, ...) are not forwarded,
        # so Personalize0's defaults apply.
        super().__init__(txt_file=csv_path_train)


class PersonalizeVal0(Personalize0):
    def __init__(self, flip_p=0., **kwargs):
        super().__init__(txt_file=csv_path_val, val=True,
                         flip_p=flip_p)
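A quick usage sketch (the CSV layout below is an assumption inferred from the column names the class reads):

# data/train.csv (hypothetical):
# Image_path,MRI_path
# /data/images/0001.png,/data/mri/0001.nii.gz

ds = PersonalizeTrain0()
sample = ds[0]
print(sample["image"].shape, sample["image"].dtype)  # (128, 128, 3) float32 in [-1, 1]
print(sample["mri"].shape)                           # (5, 72, 88, 72)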

SantiUsma avatar Apr 27 '23 23:04 SantiUsma

Is it possible to change the input size to something arbitrary, for example 32x256?

AustrianOakvn avatar May 30 '23 09:05 AustrianOakvn

Theoretically yes, if you are talking about the image size. But it is better to retrain your autoencoder at this personalized size before training the LDM to get better results. That said, both the autoencoder and the LDM can accept tensors of any size (with an f=8 autoencoder, for example, a 32x256 image simply maps to a 4x32 latent).

SantiUsma avatar May 31 '23 00:05 SantiUsma

@SantiUsma Thanks! Can you explain what the "# default to score-sde preprocessing" comment in your code's __getitem__ method means? And what exactly do I need to add to the dataset class in order to train a class-conditional model? Thanks a lot!

taustudent avatar Sep 28 '23 13:09 taustudent

@taustudent The "# default to score-sde preprocessing" comment just marks the image preprocessing (center crop, resize, rescale to [-1, 1]), nothing else. What you should change in the code is the following:

  1. Remove the MRI-specific code; it was only for my dataset.

  2. Every image in your dataset should have a label: add a key-value pair to the example dictionary. The key should be "label" and the value should be an integer class id (0, 1, 2, 3, ...); a minimal sketch follows this list. If your model throws any error using these labels, just tell me.
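For illustration, a minimal way to wire the label in (the CSV column name "Label" is an assumption about your data):

# In Personalize0.__init__ (assumes a hypothetical "Label" column in the same CSV):
self.class_labels = pd.read_csv(self.data_paths)["Label"]

# In Personalize0.__getitem__, before returning the example:
example["label"] = int(self.class_labels[i])  # integer class id, e.g. 0..7 for 8 classes

The repo's class-conditional configs (e.g. configs/latent-diffusion/cin256-v2.yaml) condition through ldm.modules.encoders.modules.ClassEmbedder; its key and n_classes params should match the dictionary key you emit here and your number of classes.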

SantiUsma avatar Sep 28 '23 15:09 SantiUsma

Thanks! What are the guidelines for choosing the right config file? How can I know which one is best to base mine on? And one more question, if that's OK: you mentioned training an AE in another comment. My dataset is totally different from those used to train the official checkpoints. How can I train an AE from scratch? Do I need to use another repo? Thanks a lot!

taustudent avatar Oct 16 '23 06:10 taustudent

To select the best config file you should experiment with different architectures and pick what works best. I recommend using the same config files as the pretrained models on the latent-diffusion page and fine-tuning from them. To train the AE, pick an AE config file and train with it; then, in the LDM config file, change the AE checkpoint path to the one you just trained. Make sure you use the same kind of AE that was used for the LDM checkpoint you start from; otherwise the results may not be the best. In my project I used the f=8 VQ (Z=16384, d=4) AE model together with the class-conditional image synthesis LDM. A sketch of the relevant config section follows.
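For concreteness, this is roughly the part of the LDM config that binds in the trained AE (key names follow configs such as configs/latent-diffusion/cin-ldm-vq-f8.yaml; the checkpoint path is an assumption):

model:
  params:
    first_stage_config:
      target: ldm.models.autoencoder.VQModelInterface
      params:
        embed_dim: 4        # d=4, matching the f=8 VQ (Z=16384, d=4) AE
        n_embed: 16384      # Z=16384 codebook entries
        ckpt_path: logs/my_vq_f8/checkpoints/last.ckpt  # assumed path to your trained AE
        # ddconfig and lossconfig stay as in the AE config you trained with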

SantiUsma avatar Oct 16 '23 14:10 SantiUsma

Thanks again. Do you know a way to fine-tune the AE, or do I have to train it from scratch? And regarding the configs: the images in my dataset have a different size from those used for the pretrained models; should I change something in the config files regarding image sizes? Last question: you mentioned that you used the VQ AE, but among the AE config files there are only KL AEs. Where did you get the VQ configs from?

taustudent avatar Oct 17 '23 13:10 taustudent

Theoretically, you don't need to change the image size because the AE doesn't depend on it. However, it is better to use the same size the checkpoints were trained with. You can change the image size in your dataset class (the "Personalize0" class I mentioned before) in the __getitem__ method. In the AE config file you can also select the AE checkpoint to fine-tune from; you must download it from the model page. To fine-tune, you only have to run main.py with the AE config file selected, and it will train the AE instead of the LDM; a sketch of the command follows.

Finally, the VQ-VAE config I used is in the "models/first_stage_models/" folder.
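For reference, a sketch of the repo's usual training entry point applied to the AE (the config and checkpoint paths are assumptions based on where the download scripts place the first-stage models):

# fine-tune the f=8 VQ autoencoder (assumed locations)
CUDA_VISIBLE_DEVICES=0 python main.py --base models/first_stage_models/vq-f8/config.yaml -t --gpus 0,

# to start from the downloaded weights, add under model.params in that config:
#   ckpt_path: models/first_stage_models/vq-f8/model.ckpt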

SantiUsma avatar Oct 17 '23 16:10 SantiUsma

@SantiUsma Thanks a lot for your help! I managed to get good results training the AE, but now I'm having trouble matching it to the right LDM config file. I keep getting errors like RuntimeError: Given groups=1, weight of size [192, 3, 3, 3], expected input[12, 16, 8, 8] to have 3 channels, but got 16 channels instead, and it seems the AE/UNet configuration in the LDM config file doesn't match the one in the AE config I trained with. Do you know how to match them (given a trained AE with its config, how do I find the right LDM config file)? Or could you share an example of two config files (one AE, one LDM) that match each other and are suitable for class-conditional image generation?
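For reference, a hedged sketch of the settings that have to agree between the two configs (key names follow the repo's config layout; the values shown assume an f=8, d=4 VQ AE and must instead be read off your trained AE's config):

model:
  params:
    channels: 4            # latent channels the diffusion model operates on
    unet_config:
      params:
        in_channels: 4     # must equal the AE's latent dimension
        out_channels: 4
    first_stage_config:
      params:
        embed_dim: 4       # from your trained AE config
        ddconfig:
          z_channels: 4    # from your trained AE config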

taustudent avatar Oct 23 '23 15:10 taustudent

Based on this repo I built my own code, DDM-Public, which can be applied to conditional generation tasks such as super-resolution, saliency detection, and image inpainting. Details can be found in my repo. I hope it helps you.

GuHuangAI avatar Oct 24 '23 14:10 GuHuangAI

Hi @taustudent, did you succeed?

tangwy98 avatar Dec 23 '23 13:12 tangwy98