[BUG]: finetune on Teyvat dataset, but got unexpected results
🐛 Describe the bug
I trained according to the README and everything went fine, but when I run inference with the fine-tuned model, the results are terrible, and I don't know what is wrong.
Environment
CUDA: 11.3, PyTorch: 1.12.1, ColossalAI: 0.1.12+torch1.12cu11.4, pytorch-lightning: 1.9.0.dev0
This is my inference command:
```bash
python scripts/txt2img.py --prompt "photo of a man wearing a pure white shirt and long pants" --plms \
    --outdir ./output \
    --config /tmp/2022-12-28T09-59-07_train_colossalai_teyvat/configs/2022-12-28T09-59-07-project.yaml \
    --ckpt /tmp/2022-12-28T09-59-07_train_colossalai_teyvat/checkpoints/last.ckpt \
    --n_samples 4
```
This is my result:
My dataset has more than 4000 images (750×1101), max_epochs is set to 50, and my config is as follows:
```yaml
model:
  base_learning_rate: 1.0e-4
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: "v"
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False # we set this to false because this is an inference only config

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 1 ] # NOTE for resuming. use 10000 if starting from scratch
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1.e-4 ]
        f_min: [ 1.e-10 ]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: True
        use_fp16: True
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          #attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 16
    num_workers: 4
    train:
      target: ldm.data.teyvat.hf_dataset
      params:
        path: zbdbc/fashion
        image_transforms:
        - target: torchvision.transforms.Resize
          params:
            size: 512
        - target: torchvision.transforms.RandomCrop
          params:
            size: 512
        - target: torchvision.transforms.RandomHorizontalFlip

lightning:
  trainer:
    accelerator: 'gpu'
    devices: 4
    log_gpu_memory: all
    max_epochs: 50
    precision: 16
    auto_select_gpus: False
    strategy:
      target: strategies.ColossalAIStrategy
      params:
        use_chunk: True
        enable_distributed_storage: True
        placement_policy: auto
        force_outputs_fp32: true
    log_every_n_steps: 2
    logger: True
    default_root_dir: "/tmp/diff_log/"
    # profiler: pytorch

  logger_config:
    wandb:
      target: loggers.WandbLogger
      params:
        name: nowname
        save_dir: "/tmp/diff_log/"
        offline: opt.debug
        id: nowname
```
I don't know what the problem is. I think training is normal, but inference is bad; it should at least output something.
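Before digging further into the config, it may help to rule out a corrupted checkpoint. Below is a minimal sanity-check sketch; it assumes the usual PyTorch Lightning checkpoint layout (weights nested under a `state_dict` key) and the `model.diffusion_model` / `first_stage_model` / `cond_stage_model` prefixes that ldm's LatentDiffusion uses. Non-finite weights would point at a broken fp16 fine-tune rather than an inference problem.

```python
# Minimal sanity check: load last.ckpt on CPU and inspect its state dict
# before blaming inference.
import torch

ckpt_path = "/tmp/2022-12-28T09-59-07_train_colossalai_teyvat/checkpoints/last.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")

# Lightning checkpoints usually nest the weights under "state_dict".
state_dict = ckpt.get("state_dict", ckpt)

# Confirm the UNet, VAE, and text encoder weights are actually present.
for prefix in ("model.diffusion_model", "first_stage_model", "cond_stage_model"):
    keys = [k for k in state_dict if k.startswith(prefix)]
    print(f"{prefix}: {len(keys)} tensors")

# NaN/Inf weights are a common symptom of a broken fp16 fine-tune.
bad = [k for k, v in state_dict.items()
       if torch.is_tensor(v) and v.is_floating_point()
       and not torch.isfinite(v).all()]
print("non-finite tensors:", bad[:10] if bad else "none")
```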
What is your ckpt for training?
@Fazziekey hello, I downloaded the pretrained model checkpoint as suggested in the examples:
```bash
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
```
An error is reported when I add this parameter:
```yaml
from_pretrained: '/home/project/ColossalAI/examples/images/diffusion/stable-diffusion-v1-4/vae/diffusion_pytorch_model.bin'
```
I also suspected that the pretrained ckpt file might not have been read, but I didn't know where to add this configuration.
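For what it's worth, one way to test whether that VAE file can be read into the model at all is to load it manually after the model is built. This is a hypothetical sketch, not a supported configuration path in the example: `model` is assumed to be the instantiated LatentDiffusion object, and diffusers-format VAE weights use different parameter names than ldm's AutoencoderKL, so a strict load is expected to fail; the missing/unexpected key counts show whether the file was read and how far the layouts diverge.

```python
# Hypothetical sketch: manually load the downloaded v1-4 VAE weights into the
# first stage of an already-instantiated LatentDiffusion model (`model` is
# assumed to exist, built from the training config above).
import torch

vae_path = ("/home/project/ColossalAI/examples/images/diffusion/"
            "stable-diffusion-v1-4/vae/diffusion_pytorch_model.bin")
vae_sd = torch.load(vae_path, map_location="cpu")

# strict=False reports the mismatch instead of raising: diffusers checkpoints
# and ldm's AutoencoderKL do not share key names, so expect unexpected keys.
missing, unexpected = model.first_stage_model.load_state_dict(vae_sd, strict=False)
print(f"missing: {len(missing)}, unexpected: {len(unexpected)}")
```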
Thanks for your issue. We are updating our code to Stable Diffusion v2; the `from_pretrained` arg is removed in v2.
Well, thank you for your work. Looking forward to the updated code and steps in the examples.
Thanks for your support. There are more bugs and problems in Stable Diffusion v2; we will offer a stable training version as soon as we can.