
Something wrong with training on lsun bedroom dataset

Open pokameng opened this issue 2 years ago • 8 comments

@ryanrussell @xcnick @feifeibear @junxu @jimmieliu Hi guys! Thanks for your great work. I have trained a new model with ColossalAI on the lsun_bedroom dataset, but I seem to be running into a problem with it.

This is my config:

model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    # cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

scheduler_config: # 10000 warmup steps
  target: ldm.lr_scheduler.LambdaLinearScheduler
  params:
    warm_up_steps: [ 1 ] # NOTE for resuming. use 10000 if starting from scratch
    cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
    f_start: [ 1.e-6 ]
    f_max: [ 1.e-4 ]
    f_min: [ 1.e-10 ]

unet_config:
  target: ldm.modules.diffusionmodules.openaimodel.UNetModel
  params:
    image_size: 32 # unused
    from_pretrained: '/home/dailongquan/110.014/ColossalAI-main/weight/stable-diffusion-v1-4/unet/diffusion_pytorch_model.bin'
    in_channels: 4
    out_channels: 4
    model_channels: 320
    attention_resolutions: [ 4, 2, 1 ]
    num_res_blocks: 2
    channel_mult: [ 1, 2, 4, 4 ]
    num_heads: 8
    use_spatial_transformer: True
    transformer_depth: 1
    context_dim: 768
    use_checkpoint: False
    legacy: False

first_stage_config:
  target: ldm.models.autoencoder.AutoencoderKL
  params:
    embed_dim: 4
    from_pretrained: '/home/dailongquan/110.014/ColossalAI-main/weight/stable-diffusion-v1-4/vae/diffusion_pytorch_model.bin'
    monitor: val/rec_loss
    ddconfig:
      double_z: true
      z_channels: 4
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult:
      - 1
      - 2
      - 4
      - 4
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
    lossconfig:
      target: torch.nn.Identity

cond_stage_config:
  target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
  params:
    use_fp16: True

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 16
    num_workers: 16

wrap: False
train:
  target: ldm.data.lsun.LSUNBedroomsTrain
  params:
    size: 256
validation:
  target: ldm.data.lsun.LSUNBedroomsValidation
  params:
    size: 256

lightning:
  trainer:
    accelerator: 'gpu'
    devices: 4
    log_gpu_memory: all
    max_epochs: 2
    precision: 16
    auto_select_gpus: False
    strategy:
      target: lightning.pytorch.strategies.ColossalAIStrategy
      params:
        use_chunk: False
        enable_distributed_storage: True
        placement_policy: cuda
        force_outputs_fp32: False

log_every_n_steps: 2
logger: True
default_root_dir: "/tmp/diff_log/"
profiler: pytorch

logger_config:
  wandb:
    target: lightning.pytorch.loggers.WandbLogger
    params:
      name: nowname
      save_dir: "/tmp/diff_log/"
      offline: opt.debug
      id: nowname

But the program reports:

[screenshot of the error traceback]

Can you help me solve this bug? Thanks!!!

pokameng avatar Nov 23 '22 13:11 pokameng
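For reference, a minimal sketch of how a config like the one above is typically materialized, assuming the standard latent-diffusion main.py pattern (OmegaConf plus ldm.util.instantiate_from_config); the config path here is hypothetical:

    # Sketch: load the YAML above and build the model/data the way main.py does.
    from omegaconf import OmegaConf
    from ldm.util import instantiate_from_config  # standard latent-diffusion helper

    config = OmegaConf.load("configs/lsun_bedroom.yaml")  # hypothetical path
    model = instantiate_from_config(config.model)         # builds LatentDiffusion
    data = instantiate_from_config(config.data)           # builds DataModuleFromConfig
    data.prepare_data()
    data.setup()
    print(type(data.datasets["train"]))                   # the train dataset configured above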

@ver217 hello, can you help me solve this problem?

pokameng avatar Nov 24 '22 05:11 pokameng

@Fazziekey hello, can you help me solve this problem?

pokameng avatar Nov 24 '22 05:11 pokameng

Hi, as indicated by the logs, your input data type is wrong. Can you check whether the text you feed into the tokenizer is of the correct type (str or List[str])?

feifeibear avatar Nov 24 '22 05:11 feifeibear
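A quick way to perform this check (a sketch only; the helper name and the sample batch are illustrative, and "txt" is the conditioning key a text-to-image config would normally use):

    import torch

    def check_caption_batch(batch, cond_stage_key="txt"):
        # The CLIP tokenizer expects str or List[str]; a torch.Tensor here
        # reproduces the reported error.
        text = batch.get(cond_stage_key)
        if isinstance(text, torch.Tensor):
            print(f"'{cond_stage_key}' is a Tensor with shape {tuple(text.shape)} -- "
                  "this will break the tokenizer.")
        else:
            print(f"'{cond_stage_key}' type: {type(text)}")

    # An LSUN-style batch carries only images, no caption string:
    check_caption_batch({"image": torch.zeros(1, 256, 256, 3)})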

Hello, I am using the LSUN bedroom dataset to train the autoencoder and the latent model, but ColossalAI does not seem to provide an LSUN config. I modified the ColossalAI config by referring to the latent-diffusion repo; the modified file is shown in the screenshots below. [screenshots of the modified dataset/config code] I printed the type of the text, but it is a tensor.

Could you share an LSUN bedroom config for ColossalAI with me? Thanks!!! I am in a hurry to get ColossalAI running!!!!

pokameng avatar Nov 24 '22 05:11 pokameng
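A note on the likely root cause, plus a hypothetical workaround sketch: ldm.data.lsun.LSUNBedroomsTrain yields image-only samples with no caption, and with cond_stage_key: txt commented out in the config above, LatentDiffusion presumably falls back to its default conditioning key, so the FrozenCLIPEmbedder ends up receiving image tensors rather than strings, which matches the tensor type reported here. One possible direction (not an official ColossalAI config; the class name and caption are illustrative) is to wrap the dataset so every sample carries a string caption:

    from torch.utils.data import Dataset
    from ldm.data.lsun import LSUNBedroomsTrain

    class CaptionedLSUNBedrooms(Dataset):
        """Hypothetical wrapper: attach a fixed caption under the key
        that the text cond stage expects ("txt")."""

        def __init__(self, size=256, caption="a photo of a bedroom"):
            self.base = LSUNBedroomsTrain(size=size)
            self.caption = caption

        def __len__(self):
            return len(self.base)

        def __getitem__(self, idx):
            example = self.base[idx]       # dict with an "image" entry
            example["txt"] = self.caption  # str, so the CLIP tokenizer accepts it
            return example

The other option is to drop the text conditioning entirely and train an unconditional LDM, as the upstream latent-diffusion repo's LSUN bedroom configs do, instead of reusing a text-to-image config.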

Maybe we can chat on WeChat? My WeChat ID is NLG-wsm. @feifeibear

pokameng avatar Nov 24 '22 05:11 pokameng

@binmakeswell Hi, can you help me solve this problem? I am in a hurry to get this code running!!!

pokameng avatar Nov 24 '22 05:11 pokameng

Hi, can you print the type of the input text?

FrankLeeeee avatar Nov 24 '22 05:11 FrankLeeeee

I have printed the type of the input, but I get a tensor type. Can you share the config for the LSUN bedroom dataset with me?

pokameng avatar Nov 24 '22 16:11 pokameng

We have updated a lot. This issue was closed due to inactivity. Thanks.

binmakeswell avatar Apr 14 '23 08:04 binmakeswell