ColossalAI
Something wrong with training on lsun bedroom dataset
@ryanrussell @xcnick @feifeibear @junxu @jimmieliu Hi guys! Thanks for your great work. I have trained a new model using ColossalAI, with lsun_bedroom as my dataset, but I seem to have some problems with it.
This is my config:
model:
  base_learning_rate: 1.0e-04
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    # cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 1 ] # NOTE for resuming. use 10000 if starting from scratch
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1.e-4 ]
        f_min: [ 1.e-10 ]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32 # unused
        from_pretrained: '/home/dailongquan/110.014/ColossalAI-main/weight/stable-diffusion-v1-4/unet/diffusion_pytorch_model.bin'
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: False
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        from_pretrained: '/home/dailongquan/110.014/ColossalAI-main/weight/stable-diffusion-v1-4/vae/diffusion_pytorch_model.bin'
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
      params:
        use_fp16: True

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 16
    num_workers: 16
    wrap: False
    train:
      target: ldm.data.lsun.LSUNBedroomsTrain
      params:
        size: 256
    validation:
      target: ldm.data.lsun.LSUNBedroomsValidation
      params:
        size: 256

lightning:
  trainer:
    accelerator: 'gpu'
    devices: 4
    log_gpu_memory: all
    max_epochs: 2
    precision: 16
    auto_select_gpus: False
    strategy:
      target: lightning.pytorch.strategies.ColossalAIStrategy
      params:
        use_chunk: False
        enable_distributed_storage: True,
        placement_policy: cuda
        force_outputs_fp32: False
    log_every_n_steps: 2
    logger: True
    default_root_dir: "/tmp/diff_log/"
    profiler: pytorch

  logger_config:
    wandb:
      target: lightning.pytorch.loggers.WandbLogger
      params:
        name: nowname
        save_dir: "/tmp/diff_log/"
        offline: opt.debug
        id: nowname
But the program reports an error.
Can you help me solve this bug? Thanks!
@ver217 Hello, can you help me solve this problem?
@Fazziekey Hello, can you help me solve this problem?
Hi, as indicated by the logs, your input data type is wrong. Can you check whether the text you feed into the tokenizer has the correct type (str or List[str])?
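The check suggested above can be sketched as a small guard placed just before the tokenizer call. This is only a minimal sketch: the function name `validate_tokenizer_input` is hypothetical and is not part of ColossalAI or the `ldm` code base. It assumes only what the reply states, namely that the tokenizer expects `str` or `List[str]`.

```python
from typing import List


def validate_tokenizer_input(text: object) -> List[str]:
    """Return `text` as a list of strings, or raise TypeError.

    Hypothetical helper: accepts a single str or a list/tuple of str,
    which are the input types named in the reply above.
    """
    if isinstance(text, str):
        return [text]
    if isinstance(text, (list, tuple)) and all(isinstance(t, str) for t in text):
        return list(text)
    # Anything else (for example an image tensor) is reported with its type,
    # so the offending batch entry is easy to identify.
    raise TypeError(
        f"tokenizer input must be str or List[str], got {type(text).__name__}"
    )
```

Calling this guard on the value that is about to be tokenized turns an obscure tokenizer failure into an error message that names the actual type received.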
Hello, I use the LSUN bedroom dataset to train the autoencoder and the latent model, but ColossalAI does not seem to provide an LSUN config. I referred to the latent diffusion repository to modify the ColossalAI config, and the text file is shown as follows:
I printed the type of the text, and it is a tensor.
Can you share the LSUN bedroom config for ColossalAI with me? Thanks! I am in a great hurry to run ColossalAI!
Maybe we can chat on WeChat? My WeChat ID is NLG-wsm. @feifeibear
@binmakeswell Hi, can you help me solve this problem? I am in a hurry to run this code!
Hi, can you print the type of the input text?
I have printed the type of the input, and it is a tensor. Can you share the config for the LSUN bedroom dataset with me?
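One way to narrow this down is to list the type of every entry in a training batch, so it becomes clear which key holds strings and which holds tensors. The sketch below is illustrative only: `describe_batch`, `FakeTensor`, and the batch keys `image` and `txt` are assumptions made for the example (the key names mirror `first_stage_key` and the commented-out `cond_stage_key` in the config above), not the actual contents of the LSUN data loader.

```python
from typing import Any, Dict, Mapping


def describe_batch(batch: Mapping[str, Any]) -> Dict[str, str]:
    """Map each batch key to a short description of its value's type."""
    summary: Dict[str, str] = {}
    for key, value in batch.items():
        type_name = type(value).__name__
        # For non-empty sequences, also report the type of the first element.
        if isinstance(value, (list, tuple)) and value:
            type_name += f"[{type(value[0]).__name__}]"
        summary[key] = type_name
    return summary


class FakeTensor:
    """Stand-in for a tensor so the example runs without PyTorch."""


# Hypothetical batch: an image entry plus a caption entry.
example_batch = {"image": FakeTensor(), "txt": ["a bedroom"]}
for key, type_name in describe_batch(example_batch).items():
    print(f"{key}: {type_name}")
```

If the entry that reaches the tokenizer turns out to be the image tensor rather than a caption, the conditioning key in the config is a reasonable place to look next.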
繁华落尽 @.***
------------------ Original message ------------------ From: "hpcaitech/ColossalAI" @.>; Sent: Thursday, November 24, 2022, 1:58 PM @.>; @.@.>; Subject: Re: [hpcaitech/ColossalAI] Something wrong with training on lsun bedroom dataset (Issue #2013)
Hi, can you print the type of the input text?
We have updated a lot. This issue was closed due to inactivity. Thanks.