[BUG]: finetune on Teyvat dataset, but got unexpected results
🐛 Describe the bug
I trained according to the README and everything went fine, but when I run inference with the fine-tuned model, the results are terrible, and I don't know what is wrong.
Environment
CUDA: 11.3, PyTorch: 1.12.1, ColossalAI: 0.1.12+torch1.12cu11.4, pytorch-lightning: 1.9.0.dev0
This is my inference command:
```bash
python scripts/txt2img.py --prompt "photo of a man wearing a pure white shirt and long pants" --plms \
    --outdir ./output \
    --config /tmp/2022-12-28T09-59-07_train_colossalai_teyvat/configs/2022-12-28T09-59-07-project.yaml \
    --ckpt /tmp/2022-12-28T09-59-07_train_colossalai_teyvat/checkpoints/last.ckpt \
    --n_samples 4
```
This is my result:
My dataset has more than 4000 images (750×1101), max_epochs is set to 50, and my config is as follows:
```yaml
model:
  base_learning_rate: 1.0e-4
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: "v"
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image
    cond_stage_key: txt
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False # we set this to false because this is an inference only config

    scheduler_config: # 10000 warmup steps
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 1 ] # NOTE for resuming. use 10000 if starting from scratch
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1.e-4 ]
        f_min: [ 1.e-10 ]

    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        use_checkpoint: True
        use_fp16: True
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_head_channels: 64 # need to fix for flash-attn
        use_spatial_transformer: True
        use_linear_in_transformer: True
        transformer_depth: 1
        context_dim: 1024
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          #attn_type: "vanilla-xformers"
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
      params:
        freeze: True
        layer: "penultimate"

data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 16
    num_workers: 4
    train:
      target: ldm.data.teyvat.hf_dataset
      params:
        path: zbdbc/fashion
        image_transforms:
        - target: torchvision.transforms.Resize
          params:
            size: 512
        - target: torchvision.transforms.RandomCrop
          params:
            size: 512
        - target: torchvision.transforms.RandomHorizontalFlip

lightning:
  trainer:
    accelerator: 'gpu'
    devices: 4
    log_gpu_memory: all
    max_epochs: 50
    precision: 16
    auto_select_gpus: False
    strategy:
      target: strategies.ColossalAIStrategy
      params:
        use_chunk: True
        enable_distributed_storage: True
        placement_policy: auto
        force_outputs_fp32: true
    log_every_n_steps: 2
    logger: True
    default_root_dir: "/tmp/diff_log/"
    # profiler: pytorch

  logger_config:
    wandb:
      target: loggers.WandbLogger
      params:
        name: nowname
        save_dir: "/tmp/diff_log/"
        offline: opt.debug
        id: nowname
```
I don't know what the problem is. I think training is normal, but inference is bad; it should at least output something.
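Before digging further into the config, it may help to rule out a corrupted checkpoint. Below is a minimal sanity-check sketch; it assumes the usual PyTorch Lightning checkpoint layout (weights nested under a `state_dict` key) and the `model.diffusion_model` / `first_stage_model` / `cond_stage_model` prefixes that ldm's LatentDiffusion uses. Non-finite weights would point at a broken fp16 fine-tune rather than an inference problem.

```python
# Minimal sanity check: load last.ckpt on CPU and inspect its state dict
# before blaming inference.
import torch

ckpt_path = "/tmp/2022-12-28T09-59-07_train_colossalai_teyvat/checkpoints/last.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")

# Lightning checkpoints usually nest the weights under "state_dict".
state_dict = ckpt.get("state_dict", ckpt)

# Confirm the UNet, VAE, and text encoder weights are actually present.
for prefix in ("model.diffusion_model", "first_stage_model", "cond_stage_model"):
    keys = [k for k in state_dict if k.startswith(prefix)]
    print(f"{prefix}: {len(keys)} tensors")

# NaN/Inf weights are a common symptom of a broken fp16 fine-tune.
bad = [k for k, v in state_dict.items()
       if torch.is_tensor(v) and v.is_floating_point()
       and not torch.isfinite(v).all()]
print("non-finite tensors:", bad[:10] if bad else "none")
```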
What is your ckpt for training?
@Fazziekey hello, I downloaded the pretrained model checkpoint as suggested in the examples:
```bash
git lfs install
git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
```
An error is reported when I add this parameter:
```yaml
from_pretrained: '/home/project/ColossalAI/examples/images/diffusion/stable-diffusion-v1-4/vae/diffusion_pytorch_model.bin'
```
I also suspected that the pretrained ckpt file might not have been read, but I didn't know where to add this configuration.
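For what it's worth, one way to test whether that VAE file can be read into the model at all is to load it manually after the model is built. This is a hypothetical sketch, not a supported configuration path in the example: `model` is assumed to be the instantiated LatentDiffusion object, and diffusers-format VAE weights use different parameter names than ldm's AutoencoderKL, so a strict load is expected to fail; the missing/unexpected key counts show whether the file was read and how far the layouts diverge.

```python
# Hypothetical sketch: manually load the downloaded v1-4 VAE weights into the
# first stage of an already-instantiated LatentDiffusion model (`model` is
# assumed to exist, built from the training config above).
import torch

vae_path = ("/home/project/ColossalAI/examples/images/diffusion/"
            "stable-diffusion-v1-4/vae/diffusion_pytorch_model.bin")
vae_sd = torch.load(vae_path, map_location="cpu")

# strict=False reports the mismatch instead of raising: diffusers checkpoints
# and ldm's AutoencoderKL do not share key names, so expect unexpected keys.
missing, unexpected = model.first_stage_model.load_state_dict(vae_sd, strict=False)
print(f"missing: {len(missing)}, unexpected: {len(unexpected)}")
```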
Thanks for your issue. We are updating our code to Stable Diffusion v2; the `from_pretrained` arg is removed in v2.
Well, thank you for your work. Looking forward to the updated code and steps in the examples.
Thanks for your support. There are more bugs and problems in Stable Diffusion v2; we will offer a stable training version as soon as we can.