zero123
Training on a custom dataset
I appreciate your great work on zero123.
I want to retrain zero123 on medical data. My dataset contains about 700 samples, processed with the same method as in the paper; each sample has 12 images from different perspectives. I trained on a single A100 GPU. Below are my configuration file settings and training logs.
During inference I used the 'last.ckpt' saved during training, but viewpoint control in the generated images isn't very accurate: there is a deviation in perspective compared with the ground-truth images. What could be the reason for this?
Could an issue in my configuration settings be causing the training loss to converge to a higher value?
model:
  base_learning_rate: 0.0001
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: image_target
    cond_stage_key: image_cond
    image_size: 32
    channels: 4
    cond_stage_trainable: false
    conditioning_key: hybrid
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    scheduler_config:
      target: ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps:
        - 100
        cycle_lengths:
        - 10000000000000
        f_start:
        - 1.0e-06
        f_max:
        - 1.0
        f_min:
        - 1.0
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32
        in_channels: 8
        out_channels: 4
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_heads: 8
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: true
        legacy: false
    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPImageEmbedder
data:
  target: ldm.data.simple.ObjaverseDataModuleFromConfig
  params:
    root_dir: /data/yisi/mywork/zero123/diffusioncardiac
    batch_size: 16
    num_workers: 8
    total_view: 4
    train:
      validation: false
      image_transforms:
        size: 256
    validation:
      validation: true
      image_transforms:
        size: 256
lightning:
  find_unused_parameters: false
  metrics_over_trainsteps_checkpoint: true
  modelcheckpoint:
    params:
      every_n_train_steps: 5000
  callbacks:
    image_logger:
      target: main.ImageLogger
      params:
        batch_frequency: 500
        max_images: 32
        increase_log_steps: false
        log_first_step: true
        log_images_kwargs:
          use_ema_scope: false
          inpaint: false
          plot_progressive_rows: false
          plot_diffusion_rows: false
          'N': 32
          unconditional_guidance_scale: 3.0
          unconditional_guidance_label:
          - ''
  trainer:
    benchmark: true
    val_check_interval: 5000000
    num_sanity_val_steps: 0
    accumulate_grad_batches: 1
    accelerator: ddp
    check_val_every_n_epoch: 10
    gpus: 0,
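One thing worth checking is whether the camera poses in the custom .npy files follow the convention zero123's data loader assumes, since the model is conditioned on relative spherical offsets between the condition and target cameras; a convention mismatch would produce exactly this kind of perspective deviation. Below is a minimal sketch of that relative-pose computation, modeled on ldm/data/simple.py in the zero123 repo (function names here are illustrative, and the details should be verified against the version of the code you are running):

```python
import numpy as np

def cartesian_to_spherical(xyz: np.ndarray):
    """Map a camera position (x, y, z) to (polar, azimuth, radius)."""
    xy = xyz[0] ** 2 + xyz[1] ** 2
    radius = np.sqrt(xy + xyz[2] ** 2)
    polar = np.arctan2(np.sqrt(xy), xyz[2])  # 0 at the +z pole
    azimuth = np.arctan2(xyz[1], xyz[0])
    return polar, azimuth, radius

def relative_offsets(target_RT: np.ndarray, cond_RT: np.ndarray):
    """Relative (d_polar, d_azimuth, d_radius) between two 3x4
    world-to-camera [R|t] matrices -- the quantities zero123 embeds
    (with the azimuth passed as sin/cos) as its view conditioning."""
    cam_centers = []
    for RT in (target_RT, cond_RT):
        R, t = RT[:3, :3], RT[:3, -1]
        cam_centers.append(-R.T @ t)  # camera center in world coordinates
    (p_t, a_t, r_t), (p_c, a_c, r_c) = map(cartesian_to_spherical, cam_centers)
    d_polar = p_t - p_c
    d_azimuth = (a_t - a_c) % (2 * np.pi)
    d_radius = r_t - r_c
    return d_polar, d_azimuth, d_radius
```

If the offsets computed this way from your .npy poses don't match the nominal viewpoint differences between your 12 renders, the conditioning the model sees during training is wrong, which would explain inaccurate viewpoint control regardless of the loss value.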
I have the same problem. How can I fix it?
I am commenting for visibility since I am also having a related issue.
Can you help me figure out how to prepare my own data for training? What data structure should I use? Thanks very much!
@gitsawww Do you know how to train with custom data now?
@ys830 @trltzy @gitsawww @kada0720 Could you please assist me with preparing my data for training? What data structure should I be processing? Thank you.
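In case it helps: reading ldm/data/simple.py, ObjaverseDataModuleFromConfig appears to expect root_dir to contain a valid_paths.json listing per-object folder names, with each folder holding numbered views (000.png, 001.png, ...) and matching 3x4 world-to-camera [R|t] poses (000.npy, 001.npy, ...). That reading of the loader is an assumption from the repo code, so double-check it against your version. A small sanity-check sketch (check_dataset is my own helper, not part of the repo):

```python
import json
from pathlib import Path

TOTAL_VIEW = 12  # number of rendered views per object

def check_dataset(root_dir: str) -> None:
    """Verify a dataset laid out the way ObjaverseData appears to expect:

    root_dir/
      valid_paths.json   # JSON list of per-object folder names
      <object>/000.png   # rendered view i (i = 000 .. TOTAL_VIEW-1)
      <object>/000.npy   # 3x4 world-to-camera [R|t] pose for view i
    """
    root = Path(root_dir)
    paths = json.loads((root / "valid_paths.json").read_text())
    for name in paths:
        folder = root / name
        for i in range(TOTAL_VIEW):
            png, npy = folder / f"{i:03d}.png", folder / f"{i:03d}.npy"
            assert png.exists(), f"missing image {png}"
            assert npy.exists(), f"missing pose {npy}"
    print(f"OK: {len(paths)} objects with {TOTAL_VIEW} views each")
```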
total_view should be 12, since you prepared 12 different images for each object.
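To expand on that: total_view tells the loader how many view indices per object it may sample from when drawing a (condition, target) pair, so with total_view: 4 only the first four of the twelve renders would ever be used. A sketch of the sampling step, modeled on the repo's dataset class (an assumption, not a verbatim quote):

```python
import random

total_view = 12  # must equal the number of views rendered per object

# Each training example pairs one condition view with one target view,
# drawn without replacement from the available view indices.
index_target, index_cond = random.sample(range(total_view), 2)
```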
Have you solved this problem? I have the same question.