Error during evaluation with 1 GPU, and error when training with multiple GPUs
Hi, thanks for this contribution! As a small exercise I am training SD2 on the pokemon dataset. I precomputed the latents, and training starts fine on one GPU. However, at evaluation time I get the following error:
File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/composer/trainer/trainer.py", line 2814, in _eval_loop
self.state.outputs = self._original_model.eval_forward(self.state.batch)
File "/fsx_vfx/users/csegalin/code/diffusion/diffusion/models/stable_diffusion.py", line 255, in eval_forward
gen_images = self.generate(tokenized_prompts=prompts,
File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/fsx_vfx/users/csegalin/code/diffusion/diffusion/models/stable_diffusion.py", line 464, in generate
pred = self.unet(latent_model_input,
File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 934, in forward
sample = self.conv_in(sample)
File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/fsx_vfx/users/csegalin/code/diffusion/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Calculated padded input size per channel: (162 x 2). Kernel size: (3 x 3). Kernel size can't be greater than actual input size`
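For what it's worth, if I read the error right: conv_in in the SD2 UNet uses padding=1, so a padded size of (162 x 2) implies the unpadded input was 160 x 0, i.e. the tensor reaching the UNet has an empty width dimension rather than a valid (B, 4, H/8, W/8) latent. A minimal repro of just the conv failure (the shapes are my inference, not taken from the repo):

```python
import torch
import torch.nn as nn

# The SD2 UNet's conv_in is Conv2d(4, 320, kernel_size=3, padding=1).
conv = nn.Conv2d(4, 320, kernel_size=3, padding=1)

# A padded size of (162 x 2) with padding=1 on each side implies an
# unpadded input of (160 x 0), i.e. the width dimension is empty.
x = torch.randn(1, 4, 160, 0)
conv(x)  # RuntimeError: ... Kernel size can't be greater than actual input size
```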
This is my configuration:
```yaml
name: trial0  # Insert wandb run name
project: pokemon_sd2_256  # Insert wandb project name
seed: 17
eval_first: false
algorithms:
  low_precision_groupnorm:
    attribute: unet
    precision: amp_fp16
  low_precision_layernorm:
    attribute: unet
    precision: amp_fp16
model:
  _target_: diffusion.models.models.stable_diffusion_2
  pretrained: false
  precomputed_latents: true
  encode_latents_in_fp16: true
  fsdp: true
  val_metrics:
    - _target_: torchmetrics.MeanSquaredError
    - _target_: torchmetrics.image.fid.FrechetInceptionDistance
      normalize: true
  val_guidance_scales: [3, 7]
  # val_guidance_scales: []
  loss_bins: []
dataset:
  train_batch_size: 1  # Global training batch size
  eval_batch_size: 1  # Global evaluation batch size
  train_dataset:
    _target_: diffusion.datasets.pokemon.pokemon.build_streaming_dataloader
    # Path to object store bucket(s)
    local: /fsx_vfx/users/csegalin/data/pokemon/latents2_train
    # Path to corresponding local dataset(s)
    mode: 0
    version: 2
    drop_last: False
    shuffle: true
    prefetch_factor: 2
    num_workers: 8
    persistent_workers: true
    pin_memory: true
  eval_dataset:
    _target_: diffusion.datasets.pokemon.pokemon.build_streaming_dataloader
    local: /fsx_vfx/users/csegalin/data/pokemon/latents2_eval  # Path to local dataset cache
    prefetch_factor: 2
    num_workers: 8
    persistent_workers: True
    pin_memory: True
    mode: 0
    version: 2
optimizer:
  _target_: torch.optim.AdamW
  lr: 1.0e-5
  weight_decay: 0.01
scheduler:
  _target_: composer.optim.LinearWithWarmupScheduler
  t_warmup: 1000ba
  alpha_f: 1.0
logger:
  comet-ml:
    _target_: composer.loggers.cometml_logger.CometMLLogger
    name: ${name}
    project_name: ${project}
callbacks:
  speed_monitor:
    _target_: composer.callbacks.speed_monitor.SpeedMonitor
    window_size: 10
  lr_monitor:
    _target_: composer.callbacks.lr_monitor.LRMonitor
  memory_monitor:
    _target_: composer.callbacks.memory_monitor.MemoryMonitor
  runtime_estimator:
    _target_: composer.callbacks.runtime_estimator.RuntimeEstimator
  optimizer_monitor:
    _target_: composer.callbacks.OptimizerMonitor
  image_monitor:
    _target_: diffusion.callbacks.log_diffusion_images.LogDiffusionImages
    prompts:  # add any prompts you would like to visualize
      - cute dragon creature
    size: 256  # generated image resolution
    guidance_scale: 3
trainer:
  _target_: composer.Trainer
  device: gpu
  max_duration: 550000ba
  eval_interval: 1000ba
  device_train_microbatch_size: 1
  run_name: ${name}
  seed: ${seed}
  save_folder: trained_model  # Insert path to save folder or bucket
  save_interval: 3000ba
  save_overwrite: true
  autoresume: false
  # fsdp_config:
  #   sharding_strategy: "SHARD_GRAD_OP"
```
I think this is related to the FID metric, since if I remove it everything works.
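From the traceback, FID appears to be what triggers the image-generation path in eval_forward (with MSE alone there is no crash, as noted above). The metric itself seems fine in isolation; here is a minimal sketch of how I understand torchmetrics' FID with normalize=true, just to exercise the API:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# With normalize=True the inputs are floats in [0, 1] of shape (N, 3, H, W);
# without it, uint8 images in [0, 255] are expected.
fid = FrechetInceptionDistance(normalize=True)

real = torch.rand(8, 3, 256, 256)
fake = torch.rand(8, 3, 256, 256)
fid.update(real, real=True)
fid.update(fake, real=False)
print(fid.compute())  # value is meaningless for 8 random images; this only checks shapes/dtypes
```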
When I try to train on a multi-GPU machine (setting fsdp back to true, uncommenting the last two lines of the config, and adjusting the batch size accordingly), I get this error:
```
ValueError: The world_size(2) > 1 but dataloader does not use DistributedSampler. This will cause all ranks to train on the same data, removing any benefit from multi-GPU training. To resolve this, create a Dataloader with DistributedSampler. For example, DataLoader(..., sampler=composer.utils.dist.get_sampler(...)). Alternatively, the process group can be instantiated with composer.utils.dist.instantiate_dist(...) and DistributedSampler can directly be created with DataLoader(..., sampler=DistributedSampler(...)). For more information, see https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler.
```
I don't see a DistributedSampler being created in the laion or coco dataloader builder functions either.
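If I follow the error message, the fix would be to pass Composer's distributed sampler when building the dataloader, something along these lines (the TensorDataset is a stand-in for whatever the builder function actually constructs):

```python
import torch
from composer.utils import dist
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; the real one would be the streaming pokemon dataset.
dataset = TensorDataset(torch.randn(16, 4, 32, 32))

# dist.get_sampler returns a DistributedSampler wired to the current
# world size / rank, so each rank sees a distinct shard of the data.
sampler = dist.get_sampler(dataset, shuffle=True, drop_last=False)

dataloader = DataLoader(
    dataset,
    batch_size=1,
    sampler=sampler,
    num_workers=8,
    pin_memory=True,
    persistent_workers=True,
)
```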