Can't use stylegan2 256x256 models
Describe the bug: I wanted to use transfer learning from a 256x256 pkl, and my dataset consists of 256x256 images, yet I got the error below.
Training options:
{
"G_kwargs": {
"class_name": "training.networks_stylegan2.Generator",
"z_dim": 256,
"w_dim": 256,
"mapping_kwargs": {
"num_layers": 8,
"freeze_layers": 0,
"freeze_embed": false
},
"channel_base": 32768,
"channel_max": 256,
"fused_modconv_default": "inference_only"
},
"D_kwargs": {
"class_name": "training.networks_stylegan2.Discriminator",
"block_kwargs": {
"freeze_layers": 0
},
"mapping_kwargs": {},
"epilogue_kwargs": {
"mbstd_group_size": 4
},
"channel_base": 32768,
"channel_max": 256
},
"G_opt_kwargs": {
"class_name": "torch.optim.Adam",
"betas": [
0,
0.99
],
"eps": 1e-08,
"lr": 0.002
},
"D_opt_kwargs": {
"class_name": "torch.optim.Adam",
"betas": [
0,
0.99
],
"eps": 1e-08,
"lr": 0.002
},
"loss_kwargs": {
"class_name": "training.loss.StyleGAN2Loss",
"r1_gamma": 16.0,
"style_mixing_prob": 0.9,
"pl_weight": 2,
"pl_no_weight_grad": true,
"blur_init_sigma": 0
},
"data_loader_kwargs": {
"pin_memory": true,
"prefetch_factor": 2,
"num_workers": 3
},
"training_set_kwargs": {
"class_name": "training.dataset.ImageFolderDataset",
"path": "./datasets/FH.zip",
"use_labels": false,
"max_size": 4592,
"xflip": false,
"yflip": false,
"resolution": 256,
"random_seed": 0
},
"num_gpus": 2,
"batch_size": 16,
"batch_gpu": 8,
"metrics": [],
"total_kimg": 25000,
"resume_kimg": 360,
"kimg_per_tick": 4,
"network_snapshot_ticks": 10,
"image_snapshot_ticks": 10,
"snap_res": "4k",
"random_seed": 0,
"ema_kimg": 5.0,
"G_reg_interval": 4,
"augment_kwargs": {
"class_name": "training.augment.AugmentPipe",
"xflip": 1,
"rotate90": 1,
"xint": 1,
"scale": 1,
"rotate": 1,
"aniso": 1,
"xfrac": 1,
"brightness": 1,
"contrast": 1,
"lumaflip": 1,
"hue": 1,
"saturation": 1
},
"ada_target": 0.6,
"resume_pkl": "https://nvlabs-fi-cdn.nvidia.com/stylegan2/networks/stylegan2-cat-config-f.pkl",
"ada_kimg": 100,
"ema_rampup": null,
"run_dir": "./results/00004-stylegan2-FH-gpus2-batch16-gamma16-resume_lsuncat256"
}
Output directory: ./results/00004-stylegan2-FH-gpus2-batch16-gamma16-resume_lsuncat256
Number of GPUs: 2
Batch size: 16 images
Training duration: 25000 kimg
Dataset path: ./datasets/FH.zip
Dataset size: 4592 images
Dataset resolution: 256
Dataset labels: False
Dataset x-flips: False
Dataset y-flips: False
Launching processes...
Loading training set...
Num images: 4592
Image shape: [3, 256, 256]
Label shape: [0]
Downloading https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl ... done
Traceback (most recent call last):
File "train.py", line 369, in <module>
main() # pylint: disable=no-value-for-parameter
File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "train.py", line 362, in main
launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
File "train.py", line 94, in launch_training
torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/kaggle/working/stylegan3-fun/train.py", line 50, in subprocess_fn
training_loop.training_loop(rank=rank, **c)
File "/kaggle/working/stylegan3-fun/training/training_loop.py", line 163, in training_loop
misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
File "/kaggle/working/stylegan3-fun/torch_utils/misc.py", line 162, in copy_params_and_buffers
tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (256) must match the size of tensor b (512) at non-singleton dimension 0
I believe the model shown in the log above is 512x512. You need to use a 256x256 model instead, such as: https://api.ngc.nvidia.com/v2/models/org/nvidia/team/research/stylegan2/1/files?redirect=true&path=stylegan2-ffhq-256x256.pkl
Also, when training a 256x256 model, be sure to include the following option in your training command:
--cbase=16384
I ran
!python train.py --outdir=./results --cbase=16384 --snap=10 --img-snap=10 --cfg=stylegan2 --data=./datasets/FH.zip --augpipe=bgc --gpus=2 --metrics=None --gamma=12 --batch=16 --resume='https://api.ngc.nvidia.com/v2/models/org/nvidia/team/research/stylegan2/1/files?redirect=true&path=stylegan2-ffhq-256x256.pkl'
and got the same issue.
I realize that I gave you that URL for the 256x256 model, but it's not a valid download link.
Try the command below (as documented here).
!python train.py --outdir=./results --cbase=16384 --snap=10 --img-snap=10 --cfg=stylegan2 --data=./datasets/FH.zip --augpipe=bgc --gpus=2 --metrics=None --gamma=12 --batch=16 --resume=ffhq256
The error comes from the dimensionality of the latent space: at the top of your configuration you have "G_kwargs": {..., "z_dim": 256, "w_dim": 256, ...}. This is bizarre, as we set up the correct dimensionality here (which is the one the pre-trained models use). Perhaps these values are being changed somewhere else, but I'll have to look into it, as the train.py file only changes this value for --cfg=stylegan2-ext.
I changed "z_dim" and "w_dim" to 256, thinking it might help, but it didn't. However, I believe the problem was with my dataset, where for some reason some images were not 256x256. I added
if img.size != (256, 256): img = img.resize((256, 256))
and it fixed it. torchvision's transforms.RandomCrop(size=256) should have ensured that all images are 256x256, but it didn't.
Yeah, you need to match the model you are fine-tuning from exactly; otherwise there's no way to use the weights. As for resizing your data, do you mean you used dataset_tool.py and it still produced images of different sizes, or do you have another pipeline there?
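To see which weights fail to line up before resuming, a rough diagnostic can compare parameter shapes by name. This is only a sketch: find_shape_mismatches is a hypothetical helper, not part of the repository, and the parameter names and shapes below are made up for illustration.

```python
def find_shape_mismatches(src_shapes, dst_shapes):
    """Compare two {name: shape} dicts and list parameters whose shapes differ.

    Names that appear in only one of the dicts are skipped, since only
    parameters present on both sides can be copied at all.
    """
    mismatches = []
    for name, dst_shape in dst_shapes.items():
        src_shape = src_shapes.get(name)
        if src_shape is not None and tuple(src_shape) != tuple(dst_shape):
            mismatches.append((name, tuple(src_shape), tuple(dst_shape)))
    return mismatches


# Illustrative shapes only; these parameter names are hypothetical.
checkpoint = {"mapping.fc0.weight": (512, 512), "synthesis.b4.const": (512, 4, 4)}
model = {"mapping.fc0.weight": (256, 256), "synthesis.b4.const": (512, 4, 4)}

for name, src, dst in find_shape_mismatches(checkpoint, model):
    print(f"{name}: checkpoint {src} vs model {dst}")
```

With real PyTorch modules, the shape dicts could presumably be built with something like {name: tuple(p.shape) for name, p in module.named_parameters()} for both the loaded network and the freshly constructed one.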
Actually, it seems that what fixed it was adding --cbase=16384.
Indeed, in my experience --cbase=16384 is required when fine-tuning a 256x256 model. Otherwise it will throw an error.
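For context, here is a minimal sketch of why the channel base matters. It assumes the per-resolution channel count is computed as min(channel_base // resolution, channel_max), which is my understanding of how the StyleGAN2 networks in this codebase do it. The function name channels_per_resolution is hypothetical, and channel_max=512 is an assumed default rather than a value taken from the log above.

```python
def channels_per_resolution(img_resolution, channel_base, channel_max=512):
    """Return {resolution: channels} for blocks from 4x4 up to img_resolution.

    Assumed formula: channels = min(channel_base // resolution, channel_max).
    """
    resolutions = []
    res = 4
    while res <= img_resolution:
        resolutions.append(res)
        res *= 2
    return {r: min(channel_base // r, channel_max) for r in resolutions}


default = channels_per_resolution(256, channel_base=32768)
small = channels_per_resolution(256, channel_base=16384)

# Resolutions where the two settings produce different channel counts.
mismatched = [r for r in default if default[r] != small[r]]
for r in mismatched:
    print(f"{r}x{r}: cbase=32768 -> {default[r]}, cbase=16384 -> {small[r]}")
```

Under these assumptions the two settings disagree at the 64, 128, and 256 resolutions (for example, 512 vs 256 channels at 64x64). That is consistent with the 256 vs 512 size mismatch in the traceback when the --cbase value does not match the checkpoint.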