stylegan3-fun: Can't use StyleGAN2 256x256 models

Describe the bug

I wanted to use transfer learning starting from a 256x256 .pkl, and my dataset contains 256x256 images, yet I got the error below.

Training options:
{
  "G_kwargs": {
    "class_name": "training.networks_stylegan2.Generator",
    "z_dim": 256,
    "w_dim": 256,
    "mapping_kwargs": {
      "num_layers": 8,
      "freeze_layers": 0,
      "freeze_embed": false
    },
    "channel_base": 32768,
    "channel_max": 256,
    "fused_modconv_default": "inference_only"
  },
  "D_kwargs": {
    "class_name": "training.networks_stylegan2.Discriminator",
    "block_kwargs": {
      "freeze_layers": 0
    },
    "mapping_kwargs": {},
    "epilogue_kwargs": {
      "mbstd_group_size": 4
    },
    "channel_base": 32768,
    "channel_max": 256
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08,
    "lr": 0.002
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08,
    "lr": 0.002
  },
  "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 16.0,
    "style_mixing_prob": 0.9,
    "pl_weight": 2,
    "pl_no_weight_grad": true,
    "blur_init_sigma": 0
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "prefetch_factor": 2,
    "num_workers": 3
  },
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "./datasets/FH.zip",
    "use_labels": false,
    "max_size": 4592,
    "xflip": false,
    "yflip": false,
    "resolution": 256,
    "random_seed": 0
  },
  "num_gpus": 2,
  "batch_size": 16,
  "batch_gpu": 8,
  "metrics": [],
  "total_kimg": 25000,
  "resume_kimg": 360,
  "kimg_per_tick": 4,
  "network_snapshot_ticks": 10,
  "image_snapshot_ticks": 10,
  "snap_res": "4k",
  "random_seed": 0,
  "ema_kimg": 5.0,
  "G_reg_interval": 4,
  "augment_kwargs": {
    "class_name": "training.augment.AugmentPipe",
    "xflip": 1,
    "rotate90": 1,
    "xint": 1,
    "scale": 1,
    "rotate": 1,
    "aniso": 1,
    "xfrac": 1,
    "brightness": 1,
    "contrast": 1,
    "lumaflip": 1,
    "hue": 1,
    "saturation": 1
  },
  "ada_target": 0.6,
  "resume_pkl": "https://nvlabs-fi-cdn.nvidia.com/stylegan2/networks/stylegan2-cat-config-f.pkl",
  "ada_kimg": 100,
  "ema_rampup": null,
  "run_dir": "./results/00004-stylegan2-FH-gpus2-batch16-gamma16-resume_lsuncat256"
}

Output directory:    ./results/00004-stylegan2-FH-gpus2-batch16-gamma16-resume_lsuncat256
Number of GPUs:      2
Batch size:          16 images
Training duration:   25000 kimg
Dataset path:        ./datasets/FH.zip
Dataset size:        4592 images
Dataset resolution:  256
Dataset labels:      False
Dataset x-flips:     False
Dataset y-flips:     False

Launching processes...
Loading training set...

Num images:  4592
Image shape: [3, 256, 256]
Label shape: [0]
Downloading https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl ... done
Traceback (most recent call last):
  File "train.py", line 369, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 362, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 94, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/kaggle/working/stylegan3-fun/train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/kaggle/working/stylegan3-fun/training/training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "/kaggle/working/stylegan3-fun/torch_utils/misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (256) must match the size of tensor b (512) at non-singleton dimension 0

Nov 19 '24 11:11 NitayGitHub

I believe the model shown in the log above is a 512x512 model. You need to use a 256x256 model instead, such as: https://api.ngc.nvidia.com/v2/models/org/nvidia/team/research/stylegan2/1/files?redirect=true&path=stylegan2-ffhq-256x256.pkl

Also, when training a 256x256 model, be sure to include the following option in your training command: --cbase=16384

Nov 19 '24 15:11 nuclearsugar

I ran

!python train.py --outdir=./results --cbase=16384 --snap=10 --img-snap=10 --cfg=stylegan2 --data=./datasets/FH.zip --augpipe=bgc --gpus=2 --metrics=None --gamma=12 --batch=16 --resume='https://api.ngc.nvidia.com/v2/models/org/nvidia/team/research/stylegan2/1/files?redirect=true&path=stylegan2-ffhq-256x256.pkl'

and got the same error.

Nov 19 '24 16:11 NitayGitHub

I realize that I gave you that URL for the 256x256 model, but it's not a valid download link.

Try the command below (as documented here).

!python train.py --outdir=./results --cbase=16384 --snap=10 --img-snap=10 --cfg=stylegan2 --data=./datasets/FH.zip --augpipe=bgc --gpus=2 --metrics=None --gamma=12 --batch=16 --resume=ffhq256

Nov 19 '24 17:11 nuclearsugar

The error comes from the dimensionality of the latent space, which you can see at the top of your configuration: "G_kwargs": {..., "z_dim": 256, "w_dim": 256, ...}. This is bizarre, as we set the correct dimensionality here (the one that the pre-trained models use). Perhaps these values are being changed somewhere else, but I'll have to look into it, as train.py only changes them for --cfg=stylegan2-ext.
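A quick way to see which tensors disagree before the copy fails is to compare shapes by name. The following is only a sketch: find_shape_mismatches is a hypothetical helper, not part of stylegan3-fun, and the tensor names and shapes in the example are made up for illustration rather than read from a real pickle.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for tensors that
    exist in both mappings but have different shapes."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Tensors missing from the checkpoint are skipped, not reported.
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Illustrative shapes only (assumption): a checkpoint built with 512-wide
# layers versus a freshly constructed network with 256-wide layers.
checkpoint = {"mapping.fc0.weight": (512, 512), "synthesis.b4.const": (512, 4, 4)}
model = {"mapping.fc0.weight": (256, 256), "synthesis.b4.const": (256, 4, 4), "extra": (3,)}

for name, (src, dst) in find_shape_mismatches(checkpoint, model).items():
    print(f"{name}: checkpoint {src} vs model {dst}")
```

With real PyTorch modules, the two mappings could be built with something like {n: tuple(t.shape) for n, t in module.named_parameters()} for both the resumed network and the newly constructed one.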

Nov 21 '24 19:11 PDillis

I changed "z_dim" and "w_dim" to 256, thinking it might help, but it didn't. However, I believe the problem was with my dataset, where for some reason some images were not 256x256. I added

if img.size != (256, 256): img = img.resize((256, 256))

and it fixed it. torchvision's transforms.RandomCrop(size=256) should have ensured that all images are 256x256, but it didn't.

Nov 21 '24 20:11 NitayGitHub

Yeah, you need to match the model you are fine-tuning from exactly; otherwise there's no way to use the weights. As for resizing your data, do you mean you used dataset_tool.py and it still produced images of different sizes, or do you have another pipeline there?

Nov 21 '24 21:11 PDillis

Actually, it seems that what fixed it was adding --cbase=16384.

Nov 25 '24 20:11 NitayGitHub

Indeed, in my experience --cbase=16384 is required when fine-tuning from a 256x256 model. Otherwise it throws an error.
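To illustrate why the channel base matters, here is a rough sketch of how layer widths depend on it. It assumes the StyleGAN2 networks compute the number of channels at each resolution as min(channel_base // resolution, channel_max), with channel_max defaulting to 512, as I understand the NVIDIA reference code to do. The function name channels_per_resolution is made up for this example.

```python
import math


def channels_per_resolution(img_resolution, channel_base, channel_max=512):
    """Channel count per block resolution, assuming the rule
    min(channel_base // res, channel_max) (an assumption, see above)."""
    log2 = int(math.log2(img_resolution))
    resolutions = [2 ** i for i in range(2, log2 + 1)]  # 4, 8, ..., img_resolution
    return {res: min(channel_base // res, channel_max) for res in resolutions}


default = channels_per_resolution(256, channel_base=32768)
halved = channels_per_resolution(256, channel_base=16384)

# Print the widths side by side; they differ at the higher resolutions.
for res in default:
    print(f"{res:>3}x{res:<3}  cbase=32768: {default[res]:>3}   cbase=16384: {halved[res]:>3}")
```

Under that assumption, the two settings give different widths at 64x64 and above, so a checkpoint trained with one channel base cannot be copied into a network built with the other.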

Nov 25 '24 21:11 nuclearsugar