stylegan2-pytorch

train with augment

Open KunWang123 opened this issue 3 years ago • 6 comments

Should I train the model with this command? python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT train.py --batch BATCH_SIZE LMDB_PATH --augment

KunWang123, May 06 '21 14:05

Yes. python -m torch.distributed.launch --nproc_per_node=N_GPU --master_port=PORT train.py --batch BATCH_SIZE --augment LMDB_PATH

rosinality, May 08 '21 00:05

Hi @rosinality, after turning on --augment, I got the following error:

File "/home/zhule.zhl/miniconda3/envs/py37_torch160/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
   input = module(input)
File "/home/zhule.zhl/miniconda3/envs/py37_torch160/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
   result = self.forward(*input, **kwargs)
File "/mnt3/zhule.zhl/gitWorks/stylegan2-pytorch/op/fused_act.py", line 101, in forward
   return fused_leaky_relu(input, self.bias, self.negative_slope, self.scale)
File "/mnt3/zhule.zhl/gitWorks/stylegan2-pytorch/op/fused_act.py", line 119, in fused_leaky_relu
   return FusedLeakyReLUFunction.apply(input, bias, negative_slope, scale)
File "/mnt3/zhule.zhl/gitWorks/stylegan2-pytorch/op/fused_act.py", line 66, in forward
   out = fused.fused_bias_act(input, bias, empty, 3, 0, negative_slope, scale)
RuntimeError: input must be contiguous

Do you know why it happened? I couldn't fix it.

tearscoco, Aug 02 '21 07:08
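
For readers unfamiliar with the "input must be contiguous" message, here is a minimal sketch of what contiguity means for a PyTorch tensor in general. It is only an illustration: the tensor shapes are made up, and it does not reproduce the code path in op/fused_act.py or claim to show how the repository handles the problem.

```python
import torch

# Illustrative tensor; the shape is arbitrary and not taken from the repository.
x = torch.randn(2, 3, 4, 4)

# permute() returns a view with reordered strides, so the memory layout
# no longer matches the logical order of the elements.
y = x.permute(0, 2, 3, 1)
print(y.is_contiguous())

# contiguous() copies the data into a freshly laid-out buffer with the same values.
z = y.contiguous()
print(z.is_contiguous())
```

Custom CUDA kernels often assume a dense row-major layout, which is why they may reject a tensor like y while accepting z.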

@tearscoco I got the same error when training with augmentation

ElektrischesSchaf, Aug 03 '21 06:08

@tearscoco @ElektrischesSchaf It will be fixed with bb459e0.

rosinality, Aug 04 '21 14:08

@rosinality thank you, now it works perfectly

I also found that the following error appears when training with --augment:

File "/workspace/gan-pytorch/stylegan2-pytorch-rosinality/non_leaking.py", line 316, in get_padding
   pad = pad.max(torch.tensor([0, 0] * 2, device=device))
RuntimeError: Expected object of scalar type float but got scalar type long int for argument 'other'

Training without --augment does not produce this error. I'm not sure whether it happens because I'm feeding a custom dataset into the model.

My solution is to add dtype=torch.float32 to both tensors in get_padding() in non_leaking.py:

pad = pad.max(torch.tensor([0, 0] * 2, device=device, dtype=torch.float32))
pad = pad.min(torch.tensor([width - 1, height - 1] * 2, device=device, dtype=torch.float32))

ElektrischesSchaf, Aug 05 '21 01:08
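
The dtype mismatch can be seen in isolation with a small sketch. The helper name clamp_padding and the sample values below are hypothetical and are not part of non_leaking.py; the sketch only assumes that pad is a float tensor, as the error message suggests.

```python
import torch

# torch.tensor() infers int64 ("long") from a list of Python ints,
# which is the type named in the error message.
print(torch.tensor([0, 0] * 2).dtype)


def clamp_padding(pad, width, height):
    """Hypothetical helper: clamp pad elementwise to [0, width - 1] / [0, height - 1]."""
    # Building the bounds with pad's own dtype and device avoids a float/long mismatch.
    lower = torch.tensor([0, 0] * 2, device=pad.device, dtype=pad.dtype)
    upper = torch.tensor([width - 1, height - 1] * 2, device=pad.device, dtype=pad.dtype)
    pad = pad.max(lower)  # elementwise maximum against the lower bound
    pad = pad.min(upper)  # elementwise minimum against the upper bound
    return pad


# Made-up sample values for illustration.
pad = torch.tensor([-1.5, 2.0, 40.0, 3.0])
clamped = clamp_padding(pad, width=32, height=32)
print(clamped, clamped.dtype)
```

Using dtype=pad.dtype instead of a hard-coded torch.float32 is a design choice in this sketch, so the bounds follow whatever precision pad already has.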

@ElektrischesSchaf You can also fix this by changing the line to pad = pad.max(torch.tensor([0.0, 0.0] * 2, device=device))

tearscoco, Aug 05 '21 09:08