
question about training

Open wanghxcis opened this issue 7 years ago • 10 comments

Hi guys, I collected some winter and summer images and resized them to 256x256. When I train on these images, I find the vgg_w parameter hard to tune. If the value is large, the output image quality is OK, but I see very little translation effect; the output is almost the same as the input. However, if the value is small, the output is blurry even after 1,000,000 iterations. What should I do: enlarge mlp_dim, or something else?

wanghxcis avatar Jun 12 '18 03:06 wanghxcis

From my understanding, the domain-invariant perceptual loss should only be used to accelerate training for inputs >= 512x512. My guess is that images of size 256x256 are too small.

Check page 8 of their paper, which gives more details.

Could the authors confirm my guess?

Thanks,

OValery16 avatar Jul 13 '18 03:07 OValery16

@OValery16 Yes, we find that the domain invariant perceptual loss is useful for large size images. For image resolution of 256x256, we do not use domain invariant perceptual loss.

mingyuliutw avatar Jul 13 '18 17:07 mingyuliutw
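In the released training configs this corresponds to zeroing the perceptual-loss weight. A minimal fragment (the key name follows the training parameters quoted later in this thread):

```yaml
# Disable the domain-invariant perceptual loss for 256x256 training
vgg_w: 0    # weight of domain-invariant perceptual loss
```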

@wanghxcis Could you send me some example images and training results?

mingyuliutw avatar Jul 14 '18 00:07 mingyuliutw

@mingyuliutw I read your paper and I got a bit confused. In which cases do you use the explicit style-augmented cycle consistency loss?

OValery16 avatar Jul 14 '18 04:07 OValery16

@mingyuliutw Thanks, I will get rid of the perceptual loss and try again. My training parameters are as follows:

image_save_iter: 1000         # how often to save output images during training
image_display_iter: 100       # how often to display output images during training
display_size: 8               # how many images to display each time
snapshot_save_iter: 10000     # how often to save trained models
log_iter: 1                   # how often to log the training stats
max_iter: 1000000             # maximum number of training iterations
batch_size: 1                 # batch size
weight_decay: 0.0001          # weight decay
beta1: 0.5                    # Adam parameter
beta2: 0.999                  # Adam parameter
init: kaiming                 # initialization [gaussian/kaiming/xavier/orthogonal]
lr: 0.0001                    # initial learning rate
lr_policy: step               # learning rate scheduler
step_size: 100000             # how often to decay learning rate
gamma: 0.5                    # how much to decay learning rate
gan_w: 1.5                    # weight of adversarial loss
recon_x_w: 9                  # weight of image reconstruction loss
recon_s_w: 1                  # weight of style reconstruction loss
recon_c_w: 1                  # weight of content reconstruction loss
recon_x_cyc_w: 1              # weight of explicit style-augmented cycle consistency loss
vgg_w: 0.6                    # weight of domain-invariant perceptual loss
gen:
  dim: 64                     # number of filters in the bottommost layer
  mlp_dim: 256                # number of filters in MLP
  style_dim: 8                # length of style code
  activ: relu                 # activation function [relu/lrelu/prelu/selu/tanh]
  n_downsample: 2             # number of downsampling layers in content encoder
  n_res: 4                    # number of residual blocks in content encoder/decoder
  pad_type: reflect           # padding type [zero/reflect]
dis:
  dim: 64                     # number of filters in the bottommost layer
  norm: none                  # normalization layer [none/bn/in/ln]
  activ: lrelu                # activation function [relu/lrelu/prelu/selu/tanh]
  n_layer: 3                  # number of layers in D
  gan_type: lsgan             # GAN loss [lsgan/nsgan]
  num_scales: 2               # number of scales
  pad_type: reflect           # padding type [zero/reflect]
input_dim_a: 3                # number of image channels [1/3]
input_dim_b: 3                # number of image channels [1/3]
num_workers: 8                # number of data-loading threads
new_size: 256                 # first resize the shortest image side to this size
crop_image_height: 256        # random crop height
crop_image_width: 256         # random crop width
data_root: ./datasets/summer2winter_fineselect256/  # dataset folder location

Training example result: gen_a2b_train_current

wanghxcis avatar Jul 16 '18 06:07 wanghxcis
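As a side note on how these weights interact: a minimal sketch of the weighted generator objective (plain Python; the key names mirror the config above, but this is not MUNIT's actual code):

```python
# Hypothetical sketch: combine individual loss terms using the config
# weights quoted above. Dict keys mirror the YAML keys, not MUNIT's code.
def gen_total_loss(losses, w):
    return (w["gan_w"] * losses["adv"]
            + w["recon_x_w"] * losses["x_recon"]
            + w["recon_s_w"] * losses["s_recon"]
            + w["recon_c_w"] * losses["c_recon"]
            + w["recon_x_cyc_w"] * losses["cyc"]
            + w["vgg_w"] * losses["vgg"])

# Weights taken from the config quoted above
w = {"gan_w": 1.5, "recon_x_w": 9, "recon_s_w": 1,
     "recon_c_w": 1, "recon_x_cyc_w": 1, "vgg_w": 0.6}
```

Seen this way, a large vgg_w makes the perceptual term dominate the objective and pull the output toward the input's VGG features, which would explain the "output almost the same as input" symptom described above.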

Hi, does the domain-invariant perceptual loss affect image quality at size 256x256?

zhangmozhe avatar Sep 01 '18 06:09 zhangmozhe

@mingyuliutw I read your paper and I got a bit confused. In which cases do you use the explicit style-augmented cycle consistency loss?

Hi, did you solve this problem? I have the same confusion.

Lucky0775 avatar Sep 23 '22 07:09 Lucky0775

(quoting wanghxcis's reply and training parameters above)

Hello, I also ran into the same problem. How did you solve it?

wylblank avatar Apr 09 '24 07:04 wylblank

(quoting wanghxcis's original question above)

Hello, I also resize the images to 256x256, but in this case, if vgg_w is not 0, the loss becomes NaN. Do you have any suggestions? Thank you very much!

wylblank avatar Apr 09 '24 09:04 wylblank

(quoting wanghxcis's original question above)

Hello, I am trying to reproduce MUNIT, but I hit this error: for (src, dst) in zip(vgglua.parameters()[0], vgg.parameters()): TypeError: 'NoneType' object is not callable. I tried to work around it, but the generated vgg16.weight does not work and the vgg_w loss term becomes NaN. Have you encountered this? Or could you send me the vgg16.weight file from your models folder? Thanks a million.

wylblank avatar Apr 15 '24 01:04 wylblank