
High FVD scores and inverted generated samples?

Open skymanaditya1 opened this issue 2 years ago • 10 comments

For comparison, I am training StyleGAN-V on a relatively small dataset of faces (faces from the How2Sign dataset): 10,000 videos, each with exactly 25 frames. After training for a sufficient amount of time (once I started noticing really good perceptual results), I ran inference, and the output looks as follows. The first thing I notice is that the generated video is inverted; the orientation of the intermediate generated videos also keeps changing during training. Secondly, I measured the fvd2048_16f score of this checkpoint against the dataset the model was trained on, and I am getting a very high FVD of ~1100. Is this expected given the small number of training samples, or is something wrong, since the inferred videos are inverted? On the other datasets (UCF, SkyTimelapse, Rainbow Jelly), I get videos in the correct orientation. Attached below is one frame extracted from a generated video (the perceptual quality looks good to me).

one_frame

skymanaditya1 avatar May 18 '22 09:05 skymanaditya1

For the rainbow jelly dataset, the generated images have a white background whereas the real images have a black background. Please find some samples here -- fakes005961 reals

The training configuration is the same as described above. Did you also notice this at any point during training?

skymanaditya1 avatar May 18 '22 10:05 skymanaditya1

Hi! The inverted images are being generated due to the use of differentiable augmentations (from StyleGAN2-ADA). The white/black background happens for the same reason. If your generator produces inverted samples, the FVD will certainly be very high.

Typically, one just needs to train for longer to get those diffaugs sorted out (you can check the StyleGAN2-ADA paper). For how many kimgs do you train?

A natural way to solve the issue would be to increase the dataset size, but I suspect that is not possible in your case. You can disable any specific augmentations here. That would remove the effect, but it might make it more difficult for G to learn the data, since D would be winning too severely.

For RainbowJelly — please note that it is not a symmetric dataset, so it makes sense to disable mirroring for it here if you want better results on it (we did not do this in our case, to stay comparable with other methods).
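For concreteness, disabling individual augmentations amounts to zeroing their probability multipliers in the augmentation pipeline spec. A minimal sketch in the StyleGAN2-ADA style (the 'bgc_noflip' entry and its multiplier choices are illustrative, not from the StyleGAN-V codebase; check the actual augpipe_specs dict in the code):

```python
# Sketch of an augmentation pipeline spec in the StyleGAN2-ADA style.
# Each value is a probability multiplier; setting it to 0 disables
# that augmentation entirely.
augpipe_specs = {
    # the default 'bgc' pipe: blit + geometric + color augmentations
    'bgc': dict(xflip=1, rotate90=1, xint=1, scale=1, rotate=1, aniso=1,
                xfrac=1, brightness=1, contrast=1, lumaflip=1, hue=1,
                saturation=1),
    # hypothetical variant with flips and rotations disabled, e.g. for
    # non-symmetric data such as RainbowJelly
    'bgc_noflip': dict(xflip=0, rotate90=0, xint=1, scale=1, rotate=0,
                       aniso=1, xfrac=1, brightness=1, contrast=1,
                       lumaflip=1, hue=1, saturation=1),
}

print(augpipe_specs['bgc_noflip'])
```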

universome avatar May 20 '22 23:05 universome

I see, I will follow this advice and retrain the networks on all the datasets. I don't remember the exact number of kimgs I trained each network for, but I trained each one on a 4-GPU setup of NVIDIA RTX 2080 Tis for close to 2 days with the following resolutions and batch sizes:

  1. How2Sign_faces -- 256x256, batch size 32
  2. Rainbow Jelly -- 128x128, batch size 64
  3. SkyTimelapse -- 128x128, batch size 64

I manually inverted the predicted videos for How2Sign_faces and used cal_metrics_for_dataset.py on 100 generated videos to calculate the FVD -- it came to 297. As for SkyTimelapse, I observed an FVD of around 51 even on the smaller dataset, and the images were perceptually the best as well.

I will try retraining the models with the mentioned augmentations turned off and re-report the metrics and samples.

skymanaditya1 avatar May 22 '22 20:05 skymanaditya1

Ok, sounds good. Also note that computing FVD on a small number of videos (100 instead of 2048) can lead to worse FVD values: the estimated statistics will look as if you have mode collapse, and FVD will penalize you for that.
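To illustrate this small-sample bias, here is a hedged sketch in plain NumPy/SciPy (using synthetic Gaussian features in place of real I3D embeddings): the Fréchet distance between two sample sets drawn from the same distribution comes out systematically larger when one of the sets is small.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(x, y):
    # Fréchet distance between Gaussians fitted to two feature sets --
    # the same formula FVD applies to I3D features.
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # matrix square root can pick up tiny imaginary parts numerically
    covmean = sqrtm(cov_x @ cov_y).real
    return float(((mu_x - mu_y) ** 2).sum()
                 + np.trace(cov_x + cov_y - 2.0 * covmean))

rng = np.random.default_rng(0)
dim = 16
real = rng.standard_normal((2048, dim))  # stand-in for "real" features

# Both "fake" sets come from the *same* distribution as `real`, so the
# true distance is 0 -- yet the 100-sample estimate is clearly larger.
fd_small = frechet_distance(rng.standard_normal((100, dim)), real)
fd_large = frechet_distance(rng.standard_normal((2048, dim)), real)
print(fd_small, fd_large)
```

The estimation bias grows with feature dimension and shrinks with sample count, which is why comparing FVD numbers computed with different numbers of videos is misleading.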

universome avatar May 23 '22 23:05 universome

I will train with augmentations disabled on the smaller datasets in that case. I don't think generating and manually inverting 2048 generated videos would be a good idea.

skymanaditya1 avatar May 24 '22 19:05 skymanaditya1

@universome I tried running with augmentation disabled via augpipe: noaug, and I get an AssertionError - assert c.augpipe is None or c.augpipe in augpipe_specs. From what I understand, noaug is not a valid augpipe value (I can remove the assertion). Are you expecting at least one augmentation pipe to be specified as input?

skymanaditya1 avatar May 25 '22 18:05 skymanaditya1

Would it be this particular option in the base.yaml file under configs/training? aug: noaug # One of ['noaug', 'ada', 'fixed']

Also, what should the discriminator augmentation be to avoid the issue discussed at the top?

Currently I am using noaug for aug: and bgc for augpipe:. I am inclined to change augpipe from bgc to noise, though. Please let me know what you think.

skymanaditya1 avatar May 25 '22 19:05 skymanaditya1

Hi! This is a somewhat difficult question, since it is hard to predict how the model will perform with one set of augmentations or another. I believe you will need some augmentations enabled for your model to fit a small dataset. If you want to disable augmentations completely, specify aug: noaug, in which case the augpipe parameter is ignored. If you want to disable only rotations, set rotate90=0 and rotate=0 for the bgc augmentation pipe here (or create your own augpipe, like we did for bgc_norgb).

How are your results going without any augmentations? If the model does not overfit, then you can disable them completely.

universome avatar May 30 '22 17:05 universome

So the augmentations specified in the "augpipe" parameter are the ones applied, and supplying noaug to the "aug" parameter disables augmentations. There are, however, two other modes that can be supplied - 'ada' and 'fixed'. Do both of those also use the same augpipe parameter? I will share the results with "aug: noaug" as well. Also, when setting the aug parameter to noaug, I get an error: raise UserError('--target can only be specified with --aug=ada'). Is specifying ada as the aug parameter necessary?

skymanaditya1 avatar Jun 01 '22 02:06 skymanaditya1

There are three possible choices for augmentations: 1) no augmentations (aug: noaug); 2) fixed augmentations (aug: fixed); and 3) adaptive augmentations (aug: ada). If you choose adaptive augmentations, you can set the target option for them (i.e., the threshold at which the augmentation probability is increased/decreased). If you choose either of the other augmentation types, you should leave target unset.
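Putting the three modes together, the augmentation-related part of configs/training/base.yaml might look like this (a sketch based only on the keys quoted in this thread; 0.6 is the StyleGAN2-ADA default target and is an assumption here):

```yaml
aug: ada        # one of ['noaug', 'ada', 'fixed']
augpipe: bgc    # augmentation pipe; ignored when aug is 'noaug'
target: 0.6     # ADA heuristic target; set this ONLY when aug is 'ada',
                # otherwise the run fails with
                # "--target can only be specified with --aug=ada"
```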

universome avatar Jun 07 '22 15:06 universome