BigGAN-PyTorch

Training results (IS and FID) are not as good as yours with the same training process

Open · qilimk opened this issue 5 years ago · 14 comments

Hi ajbrock, I have been running the training code on ImageNet using the default script launch_BigGAN_bs256x8.sh. It has finished 134k iterations; here is the log file. [screenshot: Screen Shot 2019-05-16 at 9 33 58 AM]

Compared with the log file that you released, I got worse results. I kept all the parameters the same as your default settings. The training is on 8xV100. Do you have any suggestions for making it better, or what should I check to get results similar to yours?

Thanks a lot!

qilimk avatar May 16 '19 15:05 qilimk

Hi Qi,

There can be a substantial amount of variance in the time to convergence for a model (I only had time to train one with this codebase as I don't have unfettered access to that kind of compute) so it's not surprising that yours might need longer to converge/collapse--it appears to still be training.

I'd say let it run and see what IS/FID it gets to when it explodes and dies. This would also be a helpful datapoint for this repo to start getting a better sense of the variance in #itrs required =); if you wouldn't mind posting the full logfile (e.g. in a pastebin) I can take a look at them and check for any anomalies.

ajbrock avatar May 16 '19 16:05 ajbrock

Hi ajbrock, I appreciate your reply. I will continue the training to see what happens. Of course, I would like to share my full training log files on Google Drive (log files). If you could help check the log files, that would be awesome.

BTW, there was a shutdown during my training process because our GPUs lost power. Could that be a reason I didn't get the same performance as yours at similar iterations? Would it be possible for you to release a script for training a 256x256 model (I plan to do it), or are there any special notes for training a 256x256 model?

Many thanks to you!

qilimk avatar May 16 '19 23:05 qilimk

Hmm, looking at your logs against mine (image), this does look like it's well outside the variance I would expect. My models were also trained on 8xV100, so I don't think hardware is the issue here. When I trained my models, I had to pause training around 18-20 times (I'm only allowed to run jobs for 24 hours at a time, so across 2+ weeks you can imagine how many times I had to relaunch!), so I'm pretty confident in the checkpointing, but there could be something that doesn't stack up on different systems.

Keeping on with debugging:

  1. What version, exactly, of PyTorch, CUDA, and cuDNN are you using?

  2. Can you post a screenshot of both the script you used to launch the training process, and the script you used to resume?

The main possibility that currently comes to mind is that something might be up with your copy of ImageNet, but that would be somewhat odd (the main thing would be to check that you have the full 1.2M image dataset and confirm that all class folders have ~600-1300 images in them).

As to 256x256, it should be straightforward to just prepare the 256x256 dataset (or not use the HDF5) and run with I256 or I256_hdf5 as the dataset, though be aware that on only 8xV100 you may need to turn down the batch size and turn up the # of accumulations to keep memory costs palatable.
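For anyone adapting the launch scripts, the batch-size/accumulation trade-off looks roughly like the sketch below. This is a generic gradient-accumulation example for illustration only, not the repo's training loop; in BigGAN-PyTorch the same effect is obtained through the num_G_accumulations / num_D_accumulations arguments.

```python
# Generic gradient-accumulation sketch: a smaller per-step micro-batch plus more
# accumulation steps keeps the effective batch size while lowering peak memory.
def accumulated_step(model, loss_fn, micro_batches, optimizer, num_accumulations=8):
    """Run one optimizer step over `num_accumulations` micro-batches."""
    optimizer.zero_grad()
    for x, y in micro_batches[:num_accumulations]:
        loss = loss_fn(model(x), y) / num_accumulations  # average so the step matches one big batch
        loss.backward()                                  # gradients accumulate in .grad
    optimizer.step()
```

With a per-GPU micro-batch of 32 and 8 accumulations, each optimizer step still corresponds to an effective batch of 256.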

ajbrock avatar May 19 '19 09:05 ajbrock

Hi Andy, Thanks for helping me check the log files. My system setup is as follows:

pytorch: 1.0.1
cuda: 9.0
cudnn: 7.1.2
python: 3.6.8

I used the same scripts to run the code as yours, except that I removed the --load_in_mem arg. Here are the screenshots of the scripts I used (launch the program, resume the program). I also checked the training set (ILSVRC2012) that I used: every class has at least 730 images.

BTW, I have stopped that training process since there was no sign of it getting better. I am redoing it and hope it works well this time. I will take advantage of your released log files and check mine every day to make sure the training process works as expected. Here is the plot of my second training run compared with your logs and my previous log. It looks a little better than last time, but still not as good as yours. I hope it improves over the next few days. [plot: new_traing_plotting_IS_mean_2019-05-21]

Best Qi

qilimk avatar May 20 '19 19:05 qilimk

That plot looks to me like EMA isn't kicking in, or that the batchnorm stats being used alongside the EMA are stale. Can you pull the FID and IS stats for the model you've trained out to 134k iterations using the following settings?

  1. The model with EMA weights, in training mode (and the biggest per-GPU batch size you can manage)

  2. The model with non-EMA weights, in training mode

  3. The model with non-EMA weights, in test mode

You should be able to do this easily by modifying the sampling script.
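A rough way to script that sweep, assuming sample.py accepts the use_ema and G_eval_mode flags quoted later in this thread (you would also need to pass the same model/config flags as the training run; treat this as a sketch, not the exact invocation):

```python
# Sketch: run the sampling script over the four EMA / eval-mode combinations.
# Flag names follow the ones quoted later in this thread; verify them against
# the repo's argument parser before running.
import itertools
import subprocess

base_cmd = ["python", "sample.py", "--dataset", "I128_hdf5", "--batch_size", "64"]

for use_ema, eval_mode in itertools.product([True, False], repeat=2):
    cmd = list(base_cmd)
    if use_ema:
        cmd.append("--use_ema")
    if eval_mode:
        cmd.append("--G_eval_mode")
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # each run logs its own IS/FID
```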

ajbrock avatar May 22 '19 06:05 ajbrock

Thanks Andy!

I've run your sampling script on the model that I had trained up to 147K iterations to get IS and FID scores.

Here is the full log. In short, I called your sampling script with G_eval_mode=False/True and use_ema=False/True to produce 4 different FID/IS sets as summarized in the table below. These numbers are at noise_variance = 1.0.

|     | Training mode with EMA | Training mode without EMA | Test mode without EMA | Test mode with EMA |
|-----|------------------------|---------------------------|-----------------------|--------------------|
| IS  | 62.866 +/- 1.269       | 23.864 +/- 0.505          | 24.270 +/- 0.544      | 57.521 +/- 1.852   |
| FID | 18.8229                | 41.8047                   | 42.1819               | 20.8808            |

I believe I've been plotting the FID/IS for the EMA weights in test mode.

FYI, here is a plot of my 2nd training attempt (simply re-running the same script on the same hardware, 8xV100). The 2nd attempt (red) appears better than the 1st attempt (green) but still not as good as yours (blue). [plot: new_traing_plotting_IS_mean_2019-05-27]

Could you help me check it? Thank you very much for your insights!

Best Qi

qilimk avatar May 27 '19 18:05 qilimk

Hi Andy! I also ran the experiment and didn't change any parameters, but my result was not as good as yours: the IS value was only 51 when I had trained for 140,000 steps.

feifei-Liu avatar May 30 '19 07:05 feifei-Liu

Hi feifei-Liu,

Thanks, this is helpful! Can you also post a link to your training logs, and the script you launched with? I'm working on tracking down any possible differences or bugs that might be leading to this.

@qilimk This still looks as though the EMA isn't kicking in somehow, but I'm digging in to see if I can spot any bugs or differences that might be causing this. One additional request: can you load the very first image from your I128_hdf5 as a .npy file and upload it as well? I want to make sure the preprocessing for my version of the dataset is the same.
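In case it helps anyone else reproduce that check, a minimal sketch for dumping the first HDF5 image (the 'imgs' key and file path are assumptions based on the repo's HDF5 conversion script; adjust them to your setup):

```python
# Dump the first image of the I128 HDF5 file to .npy for comparison.
import h5py
import numpy as np

hdf5_path = 'ILSVRC128.hdf5'  # assumed filename; use your actual I128_hdf5 file
with h5py.File(hdf5_path, 'r') as f:
    print(list(f.keys()))      # sanity check: expect keys like ['imgs', 'labels']
    first_img = f['imgs'][0]   # 'imgs' key assumed from the repo's HDF5 script

np.save('first_I128_image.npy', first_img)
print(first_img.shape, first_img.dtype)
```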

ajbrock avatar May 31 '19 09:05 ajbrock

Sorry, I was running on a cluster for the first time and the results on the machine were not saved after termination. I only saved the results of the first 50,000 steps, and my FID values are of no reference value because I had previously run experiments on 100 categories and did not modify the i128_inception_moments.npz file accordingly. I didn't change the script you provided; my experiment ran on 8 M40 GPUs, but I don't think the GPUs should be a problem. Based on my previous observations, like qilimk, my results began to diverge from yours after 20,000 steps. I also tried to train with the CIFAR script you provided, and my IS value only reached 8.28; I don't know whether this result is normal, and the generated pictures are not very good. I agree with you that there may be something wrong with EMA. In addition, I am wondering if we could run more experiments with the CIFAR dataset to see whether it is a network problem. I also found that the training set images do not appear to use augmentation. [screenshot: Image 11]
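For anyone hitting the same mismatch: the FID reference file holds the mean and covariance of Inception pool features over the dataset actually being trained on, so it has to be regenerated whenever the class subset changes. A minimal sketch of what that file contains, assuming the pool features have already been extracted and that the keys are the usual 'mu'/'sigma' (check the repo's calculate_inception_moments.py for the exact format):

```python
# Sketch: rebuild FID reference moments for a custom class subset.
# Assumes pool_features is an (N, 2048) array of Inception pool features
# extracted from the images actually used for training (path is hypothetical).
import numpy as np

pool_features = np.load('my_subset_pool_features.npy')

mu = np.mean(pool_features, axis=0)
sigma = np.cov(pool_features, rowvar=False)

# Key names are an assumption; match whatever the repo's moment script writes.
np.savez('i128_inception_moments.npz', mu=mu, sigma=sigma)
```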

feifei-Liu avatar Jun 05 '19 10:06 feifei-Liu

Hi @ajbrock , I loaded the first image from my I128_hdf5 and saved it as a .npy file.

I also found a note in the PyTorch documentation that upsampling may induce nondeterministic behavior when using CUDA. Do you think that could be why I couldn't reproduce the results?

[screenshot: Screen Shot 2019-06-13 at 2 45 46 PM]
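For what it's worth, the controllable sources of randomness can be pinned down with standard PyTorch seeding; a sketch is below. Note that this does not make inherently nondeterministic CUDA kernels (such as some upsampling backward passes) deterministic, so it cannot fully rule out that source of variance.

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    # Fix the controllable sources of randomness.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducibility in cuDNN; some CUDA ops (e.g. certain
    # interpolation backward kernels) may remain nondeterministic regardless.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```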

Thanks a lot!

qilimk avatar Jun 13 '19 19:06 qilimk

Hi @feifei-Liu , I also trained the model on cifar10 and got IS=8.32, FID=6.45. What's your FID score on cifar10? It is really hard to compare cifar10 performance with other papers, as they use an unsupervised training setting, while here we use a cGAN setting for training.

songyh10 avatar Jun 20 '19 23:06 songyh10

Hi @qilimk , similar to your experiments, I also found a gap in performance between training mode and testing mode (G_eval_mode=true/false). Do you know why there is such a gap, and which mode should we use to report the score officially?
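(For context, the usual cause of such a gap is BatchNorm: in train() it normalizes with the current batch's statistics, while in eval() it uses the stored running statistics, which can be stale or mismatched for the EMA copy of G. A tiny self-contained illustration of the mechanism, not the repo's code:)

```python
# Illustration of why train() and eval() can give different outputs with BatchNorm.
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(4)
x = torch.randn(8, 4, 16, 16)

bn.train()
y_train = bn(x)  # normalized with this batch's mean/var; running stats are updated

bn.eval()
y_eval = bn(x)   # normalized with the stored running mean/var instead

# If the running stats do not match the weights being evaluated (e.g. stale stats
# paired with EMA weights), eval-mode outputs can look much worse.
print(torch.allclose(y_train, y_eval))  # generally False
```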

songyh10 avatar Jun 20 '19 23:06 songyh10

Hi @ajbrock,

I have been working on running the BigGAN code on another image dataset that I created. The dataset consists of two classes of 256x256 images. I tried to retune the training parameters because I am using a single Titan XP (~12 GB RAM). First, I read your suggestions about previous memory issues. Then, I applied the following modifications to the training arguments:

  1. remove the 'load_in_mem' argument
  2. decrease the batch size
  3. increase the number of accumulations (num_G_accumulations and num_D_accumulations) to compensate for the small batch size
  4. decrease the channel multipliers (G_ch and D_ch) while still trying to keep them fairly high
  5. add the '--which_best FID' argument
  6. small modifications shown in the attached script

You can compare your parameters with mine here. Throughout the whole training run, my IS and FID values were terrible; the results improved slowly but then got worse, as you can see in the log file. Could you please help me solve the problem? (A rough sketch of the modified launch arguments is included below for reference.)
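For reference, one way those modifications might be expressed as a launch command. This is a hypothetical sketch, not damlasena's actual script: the flag names are the ones discussed in this thread, but the dataset name and concrete values are made up for illustration.

```python
# Hypothetical single-GPU launch for a small custom 256x256 dataset.
# Flag names follow those discussed in this thread; values are illustrative only.
import subprocess

args = [
    "python", "train.py",
    "--dataset", "I256",            # placeholder; a custom dataset needs its own config
    "--batch_size", "16",           # small per-step batch to fit a single ~12 GB GPU
    "--num_G_accumulations", "16",  # accumulate gradients to recover a larger effective batch
    "--num_D_accumulations", "16",
    "--G_ch", "48",                 # reduced channel multipliers
    "--D_ch", "48",
    "--which_best", "FID",          # checkpoint selection by FID (point 5 above)
    # --load_in_mem deliberately omitted (point 1 above)
]
subprocess.run(args, check=True)
```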

damlasena avatar Sep 03 '19 11:09 damlasena

Hi guys, how did you handle the above issues? Did you manage to reproduce the released results?

KK666-AI avatar Mar 13 '21 02:03 KK666-AI