
FID is nan

Open doantientai opened this issue 6 years ago • 4 comments

Hi guys, thank you for the amazing work!

I have a problem with FID. During training, the log often shows that FID is nan. It does not happen every time, but it happens more than half of the time.

So I wonder in which cases FID can be nan? And what can I do to prevent it?

Thank you very much!

doantientai avatar Oct 31 '19 10:10 doantientai

Hi @doantientai, the FID value is the Fréchet distance between two Gaussians, one fitted to the features of real images and one to the features of generated images. A NaN FID could be caused by a small number of samples (<2048). There are two Fréchet distance implementations in inception_utils.py. By default the 'torch' method is used, so try the 'numpy' method instead. I hope it works.
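For reference, here is a minimal NumPy sketch of the Fréchet distance between two Gaussians, d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 sqrt(sigma1 sigma2)). It is not the code from inception_utils.py: the names frechet_distance and feature_stats are hypothetical, and the trace of the matrix square root is obtained from the eigenvalues of sigma1 @ sigma2 rather than from an explicit matrix square root. The toy features use fewer samples than feature dimensions, so the covariances are rank-deficient, which is the small-sample situation described above.

```python
import numpy as np


def feature_stats(features):
    """Mean vector and covariance matrix of an (n_samples, n_dims) array."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Assumption: Tr(sqrt(sigma1 @ sigma2)) is computed as the sum of the
    square roots of the eigenvalues of sigma1 @ sigma2. For positive
    semi-definite inputs those eigenvalues are real and non-negative in
    exact arithmetic; round-off can make them slightly negative or
    complex, so the real part is taken and clipped at zero.
    """
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    eigvals = np.clip(eigvals.real, 0.0, None)
    tr_covmean = np.sqrt(eigvals).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


# Toy data: 16 samples in 64 dimensions, so each covariance has rank < 64.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(16, 64))
feats_b = rng.normal(loc=0.5, size=(16, 64))

mu_a, sigma_a = feature_stats(feats_a)
mu_b, sigma_b = feature_stats(feats_b)

fid_small = frechet_distance(mu_a, sigma_a, mu_b, sigma_b)
fid_same = frechet_distance(mu_a, sigma_a, mu_a, sigma_a)
print(np.isfinite(fid_small), fid_same)
```

The clipping step is one way to keep the result finite when the covariance product is singular. Whether it matches the behaviour of either method in the repository is not something this sketch establishes.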

damlasena avatar Nov 03 '19 12:11 damlasena

Thank you @damlasena, I switched to the numpy method and it works. I also think that line 306 in inception_utils.py should be:
FID = numpy_calculate_frechet_distance(mu, sigma, data_mu, data_sigma)
instead of:
FID = numpy_calculate_frechet_distance(mu.cpu().numpy(), sigma.cpu().numpy(), data_mu, data_sigma)
because at line 299, mu and sigma are already converted to numpy arrays.

doantientai avatar Nov 04 '19 21:11 doantientai

Hi @doantientai @damlasena. Has anyone else found that numpy_calculate_frechet_distance takes a very long time? For example, when I want to verify the model's performance, it takes almost a day or more to calculate the metric. I don't know why it takes so long. Can anyone help me fix this problem? Thanks!

LonglongaaaGo avatar Apr 20 '21 00:04 LonglongaaaGo

BTW, I'm using a V100 32GB GPU with 8 CPUs.

LonglongaaaGo avatar Apr 20 '21 00:04 LonglongaaaGo