Zenan Ling
> My conjecture is that the optimization step makes the spectral norm larger than 1, and your code uses the sigma calculated in the training phase to normalize it. It changes the weight...
@shimazing yes
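A minimal sketch of the effect described in the quoted conjecture (this is not the repo's code; it just uses a plain `nn.Linear` with SGD): sigma is estimated from the weight *before* the optimizer step, so dividing the *updated* weight by that stale sigma can still leave a spectral norm greater than 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(64, 64, bias=False)

# sigma computed from the weight during the "training" forward pass
sigma_train = torch.linalg.matrix_norm(layer.weight.detach(), ord=2)

# one optimizer step changes the weight
opt = torch.optim.SGD(layer.parameters(), lr=1.0)
loss = layer(torch.randn(8, 64)).pow(2).mean()
loss.backward()
opt.step()

# the stale sigma no longer bounds the updated weight
sigma_now = torch.linalg.matrix_norm(layer.weight.detach(), ord=2)
print((sigma_now / sigma_train).item())  # can exceed 1 after the step
```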
@jarrelscy Problems still exist without data parallel. Here is a toy example:

```python
import torch.nn as nn
from SpectralNormGouk1 import *
from torch.optim import *

class toy(nn.Module):
    def __init__(self):
        super(toy, self).__init__()
        ...
```
@jarrelscy Thanks for your reply.
@jarrelscy The test loss and accuracy seem to be normal if I use `net.train()` and `with torch.no_grad()` during the test phase.
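A rough sketch of that workaround, for reference (the names `net` and `test_loader` are placeholders, not from this repo): keep the model in training mode so the spectral norm is refreshed on the forward pass, but disable gradient tracking for the test loop.

```python
import torch
import torch.nn.functional as F

net.train()                        # keep training-mode behaviour so sigma is refreshed
test_loss, correct, total = 0.0, 0, 0
with torch.no_grad():              # no gradient tracking during the test pass
    for x, y in test_loader:
        out = net(x)
        test_loss += F.cross_entropy(out, y, reduction='sum').item()
        correct += (out.argmax(dim=1) == y).sum().item()
        total += y.size(0)
print(test_loss / total, correct / total)
```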
@shimazing @jarrelscy Did you train the classification model? The author released the code with the latest version of the paper, but the link is 404 now.
My classification net doesn't work on a single GPU; the loss explodes.