flows_ood
RuntimeError('Scale factor has NaN entries')
Hi,
thank you for providing the code and the hyperparameter specifications for the experiments. Unfortunately, I'm having trouble reproducing your results for the Glow architecture on MNIST and FashionMNIST. When I run the provided command, the model starts training but stops after 15 epochs with RuntimeError('Scale factor has NaN entries'). Before the error occurs, the loss increases to as much as 2.8e+12.
It would be great if you could find out why it fails or tell me what I am doing wrong 🙂
Traceback (most recent call last):
  File "/nfs/homedirs/wildr/flows_ood/train_unsup.py", line 361, in <module>
    train(epoch, net, trainloader, device, optimizer, loss_fn, args.max_grad_norm, writer,
  File "/nfs/homedirs/wildr/flows_ood/train_unsup.py", line 68, in train
    z = net(x)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/flows_ood/flow_ssl/glow/glow.py", line 24, in forward
    return self.body(x)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/flows_ood/flow_ssl/invertible/parts.py", line 121, in forward
    return self.module1(x), self.module2(z)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/flows_ood/flow_ssl/realnvp/coupling_layer.py", line 327, in forward
    raise RuntimeError('Scale factor has NaN entries')
RuntimeError: Scale factor has NaN entries
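Since the loss blows up well before the NaN check fires, it may help to pin down the first step at which training diverges. Below is a minimal sketch of such a check, not code from this repository. The helper name `first_divergence`, the `threshold` default, and the `history` values are all hypothetical and chosen only for illustration.

```python
import math


def first_divergence(losses, threshold=1e6):
    """Return the index of the first loss that is non-finite or whose
    magnitude exceeds `threshold`; return None if no such loss exists."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss) or abs(loss) > threshold:
            return step
    return None


# Hypothetical loss values, made up for illustration (not logged from the run above).
history = [1.9, 1.7, 3.4e3, 2.8e12, float("nan")]
print(first_divergence(history))
```

Logging the loss per step and running a check like this would show whether the divergence is gradual or starts abruptly at one batch, which could help narrow down the cause.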
Are there any updates on this error?