flows_ood

RuntimeError('Scale factor has NaN entries')

Open: RaphaelWW opened this issue 3 years ago • 1 comment

Hi,

thank you for providing the code and the hyperparameter specifications for the experiments. Unfortunately, I'm having trouble reproducing your results for the Glow architecture on MNIST and FashionMNIST. When I run the provided command, the model starts training but stops after 15 epochs with RuntimeError('Scale factor has NaN entries'). Before the error, the loss increases to as much as 2.8e+12.

It would be great if you could find out why it fails or tell me what I am doing wrong 🙂

Traceback (most recent call last):
  File "/nfs/homedirs/wildr/flows_ood/train_unsup.py", line 361, in <module>
    train(epoch, net, trainloader, device, optimizer, loss_fn, args.max_grad_norm, writer,
  File "/nfs/homedirs/wildr/flows_ood/train_unsup.py", line 68, in train
    z = net(x)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/flows_ood/flow_ssl/glow/glow.py", line 24, in forward
    return self.body(x)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/flows_ood/flow_ssl/invertible/parts.py", line 121, in forward
    return self.module1(x), self.module2(z)
  File "/nfs/homedirs/wildr/anaconda3/envs/ood_flows/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/nfs/homedirs/wildr/flows_ood/flow_ssl/realnvp/coupling_layer.py", line 327, in forward
    raise RuntimeError('Scale factor has NaN entries')
RuntimeError: Scale factor has NaN entries

RaphaelWW, Aug 20 '21 19:08

Any updates on this error?

AissamDjahnine, Jan 27 '23 11:01
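
For anyone hitting the same failure: the report says the loss climbs to 2.8e+12 before the NaN check in the coupling layer fires, so the divergence can probably be caught several steps earlier by watching the loss itself. The following is only a minimal sketch of such a guard. The names `check_loss`, `first_divergent_step`, and `DivergenceError` and the `1e6` threshold are hypothetical and are not part of flows_ood; it does not fix the underlying instability, it only helps locate where training starts to blow up.

```python
import math


class DivergenceError(RuntimeError):
    """Raised when the monitored loss is non-finite or exceeds the limit."""


def check_loss(loss_value, step, limit=1e6):
    """Return loss_value unchanged, or raise if training appears to have diverged.

    `limit` is an arbitrary illustrative threshold, not a value from flows_ood.
    """
    # NaN and +/-inf are both rejected by math.isfinite.
    if not math.isfinite(loss_value):
        raise DivergenceError(f"step {step}: loss is {loss_value}")
    # A finite but huge loss usually precedes NaNs by a few steps.
    if abs(loss_value) > limit:
        raise DivergenceError(
            f"step {step}: loss {loss_value:.3e} exceeds {limit:.1e}"
        )
    return loss_value


def first_divergent_step(losses, limit=1e6):
    """Return the index of the first loss that fails check_loss, or None."""
    for step, value in enumerate(losses):
        try:
            check_loss(value, step, limit)
        except DivergenceError:
            return step
    return None
```

In a PyTorch training loop, one would presumably call `check_loss(loss.item(), step)` right after the loss is computed and before `loss.backward()`, then inspect the batch, learning rate, and gradient norm at the reported step.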