
Getting NaNs... manual initialization required?

Open kevinjohncutler opened this issue 2 years ago • 5 comments

@pvjosue I tried swapping this in for Conv2d and Conv3d in a U-Net. I am getting the correct output shape, but the output is all NaNs. Do kernel_initializer and bias_initializer need to be set manually?

kevinjohncutler avatar Jan 26 '23 03:01 kevinjohncutler

Yes, I recently observed this as well and will investigate. My guess is that something changed in the latest PyTorch release. Thanks

pvjosue avatar Jan 26 '23 07:01 pvjosue

@pvjosue Have you found a solution to this with the latest version of PyTorch? Did they deprecate this ability in the latest release? This is exactly the type of package I was looking for, and I would be willing to use an older version of PyTorch to keep using it.

DavidRavnsborg avatar Feb 08 '23 07:02 DavidRavnsborg

Let's see: with user-provided initialization, it seems to work out of the box.

GPU, Ubuntu: [screenshot]

CPU, Ubuntu: [screenshot]

The problem is the default initialization, which I fixed for convNd in https://github.com/pvjosue/pytorch_convNd/commit/2b0e4dbd658f25a18efd8cd848f644c0e9eef29a. I'll leave the issue open, since a solution for the transposed conv is still missing. Thanks for letting me know :)
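For reference, here is a minimal sketch of PyTorch's own default conv initialization (what nn.Conv2d.reset_parameters does: Kaiming-uniform weights, bias uniform in ±1/sqrt(fan_in)); the linked commit may differ in the details:

import math
import torch
import torch.nn as nn

def reset_conv_parameters(weight, bias):
    # Kaiming-uniform weights, as in torch.nn.Conv2d.reset_parameters
    nn.init.kaiming_uniform_(weight, a=math.sqrt(5))
    if bias is not None:
        # fan_in = in_channels * prod(kernel_size)
        fan_in = weight[0].numel()
        bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
        nn.init.uniform_(bias, -bound, bound)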

pvjosue avatar Feb 08 '23 12:02 pvjosue

For me, the culprit was the bias. In the initialization of convNd, the bias is created with:

if use_bias:
    # torch.Tensor(out_channels) allocates memory without initializing it
    self.bias = nn.Parameter(torch.Tensor(out_channels))
else:
    self.register_parameter('bias', None)

To my understanding, one should not use torch.Tensor to create a tensor, because it allocates memory without initializing it, so the values are whatever garbage happens to be in that memory, possibly including NaN. A simple example:

>>> import torch
>>> import torch.nn as nn
>>> bias = nn.Parameter(torch.Tensor(10))
>>> type(bias)
<class 'torch.nn.parameter.Parameter'>
>>> bias.data
tensor([-1.0836e+10,  2.6007e-36,  6.4800e+24,  4.5593e-41,  4.5549e+24,
         4.5593e-41,  1.8760e-16,         nan,  6.4629e+24,  4.5593e-41])
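Since the conv broadcast-adds the bias onto every spatial location of each output channel, a single NaN entry like the one above poisons the entire output. A contrived illustration with a deterministic NaN in place of the memory garbage:

import torch

# Stand-in for an uninitialized bias that happens to contain a NaN
bias = torch.tensor([0.0, float('nan'), 0.0])
x = torch.ones(2, 3)
print(x + bias)  # the NaN spreads to every row it touches
# tensor([[1., nan, 1.],
#         [1., nan, 1.]])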

After switching it to

if use_bias:
    # torch.zeros gives a well-defined (all-zero) starting value
    self.bias = nn.Parameter(torch.zeros(out_channels))
else:
    self.register_parameter('bias', None)

everything works as expected. It is probably not an optimal initialization, but if I understand correctly, a user can pass a suitable one via the bias_initializer keyword argument.
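Until a fix is released, passing explicit initializers should avoid the problem entirely. A minimal sketch, assuming the constructor arguments suggested by this thread and the repo's README (the import path and exact argument names may differ in your version):

import torch
from convNd import convNd  # assumed import path for this repo

# Hypothetical 4D conv with explicit, well-defined initializers
conv = convNd(
    in_channels=8, out_channels=16, num_dims=4,
    kernel_size=3, stride=1, padding=1, use_bias=True,
    kernel_initializer=lambda w: torch.nn.init.kaiming_uniform_(w),
    bias_initializer=lambda b: torch.nn.init.zeros_(b),
)
x = torch.randn(1, 8, 10, 10, 10, 10)
assert not torch.isnan(conv(x)).any()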

st0nedB avatar Mar 23 '23 10:03 st0nedB

I added a (very simple) fix. Note that it might not be optimal for everyone, but it suffices in my network. We can discuss it further in the pull request.

st0nedB avatar Mar 23 '23 10:03 st0nedB