
LNCC loss instability for large kernel size

Open wyli opened this issue 3 years ago • 6 comments

Discussed in https://github.com/Project-MONAI/MONAI/discussions/3463

Originally posted by ebrahimebrahim on December 9, 2021:

I experienced some numerically unstable training while using the LNCC loss to train a registration network: exploding gradients, clipping them not helping, and the training loss increasing even with a low learning rate.

This is in the context of synthetic data, where the images contained large homogeneous blobs, and so the variances used to compute LNCC were probably quite low.

I see that small variance has been a problem in the past and that it has been addressed. However, I seem to still have the issue when I set the kernel_size argument for LNCC loss to be large. Setting it back to a small value like 3 fixes the problem completely. Setting it to a large value (say, half the image size) leads to problems.

I think that giving a large kernel_size to LNCC is a natural thing to do, because NCC (normalized cross correlation, without the "local") is just the special case where kernel_size equals the image size, if I'm understanding correctly.

Is there any reason to expect that large kernel size would worsen the instability problem? From the code, it's not clear to me why the LNCC kernel size would matter. Thoughts, anyone?
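For readers unfamiliar with the loss, here is a minimal sketch of how a windowed LNCC is typically computed (simplified, with illustrative names, not MONAI's actual `LocalNormalizedCrossCorrelationLoss` code). It shows where the instability enters: the squared local covariance is divided by the product of the two local variances, so windows with near-zero variance blow up both the loss and its gradients.

```python
import torch
import torch.nn.functional as F

def lncc_sketch(pred, target, kernel_size=3, eps=1e-5):
    # pred/target: (B, 1, D, H, W); a uniform box kernel gathers local sums
    # (odd kernel_size assumed so that same-size padding works out)
    kernel = torch.ones(1, 1, *([kernel_size] * 3), device=pred.device)
    n = kernel.numel()

    def local_sum(x):
        return F.conv3d(x, kernel, padding=kernel_size // 2)

    t_sum, p_sum = local_sum(target), local_sum(pred)
    t2_sum, p2_sum = local_sum(target * target), local_sum(pred * pred)
    tp_sum = local_sum(target * pred)

    t_mean, p_mean = t_sum / n, p_sum / n
    cross = tp_sum - p_mean * t_sum    # n * local covariance
    t_var = t2_sum - t_mean * t_sum    # n * local variance of target
    p_var = p2_sum - p_mean * p_sum    # n * local variance of pred

    # on homogeneous blobs t_var * p_var ~ 0, so this ratio and its
    # gradients explode -- the instability described above
    ncc = cross * cross / (t_var * p_var + eps)
    return -ncc.mean()
```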

wyli avatar Feb 14 '22 12:02 wyli

perhaps with large kernels, there's a problem with the zero padding here? https://github.com/Project-MONAI/MONAI/blob/817c1cf1648417ed2688d7f9f8c01f0cb3eecbcf/monai/networks/layers/simplelayers.py#L208 @ebrahimebrahim
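To see why the zero padding matters more as the kernel grows: every window that overlaps the border averages in padded zeros, so local sums (and hence means and variances) are underestimated there, and with a kernel of half the image size almost every window touches the border. A small 1-D illustration (standalone code, not MONAI's):

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 32)            # constant signal; true local mean is 1
k = 17                              # kernel roughly half the image size
kernel = torch.ones(1, 1, k) / k
padded = F.pad(x, (k // 2, k // 2), mode="constant", value=0)
mean = F.conv1d(padded, kernel)
print(mean[0, 0, 0].item())         # ~0.53: badly biased at the border
print(mean[0, 0, 16].item())        # 1.0: correct at the centre
```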

wyli avatar Feb 14 '22 12:02 wyli

Copying this from the discussion: I have a notebook demonstrating the issue: https://github.com/ebrahimebrahim/deep-atlas/blob/main/lncc_with_large_kernel.ipynb

perhaps with large kernels, there's a problem with the zero padding here?

🤔

ebrahimebrahim avatar Feb 14 '22 23:02 ebrahimebrahim

thanks for the detailed demo, I did a quick change of the padding mode from `zeros` to `circular` in `separable_filtering`.

~it seems the loss becomes more reasonable~ the gradient magnitude still doesn't look right.
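For anyone reproducing this, the quick change amounts to swapping the padding mode used before the convolutions, along these lines (illustrative, not the actual diff):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 8, 8)
pad = (3, 3, 3, 3)  # half-kernel padding on each side of H and W

# zero padding treats everything outside the image as intensity 0, which
# biases the local statistics; circular padding wraps the image instead
padded_zeros = F.pad(x, pad, mode="constant", value=0)  # current behaviour
padded_circular = F.pad(x, pad, mode="circular")        # the quick change
```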

wyli avatar Feb 14 '22 23:02 wyli

I would be interested in what the stride actually is when the kernel moves across the image. Could it be useful to pass it as a parameter to the loss? A hypothetical sketch follows below.
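To make the suggestion concrete, a hypothetical `stride` parameter (not part of MONAI's loss; purely a sketch) would evaluate the windows only every few voxels instead of at every position:

```python
import torch
import torch.nn.functional as F

def strided_local_sum(x, kernel_size=9, stride=4):
    # x: (B, 1, H, W); `stride` is the hypothetical parameter, subsampling
    # the window centres and shrinking the output by roughly that factor
    kernel = torch.ones(1, 1, kernel_size, kernel_size, device=x.device)
    return F.conv2d(x, kernel, stride=stride, padding=kernel_size // 2)
```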

Constantin-Jehn avatar Jun 09 '22 09:06 Constantin-Jehn

I had similar problems with "big" kernel sizes.

I could overcome them by using the implementation from voxelmorph: https://github.com/voxelmorph/voxelmorph/blob/dev/voxelmorph/torch/losses.py

So maybe someone could compare to that code.

I would like to add that "overcoming" was too strong a word: I also run into trouble with the VoxelMorph code linked above, though only at considerably higher learning rates and kernel sizes.
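For comparison, the VoxelMorph loss linked above gathers the same local statistics with a sum filter and stabilizes the ratio with a small constant in the denominator; paraphrased here for 2-D inputs (see the linked file for the actual code):

```python
import torch
import torch.nn.functional as F

def voxelmorph_style_ncc(Ii, Ji, win=9, eps=1e-5):
    # Ii/Ji: (B, 1, H, W); a box filter of ones computes windowed sums
    sum_filt = torch.ones(1, 1, win, win, device=Ii.device)
    conv = lambda x: F.conv2d(x, sum_filt, padding=win // 2)

    I_sum, J_sum = conv(Ii), conv(Ji)
    I2_sum, J2_sum, IJ_sum = conv(Ii * Ii), conv(Ji * Ji), conv(Ii * Ji)

    win_size = win * win
    u_I, u_J = I_sum / win_size, J_sum / win_size

    cross = IJ_sum - u_J * I_sum - u_I * J_sum + u_I * u_J * win_size
    I_var = I2_sum - 2 * u_I * I_sum + u_I * u_I * win_size
    J_var = J2_sum - 2 * u_J * J_sum + u_J * u_J * win_size

    cc = cross * cross / (I_var * J_var + eps)
    return -cc.mean()
```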

Constantin-Jehn avatar Jun 14 '22 15:06 Constantin-Jehn

@Constantin-Jehn This is a great observation, thank you! I hope to look into the difference between the two LNCC implementations when I get a chance.

ebrahimebrahim avatar Jun 14 '22 16:06 ebrahimebrahim

thanks for the discussion here, it's an issue with handling zero variance... I've included a patch to fix it in this PR: https://github.com/Project-MONAI/MONAI/pull/5473.
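The essence of the fix is to treat windows with (near-)zero variance explicitly rather than rely on a tiny epsilon alone; one way to express the idea (a sketch only, not necessarily the exact patch in the PR):

```python
import torch

def stable_ncc_ratio(cross, t_var, p_var, eps=1e-5):
    # clamp both local variances away from zero so that neither the ratio
    # nor its gradient explodes on homogeneous windows
    t_var = torch.clamp(t_var, min=eps)
    p_var = torch.clamp(p_var, min=eps)
    return cross * cross / (t_var * p_var)
```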

@ebrahimebrahim I ran your scripts and the large kernels now work fine. it's also quite efficient...

[screenshot: LNCC loss curves with the fix, captured 2022-11-04]

wyli avatar Nov 04 '22 17:11 wyli