MONAI Add warning to documentation that warp in nearest-neighbor mode is non-differentiable

Is your feature request related to a problem? Please describe. I spent several days figuring out why my spatial transformer network using a warp layer did not train properly. In the end, the reason was that when the warp layer uses nearest-neighbor mode, the result is non-differentiable.

PyTorch does not give any warning about this, and it is also not mentioned in the documentation of grid_sample.

Describe the solution you'd like An addition to the documentation of the warp layer that warns users of the non-differentiability of the nearest neighbor mode.

Describe alternatives you've considered I do not really see any other alternatives, as this modification is so small.

Aug 15 '25 09:08 Tessel2000

Nearest-neighbor warping is technically differentiable but produces zero gradients almost everywhere, so it blocks gradient flow during training.
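The zero-gradient behaviour can be illustrated with a framework-free 1D toy. This is only a sketch under stated assumptions: `sample_nearest`, `sample_linear`, and `finite_diff` are hypothetical helpers written for this illustration, not MONAI's Warp layer or PyTorch's grid_sample, and the derivative is estimated numerically rather than by autograd.

```python
import math

def sample_nearest(signal, x):
    """Nearest-neighbor lookup at continuous position x (index clamped to the valid range)."""
    idx = min(max(int(math.floor(x + 0.5)), 0), len(signal) - 1)
    return signal[idx]

def sample_linear(signal, x):
    """Linear interpolation at continuous position x (position clamped to the valid range)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    lo = min(int(math.floor(x)), len(signal) - 2)
    frac = x - lo
    return (1.0 - frac) * signal[lo] + frac * signal[lo + 1]

def finite_diff(sampler, signal, x, eps=1e-4):
    """Central finite-difference estimate of d(sample) / d(position)."""
    return (sampler(signal, x + eps) - sampler(signal, x - eps)) / (2.0 * eps)

signal = [0.0, 1.0, 4.0, 9.0]
x = 1.3  # sampling position, e.g. a voxel index plus a predicted displacement

grad_nearest = finite_diff(sample_nearest, signal, x)
grad_linear = finite_diff(sample_linear, signal, x)

print(grad_nearest)  # 0.0: a small shift in position picks the same voxel
print(grad_linear)   # approximately 3.0: the slope between signal[1] and signal[2]
```

The nearest-neighbor output is piecewise constant in the sampling position, so no learning signal reaches whatever predicted that position; the linear sampler passes on the local slope of the signal.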

In the semi-supervised registration setting, the correct approach would be to one-hot encode the mask, use bilinear or trilinear warping, and compute the loss on the resulting soft maps.
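As a rough sketch of that recipe, the 1D toy below one-hot encodes a label map, warps each channel with linear interpolation, and computes a soft Dice loss on the result. Everything here is an assumption made for illustration: the helpers `one_hot`, `warp_linear`, and `soft_dice_loss` are hypothetical, the displacement field is hard-coded instead of predicted by a network, and Dice is just one possible choice of loss.

```python
import math

def one_hot(labels, num_classes):
    """Turn a 1D integer label map into num_classes binary channels."""
    return [[1.0 if lab == c else 0.0 for lab in labels] for c in range(num_classes)]

def warp_linear(channel, displacement):
    """Sample channel at i + displacement[i] with linear interpolation (border clamped)."""
    n = len(channel)
    out = []
    for i, d in enumerate(displacement):
        x = min(max(i + d, 0.0), n - 1.0)
        lo = min(int(math.floor(x)), n - 2)
        frac = x - lo
        out.append((1.0 - frac) * channel[lo] + frac * channel[lo + 1])
    return out

def soft_dice_loss(pred, target, eps=1e-6):
    """1 - mean Dice over channels, computed on soft (non-binary) maps."""
    scores = []
    for p, t in zip(pred, target):
        inter = sum(pi * ti for pi, ti in zip(p, t))
        scores.append((2.0 * inter + eps) / (sum(p) + sum(t) + eps))
    return 1.0 - sum(scores) / len(scores)

moving_labels = [0, 0, 1, 1, 0, 0]
fixed_labels = [0, 0, 0, 1, 1, 0]
displacement = [-0.5] * 6  # hypothetical field: sample half a voxel to the left

moving_onehot = one_hot(moving_labels, num_classes=2)
fixed_onehot = one_hot(fixed_labels, num_classes=2)

# Warp every channel with the same field; the result is a soft map per class.
warped = [warp_linear(ch, displacement) for ch in moving_onehot]
loss = soft_dice_loss(warped, fixed_onehot)

print(warped[1])  # [0.0, 0.0, 0.5, 1.0, 0.5, 0.0]
```

Because the warped channels take fractional values at label boundaries, the loss varies smoothly with the displacement, which is what lets gradients flow back to the registration network.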

I do agree it would be useful to add a warning in the documentation when mode="nearest". Beyond that, it could be helpful to have more helper flags for semi-supervised DIR settings to handle these cases automatically (e.g., a warp_mask=True in VoxelMorph.forward, and potentially a coordinate transform for keypoints).

Happy to open a PR if this aligns with the maintainers’ direction.

Oct 17 '25 15:10 Kheil-Z

I would like to point out that MONAI has a tutorial on volumetric image registration that demonstrates the correct way of handling labels, i.e., convert to one-hot encoding, warp using mode='bilinear', and compute loss, which is exactly what @Kheil-Z described.

That said, I agree that the documentation of monai.networks.blocks.Warp can be improved. Regarding helper flags, I think it makes sense for VoxelMorph.forward to:

1. Take both source and target intensity images (required) and their corresponding one-hot encoded labels (optional) as arguments.
2. Within forward, the intensity images are fed through the network, whereas both intensity images and one-hot encoded labels are warped using the same Warp layer with mode='bilinear'.

At this point, the user is free to choose their favorite loss function to be applied to the warped images.

As for your other idea on incorporating a coordinate transform for keypoints, I think that is also straightforward, and MONAI has a tutorial for this as well. One would essentially:

1. Interpolate the displacement field at keypoint locations using grid_sample, then
2. Add the displacement to the keypoint coordinates.
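The two steps can be sketched without any framework as follows. This is an illustration under assumptions, not the tutorial's code: a hand-written bilinear lookup stands in for grid_sample, the helpers `interpolate_displacement` and `warp_keypoints` are hypothetical, and the field is assumed to store (dy, dx) in voxel units on the same grid the keypoints live in.

```python
import math

def interpolate_displacement(field, y, x):
    """Bilinearly interpolate a 2D displacement field at continuous (y, x).

    field[row][col] is a (dy, dx) pair in voxel units (assumed layout).
    """
    h, w = len(field), len(field[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0 = min(int(math.floor(y)), h - 2)
    x0 = min(int(math.floor(x)), w - 2)
    fy, fx = y - y0, x - x0
    result = []
    for k in range(2):  # k = 0 is the dy component, k = 1 is the dx component
        top = (1 - fx) * field[y0][x0][k] + fx * field[y0][x0 + 1][k]
        bottom = (1 - fx) * field[y0 + 1][x0][k] + fx * field[y0 + 1][x0 + 1][k]
        result.append((1 - fy) * top + fy * bottom)
    return tuple(result)

def warp_keypoints(keypoints, field):
    """Step 1: interpolate the field at each keypoint. Step 2: add it to the coordinates."""
    moved = []
    for y, x in keypoints:
        dy, dx = interpolate_displacement(field, y, x)
        moved.append((y + dy, x + dx))
    return moved

# Toy 3x3 field: dy is 1 everywhere, dx equals the column index.
field = [[(1.0, float(col)) for col in range(3)] for _ in range(3)]
keypoints = [(0.0, 0.0), (1.0, 1.5)]

moved = warp_keypoints(keypoints, field)
print(moved)  # [(1.0, 0.0), (2.0, 3.0)]
```

Which image's keypoints the field applies to depends on the direction in which the displacement is defined, so that convention would need to be checked against the actual network before reusing this pattern.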

I believe this would be a meaningful contribution to MONAI for those working on DIR. And @Kheil-Z, I can work on this PR together with you if that sounds good to you (and if this aligns with maintainers' direction, of course). Let me know what you think. :)

-kaibo

Nov 07 '25 17:11 kvttt

Hi @kvttt,

Thanks for the detailed suggestions! I fully agree:

- Adding optional one-hot encoded labels to VoxelMorph.forward and warping them consistently with the intensity images aligns well with the semi-supervised registration workflow.
- Handling keypoints via displacement interpolation is straightforward and would be a useful addition for deformable image registration tasks.

I’ve started a PR branch (voxelmorph-warp) with a placeholder commit in VoxelMorph.forward for the optional segmentation/keypoint support. We can use this branch as a shared workspace if you'd like. (:

Nov 12 '25 11:11 Kheil-Z