vision Fill arg and _apply_grid_transform improvements

A few years ago we introduced handling of non-constant fill values in _apply_grid_transform using a mask approach:

https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L550-L568

There are a few minor problems with this approach:

If we pass fill=[0.0, ], we would expect a result similar to fill=None. This is not exactly true for bilinear interpolation mode, where img and fill_img are blended linearly using the mask: https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L567-L568

Most probably, we would like to skip fill_img creation for all fill values that have sum(fill) == 0, as grid_sample already pads with zeros.

- if fill is not None:
+ if fill is not None and sum(fill) > 0:
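To illustrate why fill=[0.0, ] and fill=None can differ: the zero padding in grid_sample has already attenuated a border pixel, and the blend then multiplies it by the mask a second time. Below is a minimal sketch of that arithmetic on a single border pixel in plain Python. The numbers (pixel value 200, coverage 0.5) are made up, and it assumes all in-bounds neighbours share the same value. `needs_fill_image` is a hypothetical helper, not torchvision API; it uses `any(v != 0 ...)` instead of a sum on the assumption that fill values could be negative for float images.

```python
# Illustrative values only: a border pixel whose bilinear footprint lies
# half inside the source image and half in the zero padding.
source_value = 200.0
coverage = 0.5  # value of the transformed all-ones mask at this pixel

# What sampling with zero padding yields for that pixel (fill=None path).
sampled = source_value * coverage + 0.0 * (1.0 - coverage)

# What the blending branch yields for fill=[0.0]:
# img * mask + (1 - mask) * fill
fill_value = 0.0
blended = sampled * coverage + (1.0 - coverage) * fill_value

print(sampled)  # fill=None result
print(blended)  # fill=[0.0] result, attenuated a second time


def needs_fill_image(fill):
    """Hypothetical check: build fill_img only if some channel is non-zero."""
    return fill is not None and any(v != 0 for v in fill)
```

With such a check, `[0.0]` and `None` would both skip the fill branch, while a fill like `[-1.0, 1.0]` (which sums to zero) would still be applied.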

Linear fill_img and img interpolation may be replaced by directly applying a mask:

         mask = mask < 0.9999
         img[mask] = fill_img[mask]

That would better match PIL Image behaviour.
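The difference between the current blend and the proposed hard replacement can be sketched on one row of pixels. This is plain Python with lists standing in for tensors; the pixel, mask and fill values are invented for illustration, and the function names are hypothetical.

```python
def blend(img, mask, fill):
    """Linear blend per pixel: img * mask + (1 - mask) * fill."""
    return [p * m + (1.0 - m) * fill for p, m in zip(img, mask)]


def hard_fill(img, mask, fill, threshold=0.9999):
    """Replace every pixel whose mask value is below the threshold."""
    return [fill if m < threshold else p for p, m in zip(img, mask)]


# Illustrative row: fully outside, partially covered border, fully inside.
img = [0.0, 100.0, 200.0]
mask = [0.0, 0.5, 1.0]
fill = 255.0

print(blend(img, mask, fill))      # border pixel mixes image and fill
print(hard_fill(img, mask, fill))  # border pixel becomes pure fill
```

Only the partially covered pixel differs between the two: blending leaves a mixed value there, whereas the hard mask assigns the fill colour outright.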

https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L567-L568

cc @datumbox

Aug 30 '22 12:08 vfdev-5

Since we have another report in #8083, do we want to tackle this? IMO, we should just align the two branches

https://github.com/pytorch/vision/blob/f69eee6108cd047ac8b62a2992244e9ab3c105e1/torchvision/transforms/v2/functional/_geometry.py#L588-L594

with something like

bool_mask = mask < 1
float_img[bool_mask] = fill_img.expand_as(float_img)[bool_mask]

This removes the blending and in turn the "shadow" for bilinear interpolation. Plus, this is equivalent for nearest interpolation, since the mask in that case only contains 0.0 and 1.0 entries.
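A rough check of the equivalence claim, with nested Python lists standing in for a (channels, pixels) tensor and a per-channel fill playing the role of `fill_img.expand_as(float_img)`. The data and function names are hypothetical.

```python
def replace_masked(img, mask, fill):
    """img: list of channels; mask: shared by all channels; fill: one value per channel."""
    return [
        [f if m < 1 else p for p, m in zip(channel, mask)]
        for channel, f in zip(img, fill)
    ]


def blend(img, mask, fill):
    """Per-channel linear blend: img * mask + (1 - mask) * fill."""
    return [
        [p * m + (1.0 - m) * f for p, m in zip(channel, mask)]
        for channel, f in zip(img, fill)
    ]


img = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]  # 2 channels, 3 pixels
fill = [255.0, 0.0]
nearest_mask = [0.0, 1.0, 1.0]    # nearest mode: only 0.0 and 1.0 entries
bilinear_mask = [0.0, 0.5, 1.0]   # bilinear mode: fractional border entry

# Same result for a binary mask, different result once fractions appear.
print(replace_masked(img, nearest_mask, fill) == blend(img, nearest_mask, fill))
print(replace_masked(img, bilinear_mask, fill) == blend(img, bilinear_mask, fill))
```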

Nov 07 '23 11:11 pmeier

@pmeier the value 0.9999 for the mask threshold was sort of on purpose. In the example from the description, an affine rotation by 50 degrees in bilinear mode creates a rotated mask with these unique values:

tensor([0.00000000, 0.02883029, 0.02883148, 0.10955429, 0.10955477, 0.11125469,
         0.11125565, 0.19197845, 0.19197917, 0.19367909, 0.19367981, 0.27440262,
         0.27440357, 0.35512805, 0.35512924, 0.35682678, 0.35682797, 0.43755341,
         0.43755519, 0.43925095, 0.43925512, 0.51997960, 0.51998138, 0.60240537,
         0.60240555, 0.68312985, 0.68313217, 0.68482971, 0.68482977, 0.76555562,
         0.76555634, 0.76725388, 0.76725554, 0.84798002, 0.84798050, 0.92870331,
         0.92870587, 0.93040466, 0.93040580, 0.99999994, 1.00000000])

and 0.99999994 can appear inside the mask:

plt.imshow(((mask > 0.999) & (mask < 1.0))[0, 0, ...], interpolation="none")

so, using mask < 1 gives:
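The comparison behind this can be checked on a scalar. 0.99999994 is how the largest float32 below 1.0 prints at 8 decimals; the sketch assumes the mask is float32 and that this value comes from rounding in the interpolation, and it emulates float32 with the standard `struct` module (`to_float32` is a hypothetical helper).

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Largest float32 strictly below 1.0.
almost_one = to_float32(1.0 - 2.0 ** -24)

print(f"{almost_one:.8f}")
print(almost_one < 1)       # True: `mask < 1` would overwrite this pixel
print(almost_one < 0.9999)  # False: `mask < 0.9999` leaves it untouched
```

So a threshold slightly below 1 tolerates this kind of rounding, while a strict `< 1` comparison treats such interior pixels as if they were outside the image.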

Nov 07 '23 13:11 vfdev-5