Fill arg and _apply_grid_transform improvements
A few years ago we introduced non-constant fill value handling in _apply_grid_transform using a mask approach:
https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L550-L568
There are a few minor problems with this approach:
- If we pass fill = [0.0, ], we would expect a result similar to fill=None. This is not exactly true for bilinear interpolation mode, where we do linear interpolation between img and fill_img: https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L567-L568
Most probably, we would like to skip fill_img creation for all fill values that have sum(fill) == 0, as grid_sample already pads with zeros.
- if fill is not None:
+ if fill is not None and sum(fill) > 0:
- The linear interpolation between fill_img and img may be replaced by directly applying a mask:
mask = mask < 0.9999
img[mask] = fill_img[mask]
That would better match PIL Image behaviour.
https://github.com/pytorch/vision/blob/0d69e35c4e951109dbaa8b42b0a8416d199aee0b/torchvision/transforms/functional_tensor.py#L567-L568
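The two points above can be illustrated with a dependency-free 1D model. This is only a sketch, not the torchvision code: sample_bilinear_1d is a hypothetical helper that mimics bilinear grid_sample with zero padding on a single row, and the sample positions are chosen so that the last output pixel falls halfway outside the image.

```python
import math


def sample_bilinear_1d(row, x):
    """Linearly interpolate `row` at position `x`; out-of-range pixels count as 0."""
    left = math.floor(x)
    w_right = x - left

    def px(i):
        return row[i] if 0 <= i < len(row) else 0.0

    return px(left) * (1.0 - w_right) + px(left + 1) * w_right


img = [1.0, 1.0, 1.0, 1.0]    # one image row
ones = [1.0, 1.0, 1.0, 1.0]   # dummy channel that becomes the mask
positions = [0.5, 1.5, 2.5, 3.5]  # last position is halfway out of bounds

sampled = [sample_bilinear_1d(img, x) for x in positions]
mask = [sample_bilinear_1d(ones, x) for x in positions]

fill = 0.0

# fill=None: the sampled row is returned as is
no_fill = sampled

# current bilinear branch: blend the sampled row with the fill value
blended = [p * m + (1.0 - m) * fill for p, m in zip(sampled, mask)]

# proposed alternative: overwrite every pixel that is not fully covered
hard = [fill if m < 0.9999 else p for p, m in zip(sampled, mask)]

print(no_fill)  # [1.0, 1.0, 1.0, 0.5]
print(blended)  # [1.0, 1.0, 1.0, 0.25]
print(hard)     # [1.0, 1.0, 1.0, 0.0]
```

With fill = 0.0 the blended border pixel is attenuated twice (0.25 instead of 0.5), which is the discrepancy with fill=None from the first point. The hard mask writes the fill value wherever the mask is not fully covered.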

cc @datumbox
Since we have another report in #8083, do we want to tackle this? IMO, we should just align the two branches
https://github.com/pytorch/vision/blob/f69eee6108cd047ac8b62a2992244e9ab3c105e1/torchvision/transforms/v2/functional/_geometry.py#L588-L594
with something like
bool_mask = mask < 1
float_img[bool_mask] = fill_img.expand_as(float_img)[bool_mask]
This removes the blending and in turn the "shadow" for bilinear interpolation. Plus, this is equivalent for nearest interpolation, since the mask in that case only contains 0.0 and 1.0 entries.
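A rough check of the nearest-mode claim, again on a hypothetical 1D model rather than the real _apply_grid_transform: with nearest sampling the mask channel only ever takes the values 0.0 and 1.0, so thresholding at 0.5 and at 1 selects the same pixels. The helper name and the sample positions are assumptions made for illustration.

```python
import math


def sample_nearest_1d(row, x):
    """Pick the pixel closest to `x`; out-of-range pixels count as 0."""
    i = math.floor(x + 0.5)
    return row[i] if 0 <= i < len(row) else 0.0


ones = [1.0, 1.0, 1.0, 1.0]                 # dummy channel that becomes the mask
positions = [-0.8, 0.2, 1.2, 2.2, 3.2, 4.2]  # first and last fall outside the row

nearest_mask = [sample_nearest_1d(ones, x) for x in positions]

old_selection = [m < 0.5 for m in nearest_mask]  # threshold of the nearest branch
new_selection = [m < 1 for m in nearest_mask]    # threshold proposed above

print(nearest_mask)                    # [0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
print(old_selection == new_selection)  # True
```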
@pmeier the value 0.9999 for the mask was sort of on purpose. In the example from the description, an affine rotation by 50 degrees with bilinear mode creates a rotated mask with the following unique values:
tensor([0.00000000, 0.02883029, 0.02883148, 0.10955429, 0.10955477, 0.11125469,
0.11125565, 0.19197845, 0.19197917, 0.19367909, 0.19367981, 0.27440262,
0.27440357, 0.35512805, 0.35512924, 0.35682678, 0.35682797, 0.43755341,
0.43755519, 0.43925095, 0.43925512, 0.51997960, 0.51998138, 0.60240537,
0.60240555, 0.68312985, 0.68313217, 0.68482971, 0.68482977, 0.76555562,
0.76555634, 0.76725388, 0.76725554, 0.84798002, 0.84798050, 0.92870331,
0.92870587, 0.93040466, 0.93040580, 0.99999994, 1.00000000])
and 0.99999994 can appear inside the mask:
plt.imshow(((mask > 0.999) & (mask < 1.0))[0, 0, ...], interpolation="none")
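A small sketch of why the threshold matters for that value, assuming the mask is stored as float32: 0.99999994 is how the largest float32 below 1.0 (1 - 2**-24) prints, so presumably a one-ulp rounding error in the interpolation weights is enough to produce it. The to_float32 helper is only for illustration.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


below_one = to_float32(1.0 - 2.0 ** -24)  # largest float32 strictly below 1.0

print(f"{below_one:.8f}")   # 0.99999994
print(below_one < 1.0)      # True: `mask < 1` would overwrite this pixel with fill
print(below_one < 0.9999)   # False: `mask < 0.9999` keeps the sampled pixel
```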
so, using mask < 1 gives: