Question Regarding Image Downscaling Method
https://github.com/nerfstudio-project/nerfstudio/blob/9b3cbc79bf239eb3c69e7c288632aab02c4f0bb1/nerfstudio/models/splatfacto.py#L83
Why was the following method chosen for downscaling images instead of directly using F.resize?
```python
def resize_image(image: torch.Tensor, d: int):
    """
    Downscale images using the same 'area' method in opencv

    :param image: shape [H, W, C]
    :param d: downscale factor (must be 2, 4, 8, etc.)

    return downscaled image in shape [H//d, W//d, C]
    """
    import torch.nn.functional as tf

    image = image.to(torch.float32)
    weight = (1.0 / (d * d)) * torch.ones((1, 1, d, d), dtype=torch.float32, device=image.device)
    return tf.conv2d(image.permute(2, 0, 1)[:, None, ...], weight, stride=d).squeeze(1).permute(1, 2, 0)
```
My concern is that this method can misalign coordinates. For instance, if we input a 19x19 image and downscale it by a factor of 4, the strided convolution produces a 4x4 output and the last 3 rows and columns are simply discarded, whereas ideally their contribution should be distributed into the adjacent output pixels, as a true 'area' resize does for non-divisible sizes.
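A minimal sketch of the edge behavior I mean (assuming PyTorch; using a 1x19 single-channel strip instead of a full image for readability). The strided box filter, as in `resize_image`, never reads the last 3 columns, while `F.interpolate(mode="area")` folds every input pixel into some output pixel:

```python
import torch
import torch.nn.functional as F

# 1 x 19 "image" with a single channel: values 0..18 along the width,
# shaped [N, C, H, W] for conv2d / interpolate.
x = torch.arange(19, dtype=torch.float32).reshape(1, 1, 1, 19)

d = 4
# Strided box filter: output width is (19 - 4) // 4 + 1 = 4,
# so columns 16-18 never contribute to any output pixel.
weight = torch.ones((1, 1, 1, d)) / d
conv_out = F.conv2d(x, weight, stride=d)

# Adaptive 'area' resize: every input pixel contributes to some output pixel;
# the last output averages columns 14..18 instead of dropping them.
area_out = F.interpolate(x, size=(1, 4), mode="area")

print(conv_out.flatten())  # [1.5, 5.5, 9.5, 13.5] — means of cols 0:4, 4:8, 8:12, 12:16
print(area_out.flatten())  # [2.0, 6.5, 11.5, 16.0] — last entry covers cols 14:19
```

The two outputs agree only when the input size is an exact multiple of `d`; otherwise the strided version effectively crops the image before averaging.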
https://github.com/nerfstudio-project/nerfstudio/blob/9b3cbc79bf239eb3c69e7c288632aab02c4f0bb1/nerfstudio/data/dataparsers/colmap_dataparser.py#L460
Additionally, I noticed that in another part of the code, linear interpolation (the FFmpeg default) is used for image downsampling. For consistency, I believe the same interpolation method should be used in both dataset preprocessing and training-time downsampling.
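To make the consistency point concrete, here is a small sketch (assuming PyTorch; the 1x6 signal of squared values is a made-up example) showing that 'area' averaging and bilinear sampling generally produce different pixel values for the same downscale, so mixing them between preprocessing and training yields images that do not match:

```python
import torch
import torch.nn.functional as F

# A 1 x 6 signal whose values are not linear, so block averaging and
# point sampling disagree: [0, 1, 4, 9, 16, 25].
x = (torch.arange(6, dtype=torch.float32) ** 2).reshape(1, 1, 1, 6)

# 'area': each output is the mean of a 3-pixel block.
area = F.interpolate(x, size=(1, 2), mode="area")

# bilinear (align_corners=False): each output samples at the block center.
bilin = F.interpolate(x, size=(1, 2), mode="bilinear", align_corners=False)

print(area.flatten())   # [ (0+1+4)/3, (9+16+25)/3 ] ≈ [1.667, 16.667]
print(bilin.flatten())  # samples land exactly on pixels 1 and 4 -> [1.0, 16.0]
```

On a linear ramp the two happen to coincide, which can hide the discrepancy in toy tests; on real image content they diverge.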