Question Regarding Image Downscaling Method
https://github.com/nerfstudio-project/nerfstudio/blob/9b3cbc79bf239eb3c69e7c288632aab02c4f0bb1/nerfstudio/models/splatfacto.py#L83
Why was the following method chosen for downscaling images instead of directly using F.resize?
```python
def resize_image(image: torch.Tensor, d: int):
    """
    Downscale images using the same 'area' method in opencv

    :param image: shape [H, W, C]
    :param d: downscale factor (must be 2, 4, 8, etc.)

    return downscaled image in shape [H//d, W//d, C]
    """
    import torch.nn.functional as tf

    image = image.to(torch.float32)
    weight = (1.0 / (d * d)) * torch.ones((1, 1, d, d), dtype=torch.float32, device=image.device)
    return tf.conv2d(image.permute(2, 0, 1)[:, None, ...], weight, stride=d).squeeze(1).permute(1, 2, 0)
```
My concern is that this method can misalign coordinates. For instance, if we input a 19x19 image and downscale it by a factor of 4, the strided convolution produces a 4x4 output and the last 3 rows and columns are simply discarded, whereas ideally their contribution should be distributed into the adjacent output pixels, as a true 'area' resize does for non-divisible sizes.
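A minimal sketch of the edge behavior I mean (assuming PyTorch; using a 1x19 single-channel strip instead of a full image for readability). The strided box filter, as in `resize_image`, never reads the last 3 columns, while `F.interpolate(mode="area")` folds every input pixel into some output pixel:

```python
import torch
import torch.nn.functional as F

# 1 x 19 "image" with a single channel: values 0..18 along the width,
# shaped [N, C, H, W] for conv2d / interpolate.
x = torch.arange(19, dtype=torch.float32).reshape(1, 1, 1, 19)

d = 4
# Strided box filter: output width is (19 - 4) // 4 + 1 = 4,
# so columns 16-18 never contribute to any output pixel.
weight = torch.ones((1, 1, 1, d)) / d
conv_out = F.conv2d(x, weight, stride=d)

# Adaptive 'area' resize: every input pixel contributes to some output pixel;
# the last output averages columns 14..18 instead of dropping them.
area_out = F.interpolate(x, size=(1, 4), mode="area")

print(conv_out.flatten())  # [1.5, 5.5, 9.5, 13.5] — means of cols 0:4, 4:8, 8:12, 12:16
print(area_out.flatten())  # [2.0, 6.5, 11.5, 16.0] — last entry covers cols 14:19
```

The two outputs agree only when the input size is an exact multiple of `d`; otherwise the strided version effectively crops the image before averaging.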
https://github.com/nerfstudio-project/nerfstudio/blob/9b3cbc79bf239eb3c69e7c288632aab02c4f0bb1/nerfstudio/data/dataparsers/colmap_dataparser.py#L460
Additionally, I noticed that in another part of the code, linear interpolation (the FFmpeg default) is used for image downsampling. For consistency, I believe the same interpolation method should be used in both dataset preprocessing and training-time downsampling.
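To make the consistency point concrete, here is a small sketch (assuming PyTorch; the 1x6 signal of squared values is a made-up example) showing that 'area' averaging and bilinear sampling generally produce different pixel values for the same downscale, so mixing them between preprocessing and training yields images that do not match:

```python
import torch
import torch.nn.functional as F

# A 1 x 6 signal whose values are not linear, so block averaging and
# point sampling disagree: [0, 1, 4, 9, 16, 25].
x = (torch.arange(6, dtype=torch.float32) ** 2).reshape(1, 1, 1, 6)

# 'area': each output is the mean of a 3-pixel block.
area = F.interpolate(x, size=(1, 2), mode="area")

# bilinear (align_corners=False): each output samples at the block center.
bilin = F.interpolate(x, size=(1, 2), mode="bilinear", align_corners=False)

print(area.flatten())   # [ (0+1+4)/3, (9+16+25)/3 ] ≈ [1.667, 16.667]
print(bilin.flatten())  # samples land exactly on pixels 1 and 4 -> [1.0, 16.0]
```

On a linear ramp the two happen to coincide, which can hide the discrepancy in toy tests; on real image content they diverge.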