torchgeo icon indicating copy to clipboard operation
torchgeo copied to clipboard

RasterDataset: add control over resampling algorithm

Open adamjstewart opened this issue 10 months ago • 8 comments

By default, rasterio.merge.merge uses "nearest" as its resampling algorithm. This is fast and works well for masks where you don't want to interpolate two categorical classes, but results in resampling artifacts for floating point images. This PR adds an attribute to control the resampling algorithm used. It defaults to "bilinear" for float (usually images) and "nearest" for int (usually masks).

I chose to base this on dtype instead of is_image because there are times when you want to interpolate floating point pixelwise regression masks.

@robmarkcole can you upload before and after plots from the example you showed me?

Closes #2012

adamjstewart avatar Apr 20 '24 09:04 adamjstewart

Doesn't seem to have as big of an impact on I/O performance as I expected:

raw (random) raw (grid) preprocessed (random) preprocessed (grid)
before 17.350 10.984 9.9158 4.6496
after 18.706 12.371 9.5933 4.5972

adamjstewart avatar Apr 20 '24 09:04 adamjstewart

This approach doesn't work well for datasets like L7 Irish or L8 Biome where a single RasterDataset is used for both image and mask. I think the solution is to instead subclass IntersectionDataset like I did in #1972.

adamjstewart avatar Apr 20 '24 09:04 adamjstewart

I think cubic interpolation is not such a good idea because the corresponding convolution kernel has negative weights, meaning that the output data range is not guaranteed.

This can really mess up any normalization or scaling applied to the data before the resampling step.

Here’s a minimal example of what I’m talking about:

import numpy as np
import rasterio

dst_data = data = np.random.rand(512, 512)

dst: rasterio.io.DatasetWriter
with rasterio.open(
    "temp.tif", mode="w", count=1, dtype=np.float32, width=512, height=512
) as dst:
    dst.write(dst_data, indexes=1)

src: rasterio.io.DatasetReader
with rasterio.open("temp.tif") as src:
    src_data = src.read(
        out_shape=(src.count, int(src.height * 2), int(src.width * 2)),
        resampling=rasterio.enums.Resampling.cubic,
    )
    assert (
        src_data.min() >= 0 and src_data.max() < 1
    ), "The cubic kernel has negative weights!"

This can also cause issues with no-data values; there may be huge inaccuracies in the final product depending on where they land in relation to the kernel.

I think bilinear interpolation is fine for 99.9% of cases.

DimitrisMantas avatar Apr 22 '24 08:04 DimitrisMantas

This pair shows the original issue, addressed in this PR

image

This was created with a random sampler, checking for it, but in general the results are improved. There are however some kind of streak artefact:

image

robmarkcole avatar Apr 22 '24 09:04 robmarkcole

You can play around with other enums and see which looks best: https://rasterio.readthedocs.io/en/stable/api/rasterio.enums.html#rasterio.enums.Resampling

I agree with @DimitrisMantas that we should pick a safe/simple/fast default.

adamjstewart avatar Apr 22 '24 09:04 adamjstewart

I think this is more or less the one-liner for each method:

  • Nearest/Mode: Good for masks; not suitable for continuous fields (i.e., images)
  • Bilinear: Suitable for continuous data; works best with smooth-ish fields (e.g., DEMs & DSMs)
  • Cubic/Cubic Spline/Lanczos: These are all the same as far as we are concerned; they are a bit unpredictable.
  • Average: It sounds sounds a bit like bilinear, but it's probably a kernel.
  • Gauss: Produces a probably undesirable smoothing effect when upsampling images.
  • Min/Max/etc.: They are probably useful in certain regression tasks, but I can't think of a potential application at the moment.

DimitrisMantas avatar Apr 22 '24 11:04 DimitrisMantas

By the way, Lanczos is technically "the best" of the bunch...

DimitrisMantas avatar Apr 22 '24 11:04 DimitrisMantas

Agree with ^. It should be NN resampling for masks and Bilinear for all else by default. If a user wants something specific they can override.

isaaccorley avatar Apr 22 '24 14:04 isaaccorley