rasterio
rasterio copied to clipboard
Window (boundless) with negative offset may result in resampling
Expected behavior and actual behavior.
When reading from a dataset in boundless mode, supplying a window with exact integer coordinates, I expect no resampling to be done. We encountered a situation where there is resampling, which might have to do with internal conversions to coordinates. I tried to narrow it down by reducing array dimensions, but the issue went away.
Steps to reproduce the problem.
from rasterio.io import MemoryFile
from rasterio.transform import Affine
from rasterio.windows import Window
import numpy as np
from numpy.testing import assert_almost_equal
HEIGHT = 9508 # only happens for this specific height
data_array = np.arange(HEIGHT, dtype="f4").reshape((HEIGHT, 1))
with MemoryFile() as memfile:
with memfile.open(
driver='GTiff',
count=1,
height=HEIGHT,
width=1,
dtype=data_array.dtype,
transform=Affine(1.0, 0.0, 0, 0.0, 1.0, 0),
) as dataset:
dataset.write(data_array[np.newaxis, ...])
with memfile.open(driver='GTiff') as dataset:
# read first column, rows 0-388
a = dataset.read(
1,
window=Window(col_off=0, row_off=0, width=1, height=388),
boundless=True,
fill_value=-9999,
)[:, 0]
# the result is an array that increases from 0-388
assert_almost_equal(a, np.arange(388))
b = dataset.read(
1,
window=Window(col_off=0, row_off=-12, width=1, height=400),
boundless=True,
fill_value=-9999,
)[:, 0]
# the expected result is 12 * -9999 and then the same as above
try:
assert_almost_equal(b, np.concatenate([[-9999] * 12, a]))
except AssertionError:
assert b.shape == (400, )
print("The shape of b is correct: ", b.shape)
assert np.all(b[:12] == -9999)
print("The first 12 values in b are nodata")
assert b[12] == 0
print("The first data value (index 12) is 0")
assert b[-1] != 387
print("The last data entry of b is incorrect (expected 387): ", b[-1])
index = np.where(np.diff(b[12:]) != 1)[0][0] + 12
print("b skips a pixel at index ", index)
print("Values around there: ", b[index-2: index+3])
Output:
The shape of b is correct: (400,)
The first 12 values in b are nodata
The first data value (index 12) is 0
The last data entry of b is incorrect (expected 387): 388.0
b skips a pixel at index 205
Values around there: [191. 192. 193. 195. 196.]
Operating system
Ubuntu 20.04
Rasterio version and provenance
Version 1.2.10 manylinux wheel installed from PyPI
I was able to deduce a somewhat minimal example of my issue, which I think is the same as the one reported here. rasterio 1.2.10 py39h9c66686_4 conda-forge - on Mac M1.
- Create a dummy file (in my real scenario a DEM) and write a subwindow containing the column indices.
- We read the same window, but shift col_off = -1
- We expect that the differences along axis 1 / cols is 1 or 0 (for out of bounds)
import rasterio
import numpy as np
from rasterio.crs import CRS
from rasterio.transform import Affine
from rasterio.windows import Window
DUMMY_TIF = '/Users/me/Desktop/dummy.tif'
SUBWINDOW = Window(col_off=0, row_off=7500, width=500, height=500)
profile = {'driver': 'GTiff',
'dtype': 'int16',
'width': 15954,
'height': 13644,
'count': 1,
'crs': CRS.from_epsg(3857),
'transform': Affine(20.0, 0.0, 931839.4892686082, 0.0, -20.0, 6035503.43370524),
}
# First, create a dummy file. We only write a small 500x500 window that contains the col indices [0,500]
with rasterio.open(DUMMY_TIF, 'w', **profile) as trg:
trg.write((np.indices((SUBWINDOW.height, SUBWINDOW.width))[1]), window=SUBWINDOW, indexes=1)
# Second, we read the same window boundless, with an col_off -1 [so col_off = -1].
# The expected value differences along axis 1 [cols] are always <= 1
with rasterio.open(DUMMY_TIF, 'r') as src:
data = src.read(window=Window(SUBWINDOW.col_off-1, SUBWINDOW.row_off, SUBWINDOW.width, SUBWINDOW.height),
boundless=True, indexes=1)
assert np.max(np.diff(data, 1)) == 1
Traceback (most recent call last):
AssertionError
My workaround:
def boundless_window_read(window, src, **kwargs):
col_shift = min(0, window.col_off) * -1
row_shift = min(0, window.row_off) * -1
full_view = np.zeros((window.height, window.width))
view = src.read(window=crop(window, src.height, src.width), **kwargs)
full_view[row_shift:row_shift + view.shape[0], col_shift:col_shift + view.shape[1]] = view
return full_view
@caspervdw this feels a bit like the issue reported in https://github.com/OSGeo/gdal/issues/4098. But since that's closed and since I can reproduce what you see with the latest GDAL, there must be either a lingering GDAL bug or a bug in rasterio.
Boundless reads use a VRT. That VRT's source and destination sizes, which may cause resampling, are set at https://github.com/rasterio/rasterio/blob/master/rasterio/vrt.py#L188-L197. I've added your code as a test in 44c36f39. Hypothesis found that 757 is another troublesome height.
The test at https://github.com/rasterio/rasterio/commit/44c36f39a42f63353bb2ace26af0ef321af6509e#diff-53d1099de675cd274b6e519642e75ac96a257e1e83d456c6fb6d9ce89ba65147R86 is, I believe, the more general case. It seems to implicate one corner of the boundless window being within the dataset and one corner being outside as the cause. Our test where both corners are outside the dataset always pass. In my local runs I get something like 19 examples that succeed for the 1 falsifying example of the "outer_upper_left" test.
Will give this a closer look.
I can't reproduce this with Rasterio 1.3.5. I suspect it was fixed here: https://github.com/rasterio/rasterio/blob/maint-1.3/CHANGES.txt#L253.
Thanks @sgillies !