Use Decimal objects to calculate exact bounds
With a raster with 0.1 resolution that has shape and transform:
(171, 161)
Affine(0.1, 0.0, 163.95,
0.0, -0.1, -31.95)
the bounds are not calculated exactly:
BoundingBox(left=163.95, bottom=-49.05, right=180.04999999999998, top=-31.95)
This is for the same reason that 0.3 - 0.1 * 3 != 0: binary floating point behaviour.
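This can be checked in any Python interpreter:

```python
# Classic binary floating point behaviour: 0.1 has no exact base-2
# representation, so repeated arithmetic accumulates rounding error.
print(0.1 * 3)        # 0.30000000000000004
print(0.3 - 0.1 * 3)  # -5.551115123125783e-17, not 0.0
```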
My suggested fix is to use Decimal objects, which yield expected bounds:
BoundingBox(left=163.95, bottom=-49.05, right=180.05, top=-31.95)
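As a minimal sketch of the idea (not the actual patch; the name decimal_bounds and its argument order are hypothetical), the exact bounds can be derived by round-tripping each affine term through str() into Decimal:

```python
from decimal import Decimal

def decimal_bounds(transform, height, width):
    """Sketch: compute (left, bottom, right, top) exactly with Decimal.

    ``transform`` holds the six affine terms (a, b, c, d, e, f) as in
    the example above; rotation terms b and d are assumed to be zero.
    """
    a, b, c, d, e, f = transform
    left = Decimal(str(c))    # str() round-trip preserves the shortest repr
    top = Decimal(str(f))
    right = left + Decimal(str(a)) * width
    bottom = top + Decimal(str(e)) * height
    return tuple(float(v) for v in (left, bottom, right, top))

# The raster from the example: shape (171, 161), 0.1 resolution
print(decimal_bounds((0.1, 0.0, 163.95, 0.0, -0.1, -31.95), 171, 161))
# (163.95, -49.05, 180.05, -31.95)
```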
Is this change welcome? If so, then I'll refine this PR further with tests.
@mwtoews I'm not sure about this... GDAL isn't based on decimal math. And I wonder why the proposed change didn't break any tests? If the existing expectations are wrong, and use of decimal is right, I would expect some failures.
My guess is that the tests pass because they rely on plenty of np.allclose assertions, which only check that the bounds are close enough. If I were to refine this PR further, exact bounds matches would be checked.
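That masking effect can be shown directly with the bounds from the example (using the stdlib math.isclose here as a stand-in for np.allclose, which behaves similarly with its default tolerances):

```python
import math

noisy = (163.95, -49.05, 180.04999999999998, -31.95)  # bounds from float math
exact = (163.95, -49.05, 180.05, -31.95)              # bounds from Decimal math

# A close-enough comparison cannot tell the two apart...
assert all(math.isclose(a, b) for a, b in zip(noisy, exact))
# ...while an exact comparison catches the noisy right bound.
assert noisy != exact
```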
I don't think that GDAL can calculate the bounds (or I've never seen it, at least).
@mwtoews this is too late for 1.4.0, but I don't want to rule it out for a future release. We've got workarounds for numerical imprecision everywhere, and I'm unsure whether this kind of change makes calculations more stable or less stable. For example, say I compute the bounds using Decimal and then pass the bounds back in to get the height and width of a raster. I won't be able to try it out seriously until after the 1.4.0 release, but am willing to discuss and review.
This possible feature does not need to be rushed, so can wait. Discussion of pros/cons is more important.
Let's look at just one dimension, along the x-axis, with "left" and "right" as the bounds, "ncell" as the integer number of cells, and "res" as the cell resolution. The float and Decimal types are distinguished with "_f" and "_d" suffixes.
from decimal import Decimal
# inputs
ncell = 161
res_f = 0.1
res_d = Decimal(str(res_f))
left_f = 163.95
left_d = Decimal(str(left_f))
# image width and right bound calcs
width_d = res_d * ncell # Decimal('16.1')
width_f = res_f * ncell # 16.1
right_d = width_d + left_d # Decimal('180.05')
right_f = width_f + left_f # 180.04999999999998
# re-calculate resolution from bounds
(right_d - left_d) / ncell # Decimal('0.10')
(float(right_d) - left_f) / ncell # 0.10000000000000014
(right_f - left_f) / ncell # 0.09999999999999996
# re-calculate number of cells from bounds
(right_d - left_d) / res_d # Decimal('161.0')
(float(right_d) - left_f) / res_f # 161.00000000000023
(right_f - left_f) / res_f # 160.99999999999994
int((right_f - left_f) / res_f) # 160
round((right_f - left_f) / res_f) # 161
As expected, everything is exact with Decimal types, with no surprises. Meanwhile, native float types produce awkward values that need rounding to get the expected results.
If needed, a real-world example of this raster can be fetched here; the improvement can be seen by comparing these outputs (< before this PR, > after):
$ rio info --bounds 2m_temperature.grb.tif
< 163.95 -49.05 180.04999999999998 -31.95
> 163.95 -49.05 180.05 -31.95
$ rio bounds 2m_temperature.grb.tif
< {"bbox": [163.95, -49.05, 180.04999999999998, -31.95], "geometry": {"coordinates": [[[163.95, -49.05], [180.04999999999998, -49.05], [180.04999999999998, -31.95], [163.95, -31.95], [163.95, -49.05]]], "type": "Polygon"}, "properties": {"filename": "2m_temperature.grb.tif", "id": "0", "title": "2m_temperature.grb.tif"}, "type": "Feature"}
> {"bbox": [163.95, -49.05, 180.05, -31.95], "geometry": {"coordinates": [[[163.95, -49.05], [180.05, -49.05], [180.05, -31.95], [163.95, -31.95], [163.95, -49.05]]], "type": "Polygon"}, "properties": {"filename": "2m_temperature.grb.tif", "id": "0", "title": "2m_temperature.grb.tif"}, "type": "Feature"}
@sgillies how is this sitting with you? Are there any real-world thought experiments where this approach is useful or not useful?
@mwtoews GDAL doesn't use decimal arithmetic, so any little changes we make in rasterio are lost when we use any GDAL methods. It's a little like making a promise that we can't keep.
GDAL does not have any API to describe raster bounds or dimension lengths, so it's mostly "in the clear".
One place where I can see some floating point noise is in gdalinfo (version 3.10.0), which shows:
...
Origin = (163.949999999999989,-31.949999999999999)
Pixel Size = (0.100000000000000,-0.100000000000000)
Corner Coordinates:
Upper Left ( 163.9500000, -31.9500000) (163d57' 0.00"E, 31d57' 0.00"S)
Lower Left ( 163.9500000, -49.0500000) (163d57' 0.00"E, 49d 3' 0.00"S)
Upper Right ( 180.0500000, -31.9500000) (180d 3' 0.00"E, 31d57' 0.00"S)
Lower Right ( 180.0500000, -49.0500000) (180d 3' 0.00"E, 49d 3' 0.00"S)
Center ( 172.0000000, -40.5000000) (172d 0' 0.00"E, 40d30' 0.00"S)
...
The Origin line formats the first and fourth geotransform terms with excessive precision. This can be inspected and re-created using the SWIG API:
>>> from osgeo import gdal
>>> gdal.UseExceptions()
>>> ds = gdal.Open('2m_temperature.grb.tif')
>>> ds.GetGeoTransform()
(163.95, 0.1, 0.0, -31.95, 0.0, -0.1)
>>> [f"{v:.15f}" for v in ds.GetGeoTransform()]
['163.949999999999989', '0.100000000000000', '0.000000000000000', '-31.949999999999999', '0.000000000000000', '-0.100000000000000']
The corner coordinates from gdalinfo don't show any precision errors, since they are formatted to a fixed 7 decimal places. In this respect, this PR is an improvement, since the corner coordinates would be represented the same as gdalinfo's.
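The masking by fixed-precision formatting is easy to reproduce with the values from the earlier example:

```python
right_f = 163.95 + 0.1 * 161  # the noisy float right bound
print(right_f)                # 180.04999999999998
print(f"{right_f:.7f}")       # 180.0500000 -- matches gdalinfo's corner output
print(f"{right_f:.15f}")      # the noise reappears at higher precision
```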
This might be addressed by numerical accuracy fixes in #3315 for poorly conditioned transforms.