rasterio icon indicating copy to clipboard operation
rasterio copied to clipboard

Use Decimal objects to calculate exact bounds

Open mwtoews opened this issue 1 year ago • 8 comments

With a raster with 0.1 resolution that has shape and transform:

(171, 161)
Affine(0.1, 0.0, 163.95,
       0.0, -0.1, -31.95)

the bounds are not exactly calculated:

BoundingBox(left=163.95, bottom=-49.05, right=180.04999999999998, top=-31.95)

This is due to the same reason why 0.3 - 0.1 * 3 != 0, because of floating point behaviour.

My suggested fix is to use Decimal objects, which yield expected bounds:

BoundingBox(left=163.95, bottom=-49.05, right=180.05, top=-31.95)

Is this change welcome? If so, then I'll refine this PR further with tests.

mwtoews avatar Aug 29 '24 00:08 mwtoews

@mwtoews I'm not sure about this... GDAL isn't based on decimal math. And I wonder why the proposed change didn't break any tests? If the existing expectations are wrong, and use of decimal is rights, I would expect some failures.

sgillies avatar Aug 29 '24 01:08 sgillies

My guess is that the tests pass because it has plenty of assert np.allclose matches, that check if the bounds are close-enough. If I were to refine this PR further, then exact bounds matches would be checked.

I don't think that GDAL can calculate the bounds (or I've never seen it, at least).

mwtoews avatar Aug 29 '24 01:08 mwtoews

@mwtoews this is too late for 1.4.0, but I don't want to rule it out for a future release. We've got workarounds for numerical imprecision everywhere, I'm unsure if this kind of change makes calculations more stable or less stable. For example, say I compute the bounds using Decimal and then pass the bounds back into to get the height and width of a raster field. I won't be able to try it out seriously after the 1.4.0 release, but am willing to discuss and review.

sgillies avatar Aug 29 '24 22:08 sgillies

This possible feature does not need to be rushed, so can wait. Discussion of pros/cons is more important.

Let's just look at one dimension along x-axis, with "left" and "right" as the bounds, "ncell" as the integer number of cells, and "res" as cell resolution. The float and Decimal types are distinguished with their "_f" and "_d" suffixes.

from decimal import Decimal

# inputs
ncell = 161
res_f = 0.1
res_d = Decimal(str(res_f))
left_f = 163.95
left_d = Decimal(str(left_f))

# image width and right bound calcs
width_d = res_d * ncell  # Decimal('16.1')
width_f = res_f * ncell  # 16.1
right_d = width_d + left_d  # Decimal('180.05')
right_f = width_f + left_f  # 180.04999999999998

# re-calculate resolution from bounds
(right_d - left_d) / ncell  # Decimal('0.10')
(float(right_d) - left_f) / ncell  # 0.10000000000000014
(right_f - left_f) / ncell  # 0.09999999999999996

# re-calculate number of cells from bounds
(right_d - left_d) / res_d  # Decimal('161.0')
(float(right_d) - left_f) / res_f  # 161.00000000000023
(right_f - left_f) / res_f  # 160.99999999999994
int((right_f - left_f) / res_f)  # 160
round((right_f - left_f) / res_f)  # 161

as expected, everything is exact with Decimal types, with no surprises. Meanwhile using native float types there are awkward values that need rounding to get expected results.


If needed, a real-world example of this raster can be fetched here, which would have further improvements with a comparison of these outputs:

$ rio info --bounds 2m_temperature.grb.tif
< 163.95 -49.05 180.04999999999998 -31.95
> 163.95 -49.05 180.05 -31.95
$ rio bounds 2m_temperature.grb.tif
< {"bbox": [163.95, -49.05, 180.04999999999998, -31.95], "geometry": {"coordinates": [[[163.95, -49.05], [180.04999999999998, -49.05], [180.04999999999998, -31.95], [163.95, -31.95], [163.95, -49.05]]], "type": "Polygon"}, "properties": {"filename": "2m_temperature.grb.tif", "id": "0", "title": "2m_temperature.grb.tif"}, "type": "Feature"}
> {"bbox": [163.95, -49.05, 180.05, -31.95], "geometry": {"coordinates": [[[163.95, -49.05], [180.05, -49.05], [180.05, -31.95], [163.95, -31.95], [163.95, -49.05]]], "type": "Polygon"}, "properties": {"filename": "2m_temperature.grb.tif", "id": "0", "title": "2m_temperature.grb.tif"}, "type": "Feature"}

mwtoews avatar Aug 30 '24 00:08 mwtoews

@sgillies how is this sitting with you? Are there any real-world thought experiments where this approach is useful or not useful?

mwtoews avatar Dec 13 '24 01:12 mwtoews

@mwtoews GDAL doesn't use decimal arithmetic, so any little changes we make in rasterio are lost when we use any GDAL methods. It's a little like making a promise that we can't keep.

sgillies avatar Dec 13 '24 16:12 sgillies

GDAL does not have any API to describe raster bounds or dimension lengths, so it's mostly "in the clear".

One place that I can see some floating point noise is with gdalinfo (version 3.10.0), which shows:

...
Origin = (163.949999999999989,-31.949999999999999)
Pixel Size = (0.100000000000000,-0.100000000000000)
Corner Coordinates:
Upper Left  ( 163.9500000, -31.9500000) (163d57' 0.00"E, 31d57' 0.00"S)
Lower Left  ( 163.9500000, -49.0500000) (163d57' 0.00"E, 49d 3' 0.00"S)
Upper Right ( 180.0500000, -31.9500000) (180d 3' 0.00"E, 31d57' 0.00"S)
Lower Right ( 180.0500000, -49.0500000) (180d 3' 0.00"E, 49d 3' 0.00"S)
Center      ( 172.0000000, -40.5000000) (172d 0' 0.00"E, 40d30' 0.00"S)
...

The origin uses excessive precision for the formatting for the first and fourth geotransform terms. This can be inspected an re-created using the SWIG API:

>>> from osgeo import gdal
>>> gdal.UseExceptions()
>>> ds = gdal.Open('2m_temperature.grb.tif')
>>> ds.GetGeoTransform()
(163.95, 0.1, 0.0, -31.95, 0.0, -0.1)
>>> [f"{v:.15f}" for v in ds.GetGeoTransform()]
['163.949999999999989', '0.100000000000000', '0.000000000000000', '-31.949999999999999', '0.000000000000000', '-0.100000000000000']

and the the corner coordinates from gdalinfo don't show any precision errors since they use a fixed precision with 7 decimal places. With this respect, this PR is an improvement since the corner coordinates are represented the same as gdalinfo.

mwtoews avatar Dec 15 '24 22:12 mwtoews

This might be addressed by numerical accuracy fixes in #3315 for poorly conditioned transforms.

groutr avatar Nov 12 '25 15:11 groutr