pyresample icon indicating copy to clipboard operation
pyresample copied to clipboard

Naive "rounding to even" affect correctness of `get_array_indices_from_lonlat` and `get_area_slices` in some occasions

Open ghiggi opened this issue 3 years ago • 14 comments

Problem description

This issue addresses the current inaccurate computation of integer indices from (float) array indices. This can cause errors in many methods related to indices retrieval (see below) and possible inaccurate results of get_area_slices (see https://github.com/pytroll/pyresample/issues/371)

The fact is that the method round used to get the integer indices from float indices "rounds to the even".

Rounds to even example

round(0.5) # --> 0 <-- lower (even)
round(1.5) # --> 2 <-- upper (odd)
round(2.5) # --> 2 <-- lower (even) 
round(3.5) # --> 4 <-- upper (odd)

To visualize the problem, let's create a classical WGS84 grid at 0.5 resolution of shape (y=360,x=720). The valid number of indices on x goes from 0 to 359. The corners are [(-180.0, 90.0), (180.0, 90.0), (180.0, -90.0), (-180.0, -90.0)]

from pyresample.geometry import AreaDefinition
area_def = AreaDefinition.from_epsg(code=4326, resolution=0.5) 
area_def.height
area_def.width
area_def.shape
area_def.outer_boundary_corners  

Now let's try to retrieve the integer indices of the top and bottom left corners

# Integer indices 
area_def.get_array_indices_from_lonlat(-180, 90)  # (0, 0) --> OK         
area_def.get_array_indices_from_lonlat(-180, -90) # ValueError: Point outside area:( 0.000000 360.000000) ---> BUG 

# Float indices 
area_def.get_array_coordinates_from_lonlat(-180, 90)   # (-0.5, -0.5)
area_def.get_array_coordinates_from_lonlat(-180, -90)  # (-0.5, 359.5)

Employing np.round, lead to rounding of 359.5 (even) upper to 360 ... outside the valid integer indices of the AreaDef (0-359)

np.round(area_def.get_array_coordinates_from_lonlat(-180, -90)) # array([ -0., 360.])

Solution

Replace all round() occurence in geometry.py by the below round_to_indices function

def round_to_indices(x): 
    # First include the start border 
    x[x == -0.5] = 0
    # Then round to the lower (instead of even) 
    decimal_values = x - np.floor(x)
    idx_midpoint = np.where(decimal_values == 0.5)[0]
    if len(idx_midpoint) > 0:
        x[idx_midpoint] = x[idx_midpoint] - 0.0000001
    return np.round(x)  

# Rounding now is performed "correctly" 
round_to_indices(area_def.get_array_coordinates_from_lonlat(-180, -90)) # array([  0., 359.])

# Other examples
arr = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
np.round(arr)         # array([0., 2., 2., 4., 4.])  <-- rounding to even 
round_to_indices(arr) # array([0., 1., 2., 3., 4.])

arr = np.array([-0.51, -0.5, 0, 0.5, 1, 358.5, 359, 359.5, 359.1])
np.round(arr) 
round_to_indices(arr) # change behaviour only for X.5 numbers 

Further notes

Addressing the rounding issue, will automatically solve the same problem arising currently in the deprecated methods lonlat2colrow and get_xy_from_lonlat.

area_def.lonlat2colrow(lons=-180, lats=-90) # ValueError: Point outside area:( 0.000000 360.000000) ---> BUG 
area_def.get_xy_from_lonlat(-180, -90)      # ValueError: Point outside area:( 0.000000 360.000000) ---> BUG 

ghiggi avatar Jan 04 '22 12:01 ghiggi

For completeness, could you update you post to include the results of get_array_indices_from_lonlat and the deprecated methods? You talk about the rounding of them later but I don't see what the actual results are.

djhoese avatar Jan 04 '22 12:01 djhoese

Done

ghiggi avatar Jan 04 '22 12:01 ghiggi

I'm a little torn on this. I agree that this rounding to even is a problem, but it also seems like you are also identifying another issue which is that if you request the index for the extent of an Area that on the "left" side it returns index (0, 0) but on the "right" side it returns an index outside of the original array. This may not be desired in your case, but it is consistent at least. We have to be careful as these indices may be used between AreaDefinitions so we don't want to round towards the valid indices just to get a completely contained answer. I would say asking for the index for points lying on the extents of an area should produce indexes outside of the Area...or at least it isn't wrong if they are.

The implementation you've suggested would not work well with dask. I'm wondering if there is another implementation that may work. Something like https://realpython.com/python-rounding/#rounding-half-up perhaps (so np.floor(x + 0.5)).

djhoese avatar Jan 04 '22 13:01 djhoese

I would not say that is consistent. I should have provided also another example. Try out this:

area_id = 'wgs84'
description = 'World Geodetic System 1984'
proj_id = 'wgs84'
projection = 'EPSG:4326'   
width = 720
height = 357 # HERE AN ODD NUMBER 
area_extent = (-180,-90,180,90)
shape = (height, width) # y, x
area_def = AreaDefinition(area_id=area_id, description=description, proj_id=None,  # custom 
                         projection=projection, 
                         width=width, height=height, 
                         area_extent=area_extent)
area_def.get_array_indices_from_lonlat(-180, -90) # (0, 356) WORKS !!!!

Currently, if the shape is even, it throws an error (first example). If the shape is odd (here 357 instead of 360) it returns the index also on the right side!

ghiggi avatar Jan 04 '22 14:01 ghiggi

Which part of what I said is not consistent? The np.floor(x + 0.5) part?

I think as far as consistency we're talking about inclusive versus exclusive indexing...sort of. Your examples point out the main consistency issue that regardless of area definition size we want the indices to be the same. The main test cases are if you request the index that lies on the left-bound of the area def and a point that lies on the right-bound of the area def. Right now this depends on the size of the area.

I think it would make sense and would work for the most cases if pyresample said that points lying on the left-most bound should result in the 0 integer index and on the right-most bound should result in num_cols (1 past the last valid index). So indexes would be inclusive on the left and exclusive on the right. Kind of like slicing and range in python.

I would prefer if there was no special handling based on an index being on the edge of the area definition which means no special handling for negative or positive values. That is just my instinct though. We want a uniform and consistent rounding which rounding to even is not.

djhoese avatar Jan 04 '22 15:01 djhoese

I think we have to be really careful here, as the performance of resampling might drop drastically if the new round_to_indices function is not fast enough.

mraspaud avatar Jan 04 '22 16:01 mraspaud

Also, I am not convinced a point at the edge of the area definition should be either inside or outside of the area definition, I think the behaviour should be undetermined, and documented that way, because I don't believe we can efficiently go around floating point arithmetic here.

mraspaud avatar Jan 04 '22 16:01 mraspaud

The outer edges of the entire area definition are just outer edges of particular "cells" of the area definition. What applies to the individual cells should apply to the entire area definition. The integer index for a location exactly between two "cells" should be the index of one of those two cells. That or we start treating area definition cells as not squares/rectangles but as circles/ellipses and do a lot more complicated math. Anyway...if the index for a location between two cells is the first cell ("left" in the X dimension) then what I said before stands. If it is the second cell then that is equivalent to np.ceil(x + 0.5) and I would consider non-standard and odd.

djhoese avatar Jan 04 '22 16:01 djhoese

Ok, so if I understand what you are saying, you basically want to change the behaviour of round to be consistent. Here are some thoughts:

  • that will be surprising to people who expect the function doing just a rounding
  • it needs to be fast, array compatible, dask compatible
  • I'd rather document the behaviour and let the user take care of that (keep it simple and stupid)

As mentioned earlier, IMO a point on the edge of two cells is a special case and we shouldn't try to be clever about it. We do have a function that provides the floating point values instead that we can use if we want certain behaviours, for example get_area_slices should probably using ceil and floor anyway instead of round.

mraspaud avatar Jan 05 '22 07:01 mraspaud

I just wanted to report the current inconsistent behavior when retrieving indices of points lying on the right/bottom sides of the AreaDefinition extent, that cause errors if the total number of valid integer indices is even.

  • If the total number of valid integer indices is even, using np.round returns an integer index outside of the original array.
  • If the total number of valid integer indices is odd, using np.round returns an integer index inside the original array.

Using np.floor(x + 0.5) would fix the inconsistent behavior at the edge of two cells, being inclusive on the left and exclusive on the right.

That said, you have a much better understanding and overview of the downstream impact of an eventual modification, so maybe it might be just worth documenting the behavior :)

ghiggi avatar Jan 05 '22 11:01 ghiggi

If just that simple modification allows for the behaviour, I'm ok with it as it will not impact performance much. I suppose that since we are using only positive indices in this function that np.floor(x + 0.5) will indeed work.

mraspaud avatar Jan 05 '22 11:01 mraspaud

@ghiggi @mraspaud I think we're at the point where we all need to see the code that is going to change and think of all the cases that will be effected. @ghiggi Do you have time to make a PR for the cases you know are effected?

djhoese avatar Jan 05 '22 15:01 djhoese

Sorry, not this week for sure :S I have a few unexpected things that landed on my head yesterday, and in the remaining time, I would like to finish out stuff for the draft PR on swath cropping and the spherical.py refactoring.

ghiggi avatar Jan 06 '22 09:01 ghiggi

no rush

mraspaud avatar Jan 07 '22 10:01 mraspaud