xdem icon indicating copy to clipboard operation
xdem copied to clipboard

`xdem.coreg.ICP.fit_pts` fails for unknown reason with a regular dataset

Open erikmannerfelt opened this issue 10 months ago • 4 comments

As promised in #422 , here's the other issue hehe.

This is a strange one. I already have a fix, but I don't understand why it helps!

Problem description

When running ICP.fit_pts() I ran into the issue that led to #422; my points were all filtered away for some reason. After some debugging, I found where this happens, but I still don't know why.

First, rasters containing information on the normals are sampled using regular Raster.interp_points calls: https://github.com/GlacioHack/xdem/blob/3956073ef6c1846ffc6de3ac5368269f45131f42/xdem/coreg/affine.py#L525-L529

Then, when the rasters have been sampled, a convoluted nan removal line removes the unsuccessful samples: https://github.com/GlacioHack/xdem/blob/3956073ef6c1846ffc6de3ac5368269f45131f42/xdem/coreg/affine.py#L536-L537

This led to my ref_dem being empty, as the above interp_points for some reason only returned nans! I figured out a fix for this problem, but I still don't understand the symptom...

The solution

Instead of the sampling in the code (shown above), I changed it to:

normal_east.tags["AREA_OR_POINT"] = "Area"
rowcol = np.round(np.dstack(normal_east.xy2ij(ref_dem["E"], ref_dem["N"], shift_area_or_point=True)).squeeze()).astype(int)

if any(col not in ref_dem for col in ["nx", "ny", "nz"]):
    for key, raster in [("nx", normal_east), ("ny", normal_north), ("nz", normal_up)]:
        ref_dem[key] = raster.data[rowcol[:, 0], rowcol[:, 1]].filled(np.nan)

Basically, I make a more efficient nearest neighbour sampling by first determining the row and column indices, then directly indexing the underlying raster. This should be much more efficient, as that part is only done once instead of three times, and for some reason it fixes my problem with the sampling.

What's the actual problem?

Without a minimal (non-)working example, it's hard for me to figure out exactly what's wrong... Honestly, I think it boils down to one of two potential issues:

  1. There's something wrong with Raster.interp_points in geoutils
  2. There's something wrong with my use of the functionality

Honestly, and argument for making a PR to "fix" this can simply be motivated by performance. If Raster.interp_points is not used in the method, no Rasters have to be instantiated and I can just use the rasterio coords to ij functions. That'd reduce the amount of copying that might happen and it will probably make the code look cleaner. I might open a PR to improve the code and then add an associated geoutils issue for tackling the underlying problem.

Still, it would be great to get to the bottom of why this actually fails... @rhugonnet or @adehecq, have you experienced something similar before with this function?

erikmannerfelt avatar Aug 31 '23 14:08 erikmannerfelt

Also, @rhugonnet and @adehecq, if you have any recommendations of how to make a minimal (non-)working example, I'm all ears. It's a float32 DEM with ICESat-2 points that should work just like in the test. But it doesn't!

erikmannerfelt avatar Aug 31 '23 14:08 erikmannerfelt

Wait what the heck is the difference between Raster.value_at_coords and Raster.interp_points?? I see that they have almost the same docstring and I've invariably used both interchangeably.

erikmannerfelt avatar Aug 31 '23 14:08 erikmannerfelt

I have a deadline today, will look in more details tomorrow.

Quick answer on your last question:

  • The value_at_coords function extracts the closest point value to the coords or its average (any type of reductor function) within a square window of a certain size (doesn't require to load the entire Raster).
  • The interp_points function performs a regular grid interpolation (nearest, linear, cubic) to extract the values at the coords.

More details in the examples: https://geoutils.readthedocs.io/en/stable/analysis_examples/index.html#point-extraction But, yes, we definitely need to change the names and improve the docstrings of those functions...

rhugonnet avatar Aug 31 '23 18:08 rhugonnet

I have a deadline today, will look in more details tomorrow.

Quick answer on your last question:

* The `value_at_coords` function extracts the closest point value to the coords or its average (any type of reductor function) within a square window of a certain size (doesn't require to load the entire Raster).

* The `interp_points` function performs a regular grid interpolation (nearest, linear, cubic) to extract the values at the coords.

More details in the examples: https://geoutils.readthedocs.io/en/stable/analysis_examples/index.html#point-extraction But, yes, we definitely need to change the names and improve the docstrings of those functions...

Thanks for the info! That makes more sense now. But perhaps they could just be one function with an "interpolate" kwarg to value_at_coords? Either way, a docstring improvement would be great! I'll add an issue in geoutils.

erikmannerfelt avatar Sep 01 '23 11:09 erikmannerfelt