Merge dem_coregistration and coregistration_3d
Context
Currently in xDEM, there are two ways to perform a coregistration:
-
dem_coregistration(dem_a, dem_b) -
dem_a.coregister_3d(dem_b)
Development Plan
- [ ] Make the
dem_coregistrationfunction private - [ ] Call the (now private)
dem_coregistrationfunction fromcoregister_3d, and bring it up to date - [ ] Remove references to
dem_coregistrationin the documentation - [ ] Update tests to reflect the changes
- [ ] Relax input validation to allow coregistration of identical DEMs
→ Show a warning rather than throwing an error
@rhugonnet @adehecq Does this approach sound good to you?
Yes, fully agree!
Looking at it again, most of dem_coregistration's code seem better suited to a CLI-like interface.
Some thoughts for adapting the pieces of code:
- I think "I/O" (reading/writing to file) should be completely left out of
coregister_3d(), given that these are one-liners and it would become ambiguous how objects are opened (Xarray accessor or DEM class, chunk sizes, etc) once we add the Xarray accessor, - Similarly, the "create_inlier_mask" function can also be left out completely I think, given that everything can be done in just a couple lines with
Vector.create_mask(), Raster boolean comparison such as>and logical operations (see example below). - The "filtering" part of
dem_coregistrationwas supposed to be moved in https://github.com/GlacioHack/xdem/blob/main/xdem/coreg/filters.py. The initial idea was to create a classCoregFilters(Coreg)whosefit_and_apply()step filters the input data. This way we could naturally have a defaultcoreg_methodincoregister_3d()that includes filtering to be robust for most steps. For example:coreg_method = xdem.coreg.NMADFilter() + xdem.coreg.NuthKaab() + xdem.coreg.Deramp(). But with the addition ofRaster.filteryou are working on, maybe we could skip this entirely? (see example below again) - Finally, I think returning statistics and generating plots should be integrated directly as an option in
Coreg.fit_and_apply(), by renovating old subfunctions (like the oldCoreg.residuals()), etc...
So in short, we shouldn't include I/O, masking or filtering steps in DEM.coregister_3d() that can be done separately in a couple lines around!
It would look like this for the user, it's not very long and allows much more flexibility (almost as long as the dem_coregistration call itself would be with all the arguments, except for a couple of dem1 objects re-used):
# I/O left to the user (choose Xarray or DEM object, chunksizes, etc)
dem1 = DEM(fn1)
dem2 = DEM(fn2)
# Masking also (use your own data or public data, and other masks)
geometric_mask = Vector(fn_vect).create_mask(dem1)
slope = dem1.slope()
slope_mask = (slope > 1) & (slope < 40)
# Leave filtering to user as well?
dh = dem1 - dem2.reproject(dem1)
nmad_filter = dh.filter("nmad")
exclude_3nmad_mask = np.abs(dh - np.nanmedian(dh)) > 3*nmad_filter
# Final inlier mask
inlier_mask = slope_mask & geometric_mask & exclude_3nmad_mask
# Then coregister (returns statistics... plots?)
dem2_coreg, coreg_stats = dem1.coregister_3d(dem2, inlier_mask, return_stats=True)
We could maintain some useful defaults, but I don't think we have the require knowledge for it yet (maybe as one of the results of my ongoing study). For masking, I'm not convinced we should have any default, there is nothing that shows that masking low/high slopes is actually beneficial (I'm testing this right now actually, I can share the results once I have them). For filtering, it implies that the two rasters are differentiable into a dh (not the case if the initial misalignment is very large, like for cases covered by ICP/CPD). Also, it only works as intended if there is only a small shift and no rotation. so it is not a very generic filter...
Sorry for having missed this issue and not discussing it earlier.
Overall, I agree that it is not ideal to have both, relatively similar, methods and it makes sense to merge them. The context of the existence of dem_coregistration is explained in the original PR and in general, the idea was to have a one-line method similar to what exists/existed in other repos.
I believe this functionality is critically needed, but could be covered by a CLI instead of a specific function. In particular, taking as argument paths to the data is not what we should encourage with the Python API.
Here are some remarks and things to consider:
- the function was optionally taking a string so that the data would be loaded internally (with
load_multiple_rasters). This was done to ensure that memory requirements were kept to the minimum, by not loading the data when not needed (e.g., when coregistering to a large reference DEM, it made sure the data was loaded only after reprojection). I believe this is less of an issue with the lazy loading now, but we could check this. - one of the aspects was to automatically generate figures. With
dem_coregistrationgone, we are completely missing this functionality... so I think we should focus on this soon. - regarding filtering: most of @rhugonnet's argument make sense, but I believe having a default outlier filter, as was offered by
create_inlier_maskwould still be useful for many users. Yes, it can done only with a few lines of code, but so are many functionalities that we provide and working with objects that can be boolean masks, masked arrays or nan arrays can be confusing for many users. Yes, the filter is not generic and does not work in case of large misalignment, but would probably work for 90% of our users. And btw, the default coregistration is Nuth & Kaab, which will similarly not work in those cases, so I don't see any inconsistency.
In short, yes why not delete these functions, but I think they should be quickly replaced by default workflows that can be called through a CLI with default filtering, coregistration, plots and statistics calculations.