geoutils icon indicating copy to clipboard operation
geoutils copied to clipboard

Move `filter.py` from xDEM to GeoUtils and integrate into `raster` class

Open vschaffn opened this issue 6 months ago • 1 comments

Resolves #690.

Context

The current filter.py module in xDEM provides a collection of filters that are highly useful for digital elevation models (DEMs). However, these filtering utilities are not intrinsically tied to elevation data—they are generally applicable to any geospatial raster data. Given this, it makes more architectural sense for them to live in GeoUtils.

Changes

  • Moved the filter.py module from xdem/ to geoutils/filters.py.
  • Removed gaussian_filter_cv to avoid dependence on opencv.
  • Moved tests/test_filter.py in xDEM to tests/test_filters.py in GeoUtils.
  • Added a new function _nan_safe_filter common for gaussian, mean and median filter, to deal with arrays containing nan values.
  • Added a generic _filter function mapping the available filters.
  • Added the Raster.filter method that extracts the array with Nans, uses the _filter function, and rebuild the Raster data.

Tests

  • Added new tests for the Raster.filter method.

Documentation

  • Added a filter section in raster_class.md.

Note

When this ticket is approved, a linked PR to remove filters in xDEM will be opened.

vschaffn avatar Apr 28 '25 16:04 vschaffn

I've been working on a very similar problems during the past week in https://github.com/GlacioHack/xdem/pull/486, so while I still know the intricacies of the topic, here's how we could add Numba filters in GeoUtils to dramatically improve the speed for median (and we could re-use the same code structure for nmad, nanmax, and any other nanfunc):

from typing import Literal
import numpy as np
from numba import jit, prange
from scipy.ndimage import generic_filter

@jit(parallel=True)
def median_filter_numba(dem: np.ndarray, window_size: int):

    # Get input shapes
    N1, N2 = dem.shape

    # Define ranges to loop through given padding
    row_range = N1 - window_size + 1
    col_range = N2 - window_size + 1

    # Allocate output array
    outputs = np.full((row_range, col_range), fill_value=np.nan, dtype=np.float32)

    # Loop over every pixel concurrently by using prange
    for row in prange(row_range):
        for col in prange(col_range):

            outputs[row, col] = np.nanmedian(dem[row: row+window_size, col:col+window_size])

    return outputs


def median_filter(dem: np.ndarray, window_size: int, engine: Literal["scipy", "numba"]):

    # Using SciPy, options already include edge behaviour
    if engine == "scipy":
        return generic_filter(dem, np.nanmedian, size=window_size, mode="constant", cval=np.nan)
    # Using Numba, need to define edge behaviour ourselves
    elif engine == "numba":
        # We pad the input array with NaNs for half the window size
        hw = int((window_size - 1) / 2)
        dem = np.pad(dem, pad_width=((hw, hw), (hw, hw)), constant_values=np.nan)
        return median_filter_numba(dem, window_size)


############
# TEST SPEED
############

from time import time
rnd = np.random.default_rng(42)
nb_it = 10  # Several iterations to be more precise measuring CPU time
window_size = 11
sizes = [50, 100, 500, 1000]  # Array sizes

# Call Numba once in the void to trigger JIT compile
_ = median_filter(np.ones((10, 10)), window_size=3, engine="numba")

list_df = []
for s in sizes:
    dem = rnd.normal(size=(s, s))

    # Get computation times
    t1 = time()
    for i in range(nb_it):
        filt1 = median_filter(dem, window_size=window_size, engine="scipy")
    t2 = time()
    for i in range(nb_it):
        filt2 = median_filter(dem, window_size=window_size, engine="numba")
    t3 = time()

    # Check output is equal
    assert np.allclose(filt1, filt2)

    # Print elapsed times
    print(f"Size: {s}")
    print(f"Elapsed SciPy: {t2 - t1}")
    print(f"Elapsed Numba: {t3 - t2}")
Size: 50
Elapsed SciPy: 0.24158883094787598
Elapsed Numba: 0.005011320114135742
Size: 100
Elapsed SciPy: 0.9490830898284912
Elapsed Numba: 0.012121915817260742
Size: 500
Elapsed SciPy: 23.553598880767822
Elapsed Numba: 0.3076012134552002
Size: 1000
Elapsed SciPy: 94.9611337184906
Elapsed Numba: 1.2290620803833008

Quite a striking improvement, factor of 100! :wink:

rhugonnet avatar May 07 '25 10:05 rhugonnet

Moved to #699

rhugonnet avatar Aug 27 '25 23:08 rhugonnet