xdem Add profiling module

Resolves https://github.com/GlacioHack/xdem/issues/754

This PR integrates a profiling module to better monitor functions that are costly in memory and execution time.

To achieve this, we drew inspiration from the profiling tool implemented in Pandora (source here). The choice of this method (with psutil) over several others was discussed in the issue. In brief, it is a good compromise between simplicity and coverage of the metrics we mainly need.

Several adaptations and improvements were made to tailor the profiler to the xDEM context:

Activation via a function rather than a configuration file: Profiler.enable(save_graphs: bool = False, save_raw_data: bool = False) -> None

save_graphs: two kinds of .html graphs will be saved:

an icicle graph, showing the time spent in each step of the pipeline

a plot for each decorated function used, showing the memory consumption of xDEM at regular intervals during the execution

ex: my_program function decorated gives :

save_raw_data: save the raw data (dataframe) on calls as a .pickle file

Adding the end of each profiled function to the graph
Retrieve the dataframe with the information + metrics + parents: Profiler.get_profiling_info(function_name: str = None) -> pd.DataFrame ex: my_program function decorated gives :

Reset the dataframe info: Profiler.reset() -> None
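
To make the workflow above concrete, here is a minimal, self-contained sketch of how enabling, decorating, querying and resetting could fit together. It is not the xDEM implementation: the `Profiler` class below is a hypothetical stand-in written with the standard library only, the `profile(name)` decorator signature is an assumption, records are returned as a list of dicts instead of a pd.DataFrame, and the psutil memory sampling and graph export are left out.

```python
import functools
import time
from typing import Any, Callable, Optional


class Profiler:
    """Hypothetical stand-in for illustration, not the xDEM profiler."""

    enabled: bool = False
    _records: list[dict[str, Any]] = []
    _stack: list[str] = []  # names of profiled functions currently running

    @classmethod
    def enable(cls, save_graphs: bool = False, save_raw_data: bool = False) -> None:
        # The two output flags are accepted but ignored in this sketch.
        cls.enabled = True

    @classmethod
    def reset(cls) -> None:
        # Drop everything recorded so far.
        cls._records.clear()
        cls._stack.clear()

    @classmethod
    def get_profiling_info(cls, function_name: Optional[str] = None) -> list[dict[str, Any]]:
        # Return all records, or only those of one profiled function.
        return [r for r in cls._records if function_name is None or r["name"] == function_name]


def profile(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Assumed decorator: records elapsed time and the calling profiled function."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if not Profiler.enabled:
                return func(*args, **kwargs)
            parent = Profiler._stack[-1] if Profiler._stack else None
            Profiler._stack.append(name)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                Profiler._stack.pop()
                Profiler._records.append({"name": name, "parent": parent, "time_s": elapsed})

        return wrapper

    return decorator


@profile("load")
def load() -> list[int]:
    return list(range(1000))


@profile("my_program")
def my_program() -> int:
    return sum(load())


Profiler.enable()
result = my_program()
info = Profiler.get_profiling_info()
```

In this sketch a record is appended when a function returns, so inner calls appear before their parents; the parent name is what would let an icicle graph be rebuilt from the records.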

#######################

Tests added: test_profiling_configuration in test_profiling.py, with combinations of configurations (save_graphs/save_raw_data) and a list of decorated functions to run (load, one attribute such as slope, and one coreg with fit_and_apply)

Doc: added as comments

Deps: added psutil and plotly dependencies only in dev mode (in dev-environment.yml)

Aug 21 '25 09:08 marinebcht

TODO: Resolve the mypy problem raised when # type: ignore is not added at the end of the profile decorator: "Untyped decorator makes function "xxx" untyped."

Aug 21 '25 11:08 marinebcht

Where can we post our results for them to be public?

Aug 22 '25 07:08 adebardo

And I also have a question: How do we expect the profiling to work for parallelized functions (whether with Multiprocessing or Dask)?

Aug 25 '25 21:08 rhugonnet

Regarding the choice of the Plotly library, I think it should be kept because it offers interactive figures, which matplotlib does not. Here is a small example of a profiled program to show why that can be useful :)

time_graph.html memory_my_program.html

Tell me what you think @rhugonnet

Oct 09 '25 12:10 marinebcht

@marinebcht That makes sense, otherwise many steps can't be visualized properly because the labels become too small. :sweat_smile:

Let's add Plotly as an optional dependency! We'll also move Matplotlib to an optional dependency soon following https://github.com/GlacioHack/xdem/issues/755, so we won't be imposing any plotting package on users.
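
A rough sketch of what an optional-dependency guard could look like, so Plotly is only required when graphs are actually saved. The helper name `import_optional` and the error message are hypothetical and not part of xDEM or GeoUtils.

```python
import importlib
import importlib.util
from types import ModuleType


def import_optional(module_name: str, feature: str) -> ModuleType:
    """Import a top-level module, or fail with a message naming the feature that needs it."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Optional dependency '{module_name}' is required for {feature}. "
            "Install it to use this feature."
        )
    return importlib.import_module(module_name)
```

The function that writes the .html graphs could then call `import_optional("plotly", "saving profiling graphs")` at the point of use, leaving the rest of the package importable without Plotly.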

Oct 09 '25 16:10 rhugonnet

And for the tests of optional dependencies, we can do what was discussed here (we should do it in a separate PR, but relatively soon to ensure it doesn't break the features): https://github.com/GlacioHack/geoutils/issues/731 So the two points I made above a month and a half ago are now solved. :wink:

Oct 09 '25 16:10 rhugonnet

old one, we close

Dec 18 '25 15:12 adebardo