Add profiling module
Resolves https://github.com/GlacioHack/xdem/issues/754
This PR proposes an integration of a profiling module in order to better monitor functions that are costly in terms of memory and execution time.
To achieve this, we drew inspiration from the profiling tool implemented in Pandora (source here). The choice of this method (with psutil) over several others was discussed in the issue. In brief, it is a good compromise between simplicity and the metrics we mainly need.
Several adaptations and improvements were made to tailor the profiler to the xDEM context:
- Activation via a function, not a configuration file.
Profiler.enable(save_graphs: bool = False, save_raw_data: bool = False) -> None
save_graphs: two kinds of .html graphs will be saved:
an icicle graph, showing the time spent in each step of the pipeline
a plot for each decorated function used, showing the memory consumption of xDEM at regular intervals during the execution
Example: with the my_program function decorated, this gives:
save_raw_data: save the raw data (dataframe) of the calls as a .pickle file
- Adding the end of each profiled function to the graph
- Retrieve the dataframe with the information, metrics and parents:
Profiler.get_profiling_info(function_name: str = None) -> pd.DataFrame
Example: with the my_program function decorated, this gives:
- Reset the dataframe info: reset()
Profiler.reset() -> None
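To show how the parent information can be recorded, the API above can be mimicked with a toy, stdlib-only stand-in: a call stack remembers which profiled function is currently running, so each record knows its parent (what an icicle graph needs). Everything here is an assumption for illustration, not the implementation in this PR: the `profile(name)` decorator factory, the record keys, returning a list of dicts instead of a pandas DataFrame, and ignoring the `save_*` flags.

```python
import functools
import time
from typing import Any, Callable, Optional


class Profiler:
    """Toy stand-in mirroring the API described above (hypothetical, not the xDEM code)."""

    enabled = False
    save_graphs = False
    save_raw_data = False
    _records: list = []  # one dict per finished profiled call
    _stack: list = []  # names of the profiled calls currently running

    @classmethod
    def enable(cls, save_graphs: bool = False, save_raw_data: bool = False) -> None:
        # Flags are only stored here; this sketch writes no graphs or pickles.
        cls.enabled = True
        cls.save_graphs = save_graphs
        cls.save_raw_data = save_raw_data

    @classmethod
    def get_profiling_info(cls, function_name: Optional[str] = None) -> list:
        # The method described above returns a pandas DataFrame; a list keeps this stdlib-only.
        if function_name is None:
            return list(cls._records)
        return [r for r in cls._records if r["name"] == function_name]

    @classmethod
    def reset(cls) -> None:
        cls._records.clear()
        cls._stack.clear()


def profile(name: str) -> Callable:
    """Hypothetical decorator factory: time a call and record its profiled parent."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if not Profiler.enabled:
                return func(*args, **kwargs)
            parent = Profiler._stack[-1] if Profiler._stack else None
            Profiler._stack.append(name)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                Profiler._stack.pop()
                Profiler._records.append(
                    {"name": name, "parent": parent, "time_s": time.perf_counter() - start}
                )

        return wrapper

    return decorator
```

With a decorated `my_program` calling a decorated `load`, `Profiler.get_profiling_info("load")` returns one record whose parent is `"my_program"`, and `Profiler.reset()` empties the records.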
#######################
Tests added: test_profiling_configuration in test_profiling.py, with combinations of configurations (save_graphs/save_raw_data) and a list of decorated functions to run (load, one attribute such as slope, and one coreg with fit_and_apply)
Doc: added in comments
Deps: added psutil and plotly dependencies only in dev mode (in dev-environment.yml)
TODO
Resolve the mypy error raised when # type: ignore is not added at the end of the profile decorator:
"Untyped decorator makes function "xxx" untyped."
Where can we post our results for them to be public?
And I also have a question: How do we expect the profiling to work for parallelized functions (whether with Multiprocessing or Dask)?
Regarding the choice of the Plotly library, I think it should be kept because it offers the possibility of interactive figures, which matplotlib does not. Here is a small example of a profiled program to show why it can be useful :)
time_graph.html memory_my_program.html
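To illustrate how such an interactive icicle graph can be built, here is a sketch that converts call records into the `ids`/`labels`/`parents`/`values` arrays of a `plotly.graph_objects.Icicle` trace and writes it to HTML. The record layout (`id`, `parent`, `name`, `time_s`) and both function names are assumptions; Plotly is imported only inside `save_icicle_html`, so it is needed only when graphs are saved.

```python
from typing import Any, Dict, List


def icicle_inputs(records: List[Dict[str, Any]]) -> Dict[str, list]:
    """Turn call records (hypothetical layout) into the arrays an icicle trace expects."""
    return {
        # Unique ids let the same function appear several times in the tree.
        "ids": [r["id"] for r in records],
        "labels": [r["name"] for r in records],
        # Plotly uses "" as the parent of root nodes.
        "parents": ["" if r["parent"] is None else r["parent"] for r in records],
        "values": [r["time_s"] for r in records],
    }


def save_icicle_html(records: List[Dict[str, Any]], path: str = "time_graph.html") -> None:
    """Write an interactive icicle graph of the elapsed times to an HTML file."""
    import plotly.graph_objects as go  # optional dependency, imported only when needed

    # "total": a parent's value already includes the time spent in its children.
    fig = go.Figure(go.Icicle(branchvalues="total", **icicle_inputs(records)))
    fig.write_html(path)
```

`branchvalues="total"` fits timing data because a parent call's elapsed time already contains its children's, so Plotly can size each block relative to its parent.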
Tell me what you think @rhugonnet
@marinebcht That makes sense, otherwise many steps can't be visualized properly because the labels become too small. :sweat_smile:
Let's add Plotly as an optional dependency! We'll also move Matplotlib to an optional dependency soon following https://github.com/GlacioHack/xdem/issues/755, so we won't be imposing any plotting package on users.
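One way to make Plotly (and later Matplotlib) optional is to import it lazily and raise an explicit error only when a feature that needs it is used. A minimal stdlib sketch; `has_optional` and `import_optional` are hypothetical helper names, not existing xDEM or geoutils functions, and they are meant for top-level module names only.

```python
import importlib
import importlib.util
from types import ModuleType


def has_optional(module: str) -> bool:
    """True if the top-level `module` can be imported, without importing it."""
    return importlib.util.find_spec(module) is not None


def import_optional(module: str) -> ModuleType:
    """Import an optional dependency, or fail with a message telling the user what to install."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"Optional dependency '{module}' is required for this feature. "
            f"Install it with `pip install {module}`."
        ) from exc
```

A plotting function would then call `import_optional("plotly")` at its start, so users who never save graphs never need Plotly installed.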
And for the tests of optional dependencies, we can do what was discussed here (we should do it in a separate PR, but relatively soon to ensure it doesn't break the features): https://github.com/GlacioHack/geoutils/issues/731 So the two points I made above a month and a half ago are now solved. :wink:
Old one, closing.