Context

The goal of this ticket is to integrate a profiling module into xDEM in order to better monitor functions that are costly in terms of memory and execution time.

To achieve this, we will draw inspiration from the profiling tool implemented in Pandora (source here). However, several adaptations are needed to tailor the profiler to the xDEM context.

Tasks

[ ] Replace enable_from_config with a simplified method that does not require json_checker. The new API should look like: enable(save_graphs=True, save_raw_data=False)
[ ] Add unit tests to ensure the profiler behaves as expected
[ ] Add documentation for the new profiler module and usage guidelines
[ ] Add profiling to a first set of functions, such as reprojection, subsampling, and interpolation
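A minimal sketch of what the simplified API in the first task could look like. Only the `enable(save_graphs=True, save_raw_data=False)` signature comes from this ticket; `ProfilerState`, `STATE`, `profile`, and the record layout are hypothetical names chosen for illustration, not Pandora's or xDEM's actual code. psutil is treated as optional here, so the sketch still runs without it (memory is then reported as `None`).

```python
import functools
import time
from dataclasses import dataclass, field

try:
    import psutil  # third-party; treated as optional in this sketch
except ImportError:
    psutil = None


@dataclass
class ProfilerState:
    """Hypothetical module-level state (assumption, not the real layout)."""
    enabled: bool = False
    save_graphs: bool = False
    save_raw_data: bool = False
    records: list = field(default_factory=list)


STATE = ProfilerState()


def enable(save_graphs: bool = True, save_raw_data: bool = False) -> None:
    """Turn profiling on with plain keyword arguments (no config validation)."""
    STATE.enabled = True
    STATE.save_graphs = save_graphs
    STATE.save_raw_data = save_raw_data


def _rss_bytes():
    """Resident set size of the current process, or None without psutil."""
    if psutil is None:
        return None
    return psutil.Process().memory_info().rss


def profile(name):
    """Hypothetical decorator: record wall time and RSS change of each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not STATE.enabled:
                return func(*args, **kwargs)  # profiling disabled: plain call
            rss_before = _rss_bytes()
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                rss_after = _rss_bytes()
                STATE.records.append({
                    "name": name,
                    "elapsed_s": time.perf_counter() - start,
                    "rss_delta_bytes": (
                        None if rss_before is None else rss_after - rss_before
                    ),
                })
        return wrapper
    return decorator


@profile("demo.subsample")
def subsample(values, step):
    """Stand-in for a real function to profile."""
    return values[::step]


enable(save_graphs=False, save_raw_data=True)
subsample(list(range(10)), 3)
```

Keeping the enabled check inside the wrapper means decorated functions can stay decorated permanently, and profiling only costs something once `enable()` has been called.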

/estimate 5d

Jun 18 '25 07:06 adebardo

Thanks @adebardo, this would be an essential addition indeed! 🙂

My main feedback at this stage:

If I understand correctly after looking at Pandora's code, we would rely primarily on psutil? It would be nice to explain this directly in the issue description: which new tools would be used, how (what behaviour/output we should expect in GeoUtils/xDEM), and why this solution was picked.

I'm not an expert in profiling, but I have used or heard of several other widely used tools (each with 10,000+ stars on GitHub and a massive user base), such as Python's built-in cProfile, memray for memory profiling, scalene for memory+CPU profiling, py-spy for low-overhead CPU profiling, etc.

Given this landscape of tools, this raises the questions: Is there a reason to use psutil specifically? What are the advantages/drawbacks? I think we need much more detail to fully understand the change ahead and learn from your experience on Pandora here :wink:

Jun 18 '25 18:06 rhugonnet

Our idea here would be to implement profiling in production, with a global and easy-to-use system in case we need more detailed insights into functions. The tools you mentioned are more external and complex (we've already tried Memray, and it wasn't very conclusive for our needs). The metrics returned by psutil are more than sufficient for our requirements.

Jun 19 '25 08:06 adebardo

We tried using Scalene to profile Blockwise, but the results weren't relevant, and the process took longer (possibly due to a lack of knowledge on my part).

Jun 19 '25 08:06 adebardo

I'll try to put together a comparison table sometime next week :)

Jun 19 '25 15:06 adebardo

We want to implement a performance monitoring system for time and memory in production.

Based on our experience, we recommend using the psutil library, which is useful for tracking performance. It is relatively lightweight and easy to use.

For visualization, we suggest combining it with Plotly. These dependencies could be enabled in a full xDEM mode and not loaded in a light mode, for example.

With a few modifications by the user or developer, this method can also be extended to line-by-line profiling.
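As a rough illustration of psutil-based monitoring, the sketch below samples the process memory from a background thread while the main code runs. `MemorySampler` and its attributes are hypothetical names, and the sampling interval is an arbitrary choice; the only psutil call used is `psutil.Process().memory_info().rss`. The guarded import mimics a "light mode" where psutil is not installed, in which case no samples are collected.

```python
import threading
import time

try:
    import psutil  # optional dependency, absent in a "light" install
except ImportError:
    psutil = None


class MemorySampler(threading.Thread):
    """Hypothetical helper: sample the process RSS at a fixed interval."""

    def __init__(self, interval_s=0.05):
        super().__init__(daemon=True)
        self.interval_s = interval_s
        self.samples = []  # (seconds since start, RSS in bytes)
        self._stop_event = threading.Event()
        self._process = psutil.Process() if psutil is not None else None

    def run(self):
        start = time.perf_counter()
        while True:
            if self._process is not None:
                rss = self._process.memory_info().rss
                self.samples.append((time.perf_counter() - start, rss))
            # wait() returns True as soon as stop() has been called
            if self._stop_event.wait(self.interval_s):
                break

    def stop(self):
        """Ask the thread to finish and wait for it."""
        self._stop_event.set()
        self.join()


sampler = MemorySampler(interval_s=0.01)
sampler.start()
data = [bytearray(1024) for _ in range(10_000)]  # workload to observe
sampler.stop()

# Peak RSS seen during the run, or None when psutil is unavailable
peak_rss = max((rss for _, rss in sampler.samples), default=None)
```

The `(time, rss)` pairs are the kind of raw data that could then be handed to a plotting library to draw a memory-over-time curve.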

Tool	Type	CPU Tracking	Memory Tracking	Detail Level (Function/Line)	Built-in Visualization	Recommended Use Case
psutil	System monitoring	Yes	Yes	No	No (combine with Plotly/Dash)	Continuous monitoring in production
Scalene	CPU + memory profiler	Yes	Yes (line by line)	Yes (line by line)	Yes (HTML report with charts)	In-depth CPU/memory diagnostics during code optimization
Memray	Deep memory profiler	No	Yes (native + Python allocations)	Yes (full stack trace)	Yes (HTML flamegraph)	Leak detection and memory spikes, detailed native allocation analysis
Py-spy	External sampling profiler	Yes (sampling)	No	Yes (flamegraph)	Yes (Speedscope, SVG)	Lightweight profiling without modifying code, suitable for production
cProfile	Standard Python profiler	Yes	No	Yes (per function)	No (use with SnakeViz, gprof2dot)	Integrated baseline profiler, good for identifying bottlenecks early
line_profiler	Line-by-line time profiler	Yes	No	Yes (line by line)	No	Precise timing for critical function sections
memory_profiler	Line-by-line memory profiler	No	Yes (line by line)	Yes (line by line)	No	Detailed memory tracking to pinpoint memory-heavy lines
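For comparison with the cProfile row of the table, here is a small example of the per-function detail the standard-library profiler gives. It uses only `cProfile`, `pstats`, and `io`; the `busy` function is a made-up workload.

```python
import cProfile
import io
import pstats


def busy(n):
    """Made-up workload: sum of squares below n."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = busy(100_000)
profiler.disable()

# Render the five entries with the highest cumulative time into a string
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and cumulative time per function, but no memory figures, which matches the "No" in the Memory Tracking column.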

Jun 30 '25 09:06 adebardo

Type of graph we can produce:

Jul 01 '25 07:07 adebardo

In the second graph, is it possible to show when the application starts and ends? If I understand correctly, we only have information about when it starts, not when it finishes.

Jul 01 '25 07:07 belletva

Yes, it's possible to add this functionality by measuring the return time, thanks to "return_time" from psutil.
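One way to capture both the start and the end of each step is sketched below. It uses `time.perf_counter` from the standard library as a stand-in, which is an assumption: it is not the "return_time" mechanism mentioned above. `timed_section` and `timeline` are hypothetical names.

```python
import time
from contextlib import contextmanager

timeline = []  # hypothetical store of (name, start_s, end_s) tuples


@contextmanager
def timed_section(name):
    """Record when a block starts and when it returns, even on error."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timeline.append((name, start, time.perf_counter()))


with timed_section("load"):
    time.sleep(0.01)
with timed_section("process"):
    time.sleep(0.01)
```

Each tuple gives an interval rather than a single point, so a graph can draw a bar from start to end for every step.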

Jul 01 '25 08:07 adebardo

I agree this would be a nice tool to have! I don't have much experience with profiling, so I trust your experience on this. Of course, it would be useful to know specifically which lines have long running times or high memory usage, but if other Python packages are not well suited, let's go with psutil.

Jul 30 '25 08:07 adehecq

Add profiling module
