Enhance performance debugging
In addition to profiling, there are some less involved performance metrics we could collect.
For example:
- Simply measure wall time spent in the simulator, cocotb, and user code and print them
  - Lets the user determine (1) whether there is a performance issue when using cocotb at all, and (2) whether that issue lies in their own testbench or in the runtime
- GPI callbacks per simulation time
  - This is usually a good indicator of why performance might be poor; reducing the number of callbacks into Python is ideal
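A minimal sketch of both metrics, assuming nothing about cocotb internals: a small accounting object that is notified at every callback entry/exit, attributes wall time outside callbacks to the simulator, and divides the callback count by elapsed simulated time. The class name and hooks are hypothetical, not part of cocotb's API.

```python
import time


class WallTimeSplit:
    """Splits wall time into 'Python' and 'simulator' buckets.

    Call on_callback_enter() when the simulator hands control to
    Python and on_callback_exit() when Python returns; everything
    spent outside a callback is attributed to the simulator. The
    callback count divided by elapsed simulated time gives the
    'GPI callbacks per simulation time' metric.
    """

    def __init__(self):
        self.python_s = 0.0
        self.sim_s = 0.0
        self.callbacks = 0
        self._mark = time.perf_counter()

    def on_callback_enter(self):
        now = time.perf_counter()
        self.sim_s += now - self._mark  # time since we last left Python
        self._mark = now
        self.callbacks += 1

    def on_callback_exit(self):
        now = time.perf_counter()
        self.python_s += now - self._mark  # time spent in this callback
        self._mark = now

    def report(self, sim_time_ns):
        return {
            "python_s": self.python_s,
            "sim_s": self.sim_s,
            "callbacks_per_ns": self.callbacks / sim_time_ns,
        }
```

In a real integration the enter/exit hooks would be driven from the GPI callback dispatch point, so user code never has to call them directly.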
Is there currently a way to indirectly infer the time spent switching between cocotb and the simulator? What I have tried is to enable profiling for the Python code, enable profiling in Verilator, and analyze the logs, but I still get poor performance. Specifically, if I move a heavily used agent from the cocotb testbench into an SV wrapper that contains the agent plus the RTL, I see real-time speedups of 2-3x. However, neither the Python nor the Verilator profiling results reflect such speedups; this led me to suspect that the profiling in cocotb and Verilator completely ignores the interaction between the two domains (in other words, the VPI calls are not profiled by either tool).
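One crude way to put a number on that suspicion, without any simulator support: compare the total measured wall time against the sum of what the two profilers account for. The remainder is an upper bound on the cost of VPI calls and Python↔simulator switching. A sketch (the figures below are placeholders, not measurements):

```python
def unexplained_overhead(total_wall_s, python_profile_s, verilator_profile_s):
    """Wall time that neither profiler accounts for.

    This remainder bounds the cost of VPI calls and context switches
    between the Python and simulator domains, which (per the
    observation above) both profilers appear to ignore.
    """
    return total_wall_s - (python_profile_s + verilator_profile_s)


# Hypothetical numbers for illustration: a 100 s run where cProfile
# attributes 30 s to Python and Verilator's profiler attributes 40 s
# to the model leaves up to 30 s of boundary-crossing overhead.
gap = unexplained_overhead(100.0, 30.0, 40.0)
```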
How can I profile the VPI calls?
Callgrind probably has features that let you isolate calls within cocotb's C++ libraries to see how much time is spent there, but I've never done it myself.
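For reference, launching the simulation under callgrind is just a matter of prefixing the command; a sketch, assuming a Verilator-built simulation executable (the `./obj_dir/Vtop` path is hypothetical, adjust to your build). Callgrind attributes time to every native function, including the `vpi_*` entry points that neither profiler covers on its own:

```python
import shutil
import subprocess


def callgrind_cmd(sim_binary, out_file="callgrind.out"):
    """Build the command line to profile sim_binary under callgrind."""
    return [
        "valgrind",
        "--tool=callgrind",
        f"--callgrind-out-file={out_file}",
        sim_binary,
    ]


def profile(sim_binary="./obj_dir/Vtop"):
    """Run the simulation under callgrind (requires valgrind installed)."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind is not installed")
    subprocess.run(callgrind_cmd(sim_binary), check=True)
    # Afterwards, inspect the results with e.g.:
    #   callgrind_annotate callgrind.out
    # and look for vpi_*/GPI symbols to see the boundary-crossing cost.
```

Expect the valgrind run to be much slower than a native run; the relative time attributed to the VPI symbols is what matters, not the absolute numbers.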