Implement PEP 799 – A dedicated profiling package for organizing Python profiling tools
Issue to collect all PRs for the implementation of PEP-799
Linked PRs
- gh-138142
- gh-138389
- gh-139216
- gh-140156
- gh-141813
- gh-141897
- gh-141900
- gh-141912
- gh-141934
- gh-142116
- gh-142137
- gh-142157
- gh-142288
- gh-142360
- gh-142382
- gh-142394
- gh-142425
- gh-142561
- gh-142590
- gh-142601
- gh-142609
- gh-142614
- gh-142636
- gh-141533
- gh-142638
- gh-142647
- gh-142592
- gh-142676
- gh-142677
- gh-142730
- gh-142772
- gh-142841
Not sure if this is the best place to discuss the name, but I'm not sure if "tracing" is the best new name for cProfile.
First of all, I think it's a great idea to organize all of our profilers and make it a new place to host other kinds of profilers, but "tracing" actually has its own meaning (as far as I understand).
The more accurate counterpart to "sampling" is probably "deterministic" (and I know it's long and difficult to spell), which means we record every function entry and exit.
"tracing", on the other hand, often implies that we have not only the accumulated data for function calls, but also the information for each individual call, so that we can visualize it on a timeline. If you google "tracing profiler", Perfetto might be one of the tools that catches your eye - that's a real tracing tool (as is Chrome tracing, its predecessor). VizTracer is a tracing profiler for Python. It generates a tracing graph, which looks like a flamegraph but isn't one.
cProfile does not have that capability. It can only accumulate data and summarize it. Calling it a tracing profiler might confuse people (and might take the place of an actual tracing profiler we want to add in the future).
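To make the "accumulate and summarize" point concrete, here is a minimal sketch using only the documented `cProfile` and `pstats` APIs. The `work` and `run_profiled` helpers are made up for illustration. Each function ends up with a single aggregated record (exact call count, total and cumulative time), and there are no per-call timestamps to place on a timeline.

```python
import cProfile
import pstats


def work(n):
    # Hypothetical workload, only here to have something to profile.
    return sum(i * i for i in range(n))


def run_profiled(calls=5):
    """Profile `calls` invocations of work() and return the pstats summary."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(calls):
        work(1000)
    profiler.disable()
    return pstats.Stats(profiler)


stats = run_profiled()
# get_stats_profile() (Python 3.9+) exposes one aggregated record per function.
summary = stats.get_stats_profile().func_profiles["work"]
print(summary.ncalls, summary.tottime, summary.cumtime)
```

The call count is exact, which a sampling profiler cannot promise, but the five calls are indistinguishable from one another in the result.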
From another perspective, cProfile does not use sys.settrace, which is another "trace" concept that already exists in CPython.
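For readers less familiar with the existing "trace" concept in CPython, a small sketch of how the `sys.settrace` hook differs from the `sys.setprofile` hook. The `sample` and `events_seen` helpers are hypothetical, and this says nothing about how cProfile is wired up internally. It only shows that a trace hook sees line-level events while a profile hook sees call/return events.

```python
import sys


def sample():
    x = 1
    return x + 1


def events_seen(install, func):
    """Install a hook via sys.settrace or sys.setprofile and list the event names."""
    seen = []

    def hook(frame, event, arg):
        seen.append(event)
        # settrace needs a local trace function returned to report 'line' events;
        # setprofile ignores the return value.
        return hook

    install(hook)
    try:
        func()
    finally:
        # Simplification: this clears any hook that was installed before.
        install(None)
    return seen


trace_events = events_seen(sys.settrace, sample)
profile_events = events_seen(sys.setprofile, sample)
print(sorted(set(trace_events)), sorted(set(profile_events)))
```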
Do you think we should come up with another name for cProfile in the new package?
I completely understand your concerns about the naming, and thank you for taking the time to provide this detailed technical feedback. You make good points about the distinction between “tracing” and what cProfile actually does.
Although I kind of disagree, I think you are also right that “tracing” can be ambiguous here, and indeed it may imply that we are referring to timeline-based profiling that captures individual call information. The potential confusion with sys.settrace is also a valid concern, although I think it doesn't apply much here as it is an implementation detail in any case. But it is an interesting point.
My personal opinion here is that, although what you say is true, the distinction is too nuanced to matter. "Deterministic profiler" is also not a great name IMHO, as a real, pure tracing profiler would also be deterministic, which would be confusing too.
I’m definitely open to considering this feedback and think it merits some discussion. The technical accuracy of our naming is important, both for clarity and to avoid blocking future actual tracing profiler implementations.
However, as you noted, the timing is challenging since PEP 799 has already been accepted, which would mean we’d need to go through the process of modifying the PEP if we decide to change the name. That said, getting the naming right is important enough that it may be worth that effort.
I think this needs input from the docs working group at least. We can keep the discussion here or maybe open another one in discuss. Do you have any preference?
CC @AA-Turner @hugovk @StanFromIreland
Thanks for the ping @pablogsal.
From what I can see from a quick search, tracing, instrumen{ted,ation}, deterministic, and call graph seem to be used as near-synonyms. This contrasts with sampling profiling, which seems to use the same term. There doesn't seem to be an obvious and consistent bright-line distinction I can find in third-party documents for what distinguishes a 'proper' tracing profiler.
If a future Python version implemented support for tracing profiling in the VizTracer/Perfetto form, would we still keep the current (cProfile) implementation of profiling.tracing? A possible/likely scenario would be that we expose the higher fidelity information under the existing profiling.tracing name, possibly with a compat layer. If so, I don't think there is a risk to using the tracing name now, as it doesn't prevent future improvements/features.
A
I propose the name profiling.boring for cProfile.
I think it's possible that people are not that familiar with tracers so they think tracers are very similar to profilers. From my point of view, a tracer is distinctly different than a profiler. A tracer can be used for profiling (or, as a profiler), but it normally provides much more detailed information.
OpenTelemetry is probably the most popular cross-language tracing framework. One of the core concepts is a "span", meaning an event with a start/end pair timestamp - which a profiler normally does not provide.
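To illustrate what a "span" means here, a toy sketch that records one (name, start, end) entry per Python function call using `sys.setprofile`. This is not how OpenTelemetry or VizTracer are implemented. `record_spans`, `leaf` and `parent` are hypothetical names, and the sketch only shows the shape of per-call data that an aggregating profiler discards.

```python
import sys
import time


def record_spans(func, *args, **kwargs):
    """Run func and return a list of (name, start_ns, end_ns) spans, one per Python call."""
    spans = []
    stack = []

    def hook(frame, event, arg):
        if event == "call":
            stack.append((frame.f_code.co_name, time.perf_counter_ns()))
        elif event == "return" and stack:
            name, start = stack.pop()
            spans.append((name, start, time.perf_counter_ns()))
        # c_call / c_return events for builtins are ignored in this sketch.

    sys.setprofile(hook)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return spans


def leaf():
    return 1


def parent():
    leaf()
    leaf()


spans = record_spans(parent)
for name, start, end in spans:
    print(name, end - start)
```

Each call to `leaf` gets its own entry, so the two calls can be drawn separately on a timeline. The list also grows with every call made by the program.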
All the results from my Google search for "tracing profiler" are profilers (tools) that provide a real tracing capability (Android, Google Cloud, Perfetto, dotNet, torch, etc.). I did not find any counter-example with my limited efforts.
A tracer (tracing profiler) has its own limitations - because it records so much information, it has to compromise on either the functions it records or the length of the program it can cover. So we can't replace cProfile with a tracing profiler like VizTracer.
In my opinion, profiling.boring is a slightly better name than profiling.tracing :).
What about a similar but more official name, profiling.basic? cProfile is not only instrumented, but also provides less information on the call frame compared to our new sampling profiler. It only has information about its caller (not the full frame). basic might be a good name to steer people away from using it when they need more advanced features.
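A quick sketch of the "only the caller" limitation, using the documented `Profile.runcall` and `Stats.print_callers` APIs. The three-function chain and the `callers_report` helper are made up for the example. For `inner_fn`, the report lists its direct caller but not the rest of the stack.

```python
import cProfile
import io
import pstats


def inner_fn():
    return 42


def middle_fn():
    return inner_fn()


def outer_fn():
    return middle_fn()


def callers_report(func, restriction):
    """Profile func and return the print_callers() text for matching functions."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).print_callers(restriction)
    return buffer.getvalue()


report = callers_report(outer_fn, "inner_fn")
print(report)
```

The entry for `inner_fn` mentions `middle_fn`, while `outer_fn` only shows up if you ask for the callers of `middle_fn` separately.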
In this case, would a (hypothetical) future stdlib tracing implementation be considered for the profiling package/namespace?
I would be -1 on a name like profiling.basic, because it doesn't describe what the profiler does, only what we have decided its capabilities are. As Pablo/PEP 799 said, there remain valid use cases for the cProfile-based implementation. I would also hope that in the future, we will have improved profiling.tracing to have more information such that it is no longer 'basic'.
A
Yes, I don't think that is unreasonable.
Yeah, I agree profiling.basic is probably a suboptimal name, because it looks like a "worse" version by nature and doesn't convey the tradeoffs.
I would be more than happy if the stdlib provided a real tracing profiler. I don't see a reason why that can't be achieved.
Do we plan to improve cProfile? That might introduce some backwards compatibility issues. We can probably add some extra fields to pstats so it stays compatible. But if the option is to keep cProfile and build a new one, we could use a name like profiling.legacy or even profiling.cProfile.
Maybe something closer to the fact that it records function calls? profiling.call or profiling.fee or profiling.function? A sampling profiler does not give exact call counts for the program, which is what cProfile can do. If we decided to use profiling.tracing, I wouldn't be mad. I just think we should at least discuss the possible names before we land on this potentially ambiguous name.
I am happy to keep discussing this a bit more, but just so we are all aligned: the SC already approved this name and the discussion on the PEP has already taken place, so technically that discussion already happened. This point was indeed not raised there, so I am happy to discuss it here a bit more, but as you can see there is not even consensus on the alternative name, so it is unclear to me whether we will arrive at a satisfactory conclusion. I am happy to try, of course.
In any case, I am not going to hold the PR on this, as we can always easily rename profiling.tracing if we want to.
Just chiming in here, but we need to update the docs, as we document this as being under profile.XXX on the What's New page. As the docs are continuously built, it's a bit annoying if this typo remains.
As a stop-gap I've created #138389.
A
I know docs are still pending, but a brief What's New would be useful at least for the alpha releases.
There is already a preliminary one here:
https://docs.python.org/3.15/whatsnew/3.15.html#high-frequency-statistical-sampling-profiler