
Implement Method Table view

Open · parttimenerd opened this pull request 3 years ago • 31 comments

This PR implements a CallTree based view that presents all methods with their total and self times:

(Screenshot of the new Method Table view, listing all methods with their total and self times.)

It implements a simplified version of the view first suggested in #15 (Add a "top functions" call tree view) and reiterated in #4205.

It supports all the amenities of the CallTree view like the side bar or highlighting the samples in the timeline.

Sample profile

Deployment preview

As in other PRs: this PR does not have any tests yet, but it has been tested manually, and I'll add tests once I'm sure what the final implementation is. The tests should be very similar to the tests for the CallTree view itself, since it is essentially the same view, just with a different source of data.


parttimenerd avatar Sep 09 '22 08:09 parttimenerd

Codecov Report

Base: 88.56% // Head: 87.91% // Decreases project coverage by 0.64% :warning:

Coverage data is based on head (c098615) compared to base (64cdd7e). Patch coverage: 34.22% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4227      +/-   ##
==========================================
- Coverage   88.56%   87.91%   -0.65%     
==========================================
  Files         282      283       +1     
  Lines       25333    25619     +286     
  Branches     6817     6886      +69     
==========================================
+ Hits        22435    22523      +88     
- Misses       2696     2865     +169     
- Partials      202      231      +29     
| Impacted Files | Coverage Δ |
| --- | --- |
| src/actions/profile-view.js | 89.48% <0.00%> (-0.29%) :arrow_down: |
| src/app-logic/tabs-handling.js | 100.00% <ø> (ø) |
| src/components/app/Details.js | 85.00% <ø> (ø) |
| src/components/sidebar/index.js | 100.00% <ø> (ø) |
| src/profile-logic/call-tree.js | 62.40% <3.84%> (-31.51%) :arrow_down: |
| src/reducers/profile-view.js | 94.24% <12.50%> (-2.04%) :arrow_down: |
| ...rc/components/calltree/ProfileFunctionTableView.js | 40.00% <40.00%> (ø) |
| src/components/app/BottomBox.js | 42.64% <50.00%> (+0.22%) :arrow_up: |
| src/selectors/per-thread/index.js | 83.96% <50.00%> (-12.20%) :arrow_down: |
| src/profile-logic/profile-data.js | 91.17% <56.57%> (-2.25%) :arrow_down: |

... and 8 more


codecov[bot] avatar Sep 09 '22 08:09 codecov[bot]

Very nice! It seems to work correctly.

I have a few comments, about the naming and about the implementation.

First off, using the word "method" in "Method Table" is a bit Java-centric. In other languages, not every piece of executable code is a method. So I would propose a different name instead: "Function List".

Furthermore, I think the implementation should not make use of the CallNodeInfo type. Because we're dealing with a flat list here, I think it's better to avoid all use of "tree" and "node" terminology. Instead, you could put your information into a new FuncListInfo struct.

Also, the current implementation does a lot of work when the preview selection changes, because it iterates over all samples and walks the entire stack for each sample (doing repeated work if the same stack is present in multiple samples), and it allocates a fresh Set object for every sample. See my feedback in this comment: https://github.com/firefox-devtools/profiler/pull/2388#pullrequestreview-356178651

Independent of the samples, compute the "does stack N contain func M" cache so that it can be memoized and reused for different preview selections. And then have a selector that combines the preview selection filtered samples with the cache and returns the times per function for the current selection.

I'm not sure what's the best way to represent the "does stack N contain func M" cache. It feels like a sparse boolean matrix. I guess you could use an array of sets, i.e. one set for each stack, where the set contains the funcs in that stack. But maybe there's a more efficient data structure for this.
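For illustration, a minimal sketch of that array-of-sets idea, assuming the profiler's usual stackTable/frameTable column layout (the field names are assumptions, not code from this PR):

    // Sketch: memoizable "which funcs does stack N contain" cache, one Set per
    // stack. Each stack's set is a copy of its prefix's set plus its own func.
    // Assumes prefixes come before their descendants and are null for roots.
    function computeFuncsPerStack(stackTable, frameTable) {
      const funcsPerStack = new Array(stackTable.length);
      for (let s = 0; s < stackTable.length; s++) {
        const prefix = stackTable.prefix[s];
        const funcs = prefix === null ? new Set() : new Set(funcsPerStack[prefix]);
        funcs.add(frameTable.func[stackTable.frame[s]]);
        funcsPerStack[s] = funcs;
      }
      return funcsPerStack;
    }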

mstange avatar Sep 13 '22 15:09 mstange

I have thought about this more and have come to the conclusion that my advice about the cache was probably misguided. Computing a set per stack upfront is not necessarily going to be better, in fact it could even lead to wasted work because there are many stacks which aren't directly used by samples and only exist as ancestor nodes. And computing a set per stack would mean that all those sets need to be retained, whereas your solution only needs one set at a given time. So, in conclusion, your _getSummaryPerFunction approach seems fine. We can revisit this if it ever shows up as a bottleneck.
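To make the shape of that computation concrete, here is a minimal sketch of a per-sample summary that allocates one short-lived Set per sample (the table and column names are assumptions about the profiler's data layout, not the PR's actual _getSummaryPerFunction):

    // Sketch: self and total time per function over a slice of samples.
    // The Set deduplicates functions that occur more than once in one stack
    // (recursion), so total time is counted at most once per sample.
    function summarizePerFunction(samples, stackTable, frameTable, funcCount) {
      const selfTime = new Array(funcCount).fill(0);
      const totalTime = new Array(funcCount).fill(0);
      for (let i = 0; i < samples.length; i++) {
        const stackIndex = samples.stack[i];
        if (stackIndex === null) {
          continue;
        }
        const weight = samples.weight ? samples.weight[i] : 1;
        // Self time is attributed to the leaf frame's function only.
        selfTime[frameTable.func[stackTable.frame[stackIndex]]] += weight;
        // Total time is attributed to every distinct function in the stack.
        const seen = new Set();
        for (let s = stackIndex; s !== null; s = stackTable.prefix[s]) {
          const func = frameTable.func[stackTable.frame[s]];
          if (!seen.has(func)) {
            seen.add(func);
            totalTime[func] += weight;
          }
        }
      }
      return { selfTime, totalTime };
    }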

mstange avatar Sep 13 '22 16:09 mstange

First off, using the word "method" in "Method Table" is a bit Java-centric. In other languages, not every piece of executable code is a method. So I would propose a different name instead: "Function List".

I totally forgot that, thanks for catching it.

Furthermore, I think the implementation should not make use of the CallNodeInfo type. Because we're dealing with a flat list here, I think it's better to avoid all use of "tree" and "node" terminology. Instead, you could put your information into a new FuncListInfo struct.

That would really complicate the implementation. My implementation uses the CallTree and Sidebar components (and therefore has to use the CallNodeInfo type) to reuse most of the existing functionality.

So replacing the type is only really feasible by using an alias for CallNodeInfo in all FunctionList-specific places. That should clear up most of the confusion.
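To illustrate the alias idea (FuncListInfo and the import path are hypothetical, not code from this PR):

    // Sketch: give the reused type a list-specific alias so the FunctionList
    // code doesn't read as if it were dealing with a tree.
    import type { CallNodeInfo } from 'firefox-profiler/types';

    export type FuncListInfo = CallNodeInfo;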

parttimenerd avatar Sep 13 '22 16:09 parttimenerd

It supports all the amenities of the CallTree view like the side bar or highlighting the samples in the timeline.

Oh, I forgot to comment on this.

One thing I noticed that isn't supported yet is the context menu.

As for the timeline highlighting, it doesn't seem to be highlighting based on the selection in the function list. The highlight seems to be based on the selected node in the call tree tab, i.e. based on a selection that's invisible when the function list is shown. And clicking the graph in the timeline does not change which row in the function list is selected. However, that's fine for now, because it's unclear to me how it should work in the ideal case. So this can be worked on as a follow-up. I just wouldn't call it "supported" in the current state.

  • Which function should be highlighted when the graph is clicked?
    • Each pixel in the graph belongs to a sample, which has a stack, which consists of many functions. Which of the functions in the stack should we select? Maybe the leaf function?
  • Which parts of the graph should be highlighted when a function is selected?
    • We should highlight the pixels in the graph which belong to samples whose stack contains the selected function. However, this is tricky to achieve with the current graph drawing code: For each category fill, the graph currently only supports one "selected" area. That's because, in the call tree, the selected subtree of the call tree always corresponds to a "connected area" in the graph, due to how stacks in the graph are sorted - they're sorted in call tree DFS traversal order. But in the function list, when a function is selected, this function can be present in many stacks which are in disjoint subtrees of the call tree. So the "selected samples" will not form one connected area in the graph. So the graph drawing needs to support multiple disjoint selected areas within a single category fill.

mstange avatar Sep 14 '22 17:09 mstange

I agree with you that all of this is too tricky for this small PR, and I fear that these discussions will prevent this PR from ever getting merged.

The side bar works nonetheless, which is really important.

parttimenerd avatar Sep 14 '22 17:09 parttimenerd

@mstange I've addressed all your suggestions.

parttimenerd avatar Sep 21 '22 08:09 parttimenerd

I hope to look at this later this week.

While testing this, I found a profile which shows sluggish preview selection handling: https://deploy-preview-4227--perf-html.netlify.app/public/kva8z3a82gvp50t0xw49re353y90atzs6bka2dr/function-table/?globalTrackOrder=0&hiddenLocalTracksByPid=14432-01&localTrackOrderByPid=14432-01&thread=0&v=7

Maybe this example can help us optimize this view. Dragging the mouse in the timeline is much smoother in the Call Tree tab than it is in the Function Table tab.

mstange avatar Sep 27 '22 20:09 mstange

Thanks for the sample. I have a few ideas for how to solve this (and will try to test them tomorrow); they are also based on caching information, so that less is recomputed on every change. I think the lag for small selections (< 3 s or so) is quite acceptable, but everything above that is problematic. Two ideas come to mind:

  1. only compute the function table for the newly selected time range portion (since the last refresh)
  2. precompute the method table for every 1000 (or so) samples and only recompute for the start and the end of the selection

Combined, these two ideas should trade larger memory usage for a drastically reduced number of (expensive) method table computations from samples.
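A rough sketch of how idea 2 could work together with idea 1 (the chunk size and helper names are made up for illustration; summarizeSlice stands for the existing per-sample computation over a sample index range):

    // Sketch: precompute per-function summaries for fixed-size chunks of
    // samples once; a preview-selection query then merges the precomputed
    // whole chunks and recomputes only the partial chunk at each edge.
    const CHUNK_SIZE = 1000; // samples per chunk (tunable)

    function addInto(acc, part) {
      for (let f = 0; f < acc.selfTime.length; f++) {
        acc.selfTime[f] += part.selfTime[f];
        acc.totalTime[f] += part.totalTime[f];
      }
    }

    // summarizeSlice(start, end) returns { selfTime, totalTime } for the
    // samples in [start, end).
    function buildChunkSummaries(sampleCount, summarizeSlice) {
      const chunks = [];
      for (let start = 0; start < sampleCount; start += CHUNK_SIZE) {
        chunks.push(summarizeSlice(start, Math.min(start + CHUNK_SIZE, sampleCount)));
      }
      return chunks;
    }

    function summarizeRange(chunks, summarizeSlice, funcCount, rangeStart, rangeEnd) {
      const acc = {
        selfTime: new Array(funcCount).fill(0),
        totalTime: new Array(funcCount).fill(0),
      };
      for (let c = Math.floor(rangeStart / CHUNK_SIZE); c * CHUNK_SIZE < rangeEnd; c++) {
        const chunkStart = c * CHUNK_SIZE;
        const chunkEnd = chunkStart + CHUNK_SIZE;
        if (rangeStart <= chunkStart && chunkEnd <= rangeEnd) {
          // Whole chunk inside the selection: reuse the precomputed summary.
          addInto(acc, chunks[c]);
        } else {
          // Partial chunk at an edge of the selection: recompute this slice only.
          addInto(acc, summarizeSlice(Math.max(rangeStart, chunkStart), Math.min(rangeEnd, chunkEnd)));
        }
      }
      return acc;
    }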

I'll try to implement these and check with the profiler whether it works. I'm hopeful, and it is far less complicated than the previous ideas.

(One remaining problem could be the sheer number of methods. But this could be solved by storing the functions with only one or two usages more efficiently.)

parttimenerd avatar Sep 27 '22 21:09 parttimenerd

My new version improves the computation: it uses a multi-level cache and trades memory for performance. Firefox Profiler uses around 10-15% more memory with this change, but the parameters of the cache can be configured.

The cache is generic and could also be used to speed up the computation of callees and callers in a yet-to-be-proposed feature.

parttimenerd avatar Sep 28 '22 16:09 parttimenerd

Still haven't found time to look at this in detail, hoping for this week.

In the new deploy preview, dragging a selection now works very fast, nice! I noticed that it's still sluggish if the mouse is over the marker track while it's being moved, but that's completely unrelated to this PR. I wonder if the sluggish behavior I saw previously was due to the same issue - I hope I didn't accidentally blame this PR for a sluggish marker track.

With the cache, do you mean selectedFunctionTableNodeSelectors?

It looks like graph highlighting is working nicely now, though I don't really understand how it can work - how did you address this case I mentioned in a previous comment?

So the "selected samples" will not form one connected area in the graph.

One bug I found is that double-clicking a function will not scroll the source view to the clicked function. For example, opening the source view on js::GCMarker::traverse(JSString*) displays the markAndTraverse method instead. My hunch is that the "selected call node" is interpreted in the wrong way.

I also noticed that the search filter's current behavior is not very intuitive for the function table view. I would expect it to filter out all functions that don't match the search string. At the moment, it still displays functions that are present elsewhere in the matching stacks. But this is a bigger change and should be deferred for a separate PR.

mstange avatar Oct 04 '22 05:10 mstange

In the new deploy preview, dragging a selection now works very fast, nice! I noticed that it's still sluggish if the mouse is over the marker track while it's being moved, but that's completely unrelated to this PR. I wonder if the sluggish behavior I saw previously was due to the same issue - I hope I didn't accidentally blame this PR for a sluggish marker track.

The remaining sluggishness comes from an overly primitive implementation in the call tree, combined with the use of React events, which are not triggered that regularly.

With the cache, do you mean selectedFunctionTableNodeSelectors?

I mean the SummaryCacheTreeNode class and related (see https://github.com/firefox-devtools/profiler/pull/4227/files#diff-f59d84bde204d9e3751ca53e860796d9da5b948173c561109fa4337f88b6a238R617). This is used to build a multilevel cache.

It looks like graph highlighting is working nicely now, though I don't really understand how it can work - how did you address this case I mentioned in a previous comment?

It was not that hard: I just walk over all stacks and look for the selected function, whereas I previously only looked at the top frame. See https://github.com/firefox-devtools/profiler/pull/4227/files#diff-7a963fec8fbae1d005d4cbb324ff7399f707c667f19fc79fef0ceada360a4d1fR531
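In code, the per-sample check is roughly this (the table and column names are assumptions, not the PR's actual code):

    // Sketch: a sample is drawn as "selected" if any frame in its stack maps
    // to the selected function, not just the leaf frame.
    function stackContainsFunc(stackTable, frameTable, stackIndex, selectedFunc) {
      for (let s = stackIndex; s !== null; s = stackTable.prefix[s]) {
        if (frameTable.func[stackTable.frame[s]] === selectedFunc) {
          return true;
        }
      }
      return false;
    }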

I also noticed that the search filter's current behavior is not very intuitive for the function table view. I would expect it to filter out all functions that don't match the search string. At the moment, it still displays functions that are present elsewhere in the matching stacks. But this is a bigger change and should be deferred for a separate PR.

This is the same for the call tree. I find searching pretty unintuitive in the whole profiler UI. I will tackle this in an upcoming proposal: I want to be able to specify detailed filters based on glob patterns, like @function toString, which matches all toString functions.

parttimenerd avatar Oct 04 '22 07:10 parttimenerd

One bug I found is that double-clicking a function will not scroll the source view to the clicked function. For example, opening the source view on js::GCMarker::traverse(JSString*) displays the markAndTraverse method instead. My hunch is that the "selected call node" is interpreted in the wrong way.

It works correctly for me in almost all examples, so I'd rather fix it in a later PR. I don't think your particular case has anything to do with my implementation.

parttimenerd avatar Oct 05 '22 09:10 parttimenerd

It looks like graph highlighting is working nicely now, though I don't really understand how it can work - how did you address this case I mentioned in a previous comment?

It was not that hard: I just walk over all stacks and look for the selected function, whereas I previously only looked at the top frame.

Oh, you took the easy way out and made it highlight the same areas for different samples, by always sorting "selected" before "unselected". Here's an example profile where you can compare the highlighting of the functions "Left" and "Right" between the Call Tree and the Function Table tabs. In your current implementation, both "Left" and "Right" highlight the same areas, even though they correspond to different samples. That's ok for now, don't worry about it. But I wanted to make sure that you understood the issue that I'm referring to.

mstange avatar Oct 11 '22 22:10 mstange

One bug I found is that double-clicking a function will not scroll the source view to the clicked function. For example, opening the source view on js::GCMarker::traverse(JSString*) displays the markAndTraverse method instead. My hunch is that the "selected call node" is interpreted in the wrong way.

I think this is because BottomBox isn't adjusting its selectedCallNodeLineTimings for the function table. To fix this bug, the BottomBox would need to use selectedFunctionTableNodeSelectors.getSourceViewLineTimings when the function table is visible. It currently always uses selectedNodeSelectors.getSourceViewLineTimings:

    selectedCallNodeLineTimings:
      selectedNodeSelectors.getSourceViewLineTimings(state),
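A minimal sketch of that change inside BottomBox's mapStateToProps (whether getSelectedTab and the 'function-table' tab id are the right hooks here is an assumption, not something taken from the PR):

    // Pick the line timings from the function table's selected node when that
    // panel is visible, so double-clicking a function scrolls the source view
    // to the right place.
    selectedCallNodeLineTimings:
      getSelectedTab(state) === 'function-table'
        ? selectedFunctionTableNodeSelectors.getSourceViewLineTimings(state)
        : selectedNodeSelectors.getSourceViewLineTimings(state),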

mstange avatar Oct 17 '22 13:10 mstange

I still need to read the "summary" code. I'm getting there, slowly.

I have one more general request: Please use JavaScript arrays for the new arrays you're adding, not Typed Arrays such as Int32Array. The typed arrays may have slightly better performance in some cases, but they're more complicated to deal with because they can't contain null and because you can't specify the Flow type of the array element. We still have a fair amount of them in the profiler code, but we've been slowly phasing them out and are using regular JS arrays in new code; we haven't observed any adverse performance impact from doing so.

So please change your new arrays into regular JS arrays and use null instead of -1 in them.
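A small illustration of the requested convention (the column name is made up for the example):

    // Typed array with a sentinel value, the style being phased out:
    const prefixColumn = new Int32Array(4);
    prefixColumn[0] = -1; // -1 means "no prefix"; Flow can't express this

    // Regular JS array with null, the preferred style; Flow can type it,
    // e.g. as Array<IndexIntoStackTable | null>:
    const prefixColumnJS = [null, 0, 1, 1]; // null means "no prefix" (root)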

mstange avatar Oct 17 '22 14:10 mstange

Good to know, I'll change them soon.

parttimenerd avatar Oct 17 '22 15:10 parttimenerd

So please change your new arrays into regular JS arrays and use null instead of -1 in them.

The problem with this is that my new code is intertwined with lots of old code which uses these typed arrays and passes them into my code or expects them. I'm not too inclined to do a major refactoring of the other code in this PR, but I'm open to doing that in another PR if need be.

parttimenerd avatar Oct 19 '22 14:10 parttimenerd

I'm happy to receive any feedback.

parttimenerd avatar Oct 26 '22 08:10 parttimenerd

I have read the summary cache tree code now. Let me try to summarize what it does: It reduces the time taken to compute function timings for a sub-range of the samples, by re-using previous calculations over other sub-ranges of samples. This way, when the user drags a preview selection, we ideally only need to iterate over the newly-added samples (and add their timings to the timings of existing cached "summary nodes") instead of iterating over all samples in the selection when computing the timings. It's a neat idea, and I hadn't thought of it before. But I think it's a bit complicated, and it also has a certain memory cost. I think it would be better to find other ways to reduce the cost of the calculation, so that iterating over all samples is cheap enough.

For now, can you remove the summary cache tree again? I think performance is good enough without it. And then, as a follow-up, we can improve the performance in other ways.

Here's how I would reduce the cost of what's currently happening in _getSummaryPerFunctionSlice, as follow-ups:

  • Optimize the case where the same stack is used in multiple samples, by aggregating the weight per stack index first (see the sketch at the end of this comment).
  • Now there are three sources of slowness in the computation:
    1. Duplicated work from "similar but different" stacks, i.e. stacks which consist of the same functions but which have different stack indexes.
    2. Accessing the function index still goes through the frame table.
    3. For each stack, to de-duplicate functions in the stack, we have to build up a set.
  • To optimize 1 and 2, we could use the call node info from the regular call tree. It has a tree structure where "stacks which consist of the same functions in the same order" have already been collapsed into a single call node. So the same-stack optimization would get a better hit rate. The call node info also has a direct access to each node's function index, saving the indirection via the frame table.
  • To optimize 3, we could build a transformed call node info which eliminates call nodes for functions which are already present in the stack. Specifically, any node whose function is already present in an ancestor of this node would get "merged-away" into its parent node. Building this transformed (de-duplicated) call node info would probably be somewhat expensive - I don't think there's a way to do it without creating a set of functions per node - but we'd only need to do it once. And then during the sample computation we could rely on the fact that this call node info will never show the same function twice in the same stack, so we can just "walk the stack and add the weight" without building up a set.

The last optimization idea sounds quite complicated and is probably not worth doing, but I wanted to throw it out there.
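A sketch of that first follow-up, aggregating the sample weight per stack index before walking any stacks (names are assumptions, not the PR's code):

    // Sketch: sum the weight of all samples that share a stack index first,
    // so each distinct stack is walked once instead of once per sample.
    function aggregateWeightPerStack(samples, stackCount) {
      const weightPerStack = new Array(stackCount).fill(0);
      for (let i = 0; i < samples.length; i++) {
        const stackIndex = samples.stack[i];
        if (stackIndex !== null) {
          weightPerStack[stackIndex] += samples.weight ? samples.weight[i] : 1;
        }
      }
      return weightPerStack;
    }
    // The per-function summary can then iterate over weightPerStack and walk
    // each stack with a non-zero weight exactly once.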

mstange avatar Oct 31 '22 17:10 mstange

It's a neat idea, and I hadn't thought of it before. But I think it's a bit complicated, and it also has a certain memory cost. I think it would be better to find other ways to reduce the cost of the calculation, so that iterating over all samples is cheap enough

How about using this version and implementing the other optimizations as another PR, so that this PR is not stalled? I'm willing to commit to further improving the Function Table going forward.

For now, can you remove the summary cache tree again? I think performance is good enough without it. And then, as a follow-up, we can improve the performance in other ways.

The performance is almost unusable for larger profiles (and my use case has these a lot)...

parttimenerd avatar Oct 31 '22 17:10 parttimenerd

It's a neat idea, and I hadn't thought of it before. But I think it's a bit complicated, and it also has a certain memory cost. I think it would be better to find other ways to reduce the cost of the calculation, so that iterating over all samples is cheap enough

How about using this version and implementing the other optimizations as another PR, so that this PR is not stalled?

Yes of course, that's what I meant with "as follow-ups".

For now, can you remove the summary cache tree again? I think performance is good enough without it. And then, as a follow-up, we can improve the performance in other ways.

The performance is almost unusable for larger profiles (and my use case has these a lot)...

Oh, really? I hadn't realized that. That's unfortunate.

Ok, then let's keep the tree for now, but please document it. The comments for the cache tree should say which use case is being optimized, and roughly how it works. Furthermore, the invariants of the tree fields should be documented, i.e. what's the sum of what, which sample index ranges are nested / overlapping / adjacent, what it means for children to be null, etc.
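For illustration, documentation along these lines could look roughly like this (the field names and shapes are hypothetical; the PR's actual SummaryCacheTreeNode may differ):

    // Sketch of the kind of invariants to document on each field:
    // class SummaryCacheTreeNode {
    //   // Sample index range summarized by this node: inclusive start,
    //   // exclusive end. A child's range is nested inside its parent's;
    //   // sibling ranges are adjacent, never overlap, and together cover
    //   // the parent's range exactly.
    //   rangeStart: number;
    //   rangeEnd: number;
    //   // Per-function times over all samples in [rangeStart, rangeEnd);
    //   // always equal to the element-wise sum of the children's summaries.
    //   summary: { selfTime: Array<number>, totalTime: Array<number> };
    //   // null for leaf nodes, i.e. ranges small enough to recompute directly.
    //   children: Array<SummaryCacheTreeNode> | null;
    // }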

mstange avatar Oct 31 '22 18:10 mstange

I've added the requested documentation (the field invariants are stated in the documentation of each field).

Is this now ready for adding tests? Any recommendations on what tests I should add?

parttimenerd avatar Nov 02 '22 10:11 parttimenerd

Hey @mstange, do you think you would have the chance to look at this again?

julienw avatar Nov 17 '22 12:11 julienw

Oh, Jeff noticed a bug when testing the deploy preview: When a preview selection is applied, the percentages are currently of the non-preview-filtered thread, which is inconsistent with the call tree tab.

mstange avatar Nov 29 '22 03:11 mstange

I rebased it on the current main branch.

parttimenerd avatar Dec 21 '22 14:12 parttimenerd

Oh, Jeff noticed a bug when testing the deploy preview: When a preview selection is applied, the percentages are currently of the non-preview-filtered thread, which is inconsistent with the call tree tab.

I fixed it; it was probably just forgotten.

parttimenerd avatar Dec 21 '22 18:12 parttimenerd

I noticed another issue: the transform shortcuts work in the function table, but they don't target the right node index. You probably need to change handleCallNodeTransformShortcut. I can see that you could split it into two parts, so that one part is specific to panels (finding the funcIndex etc.), and one part is generic (handling the actual keys).

But taking a step back, I feel like you're trying a bit too hard to reuse existing code, instead of starting from scratch for this panel. I think I need to think about it more. We don't want to copy-paste everything either, especially the call-tree part. I don't much like that you also call the new structure a call node table; I think this confuses things.

I need to think a bit more.

julienw avatar Jan 16 '23 18:01 julienw

(taking this out of my queue for now)

flodolo avatar Aug 02 '23 10:08 flodolo

This PR seems to have stalled.

parttimenerd avatar Aug 02 '23 10:08 parttimenerd