
Improve global merge performance

Open brancz opened this issue 4 years ago • 4 comments

Global merges are already quite a bit better than they used to be, but they can still take a long time, and I think there is still a lot of room for improvement. I did some testing, and two things clearly dominate CPU time during merging, with proto marshaling a distant third:

  • Actual merging of instant profiles ~34%
  • Retrieving metadata from the metastore ~56%
  • Marshaling proto ~9%

Screenshot from 2021-11-04 13-55-08

brancz avatar Nov 04 '21 13:11 brancz

Based on the distribution above, I'm going to have a look at what we can do to retrieve metadata from storage faster. In particular, it appears that most of the time is spent on cache misses, that is, when a location has never been loaded from storage.
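To make the cache-miss cost concrete, here is a minimal sketch of a read-through cache that collects all misses and loads them in one batched call instead of one storage round trip per location. This is not Parca's actual metastore code: `Location`, `fetchFunc`, `LocationCache`, and `GetMany` are hypothetical names chosen for illustration, and batching is just one possible strategy.

```go
package main

// Location is a hypothetical, simplified stand-in for a metastore location.
type Location struct {
	ID      uint64
	Address uint64
}

// fetchFunc is a hypothetical batch loader standing in for a storage query.
type fetchFunc func(ids []uint64) (map[uint64]Location, error)

// LocationCache is a read-through cache. It is not safe for concurrent use.
type LocationCache struct {
	entries map[uint64]Location
	fetch   fetchFunc
	Misses  int // number of distinct IDs that had to be loaded from storage
}

// NewLocationCache returns an empty cache backed by the given loader.
func NewLocationCache(fetch fetchFunc) *LocationCache {
	return &LocationCache{entries: make(map[uint64]Location), fetch: fetch}
}

// GetMany returns the locations for ids. Cached entries are served from
// memory; all misses are deduplicated and loaded with a single fetch call.
func (c *LocationCache) GetMany(ids []uint64) (map[uint64]Location, error) {
	result := make(map[uint64]Location, len(ids))
	seen := make(map[uint64]struct{})
	var missing []uint64

	for _, id := range ids {
		if loc, ok := c.entries[id]; ok {
			result[id] = loc
			continue
		}
		if _, dup := seen[id]; !dup {
			seen[id] = struct{}{}
			missing = append(missing, id)
		}
	}
	if len(missing) == 0 {
		return result, nil
	}

	c.Misses += len(missing)
	loaded, err := c.fetch(missing)
	if err != nil {
		return nil, err
	}
	for id, loc := range loaded {
		c.entries[id] = loc // later lookups for this ID become cache hits
		result[id] = loc
	}
	return result, nil
}
```

With a sketch like this, a merge touching many never-before-seen locations pays for one storage call per batch rather than one per location, and the `Misses` counter gives a rough way to check how often that path is taken.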

brancz avatar Nov 04 '21 13:11 brancz

WIP for exploring other strategies for saving metadata #433

brancz avatar Nov 10 '21 17:11 brancz

More perf work for metastore: #462

More profiling needs to be done now, but my expectation is that allocations will still dominate the metadata stacks in terms of CPU. This could be improved with memory pooling, but that would need some restructuring of the flamegraph messages to separate the metadata from the flamegraph tree. I'm going to look into this restructuring next.
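As a rough illustration of the memory pooling idea, the sketch below reuses metadata objects through Go's standard `sync.Pool` instead of allocating a new one per lookup. The `Metadata` type and the `getMetadata`/`putMetadata` helpers are hypothetical and are not the flamegraph messages discussed here; the point is only that objects must be separable and resettable before they can be pooled.

```go
package main

import "sync"

// Metadata is a hypothetical, simplified metadata record.
type Metadata struct {
	FunctionName string
	Lines        []int64
}

// metadataPool hands out reusable *Metadata values. The pool may drop
// idle objects at any time, so New must always be able to create one.
var metadataPool = sync.Pool{
	New: func() interface{} { return &Metadata{} },
}

// getMetadata returns a zeroed Metadata, reusing a pooled one if available.
func getMetadata() *Metadata {
	return metadataPool.Get().(*Metadata)
}

// putMetadata resets m and returns it to the pool. The caller must not
// use m after this call.
func putMetadata(m *Metadata) {
	m.FunctionName = ""
	m.Lines = m.Lines[:0] // keep the backing array to avoid reallocating
	metadataPool.Put(m)
}
```

Resetting in `putMetadata` rather than in `getMetadata` keeps stale data from leaking between uses, and truncating `Lines` instead of setting it to `nil` is what actually saves allocations on reuse.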

brancz avatar Nov 24 '21 10:11 brancz

Large merges now look like this: https://share.polarsignals.com/1c4fb39/

Looks like in terms of metastore interactions, getting locations and location-lines are the biggest offenders.

brancz avatar Nov 24 '21 12:11 brancz

We've made massive improvements since opening this issue.

brancz avatar Jan 24 '23 10:01 brancz