Metastore key/value distribution

Open brancz opened this issue 3 years ago • 3 comments

I just did a small analysis of a Parca server's metastore that was running on a relatively small Kubernetes cluster for about 24 hours with lots of deployments happening.

| Type | Count |
|---|---|
| Stacktraces | 3 209 744 |
| Locations | 2 268 828 |
| LocationLines | 1 972 027 |
| Unsymbolized | 296 791 |
| Functions | 41 655 |
| Mappings | 144 |
| All | 7 789 189 |

At a database size of ~8 GB, this means each object averages ~1.027 kB.
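The arithmetic behind that average can be reproduced with a few lines of Python. This is only a sanity check on the reported numbers; it assumes "~8GB" means 8 * 10**9 bytes (decimal gigabytes), which is the reading that matches the ~1.027 kB figure.

```python
# Object counts as reported in the issue.
counts = {
    "Stacktraces": 3_209_744,
    "Locations": 2_268_828,
    "LocationLines": 1_972_027,
    "Unsymbolized": 296_791,
    "Functions": 41_655,
    "Mappings": 144,
}

total_objects = sum(counts.values())

# Assumption: "~8GB" is read as 8 * 10**9 bytes (decimal gigabytes).
db_size_bytes = 8 * 10**9
avg_bytes_per_object = db_size_bytes / total_objects

print(f"total objects: {total_objects}")
print(f"average bytes per object: {avg_bytes_per_object:.0f}")
```

The per-type counts add up to the "All" figure, and the average comes out at roughly 1027 bytes per object.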

I keep going back and forth in my head whether there is anything to change here, but I think the numbers and their distribution were interesting so I wanted to share.

A couple of noteworthy observations:

  • The number of mappings is tiny, and they're really the only ones we care about in terms of strong consistency. We should easily be able to cache them heavily.
  • There are roughly 50x fewer functions than locations (41 655 vs. 2 268 828). While I expected there to be fewer, the order of magnitude was somewhat surprising to me.

The number of stacktraces is not surprising, since each one is essentially a unique combination of locations, but perhaps that's why we should abandon them one day.

Something that I think would be interesting to analyze further:

  • How many of the locations have no mapping and therefore can never be symbolized; these are probably a fault in data collection. If this share is fairly high (>5%), it likely makes the number of stacktraces explode, and storing them is probably unnecessary since they provide little value.

brancz, Jul 29 '22 12:07

After a quick analysis, it appears that 1 373 068 (~60% of all locations; 17% of all keys) of those locations don't even have mappings. This makes me think that Parca Server should probably replace these locations with a single generic "unknown" location.
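A rough sketch of why collapsing unmapped locations would also shrink the number of stacktraces. This is not Parca's actual code: `Location`, `UNKNOWN_LOCATION`, and `normalize` are hypothetical names, and a `mapping_id` of `None` is assumed to stand for "no mapping".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Location:
    # Hypothetical, simplified stand-in for a metastore location.
    address: int
    mapping_id: Optional[str]  # None means the location has no mapping


# Hypothetical sentinel that every unmapped location collapses into.
UNKNOWN_LOCATION = Location(address=0, mapping_id=None)


def normalize(stack: Tuple[Location, ...]) -> Tuple[Location, ...]:
    """Replace every location without a mapping by the shared sentinel."""
    return tuple(
        loc if loc.mapping_id is not None else UNKNOWN_LOCATION
        for loc in stack
    )


# Two stacks that differ only in an unmapped (unsymbolizable) address.
stack_a = (Location(0x1000, "libc"), Location(0xDEAD, None))
stack_b = (Location(0x1000, "libc"), Location(0xBEEF, None))

unique_before = {stack_a, stack_b}
unique_after = {normalize(stack_a), normalize(stack_b)}

print(len(unique_before), len(unique_after))
```

In this toy example the two stacks are distinct before normalization and identical afterwards, which is the deduplication effect the proposal is after.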

brancz, Jul 29 '22 13:07

> After a quick analysis, it appears that 1 373 068 of those locations don't even have mappings. This makes me think that Parca Server should probably replace these locations with a single generic "unknown" location.

👍 👍 👍 Totally. Without a mapping, we can't do anything for those.

kakkoyun, Jul 29 '22 14:07

Based on this data, I'm also going to have a look at denormalizing LocationLines, and instead of having a separate object for it, I'm going to include them in Location objects directly.
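To illustrate what such a denormalization could look like, here is a minimal sketch. The key layout, the `Line` and `Location` types, and the `denormalize` helper are all hypothetical and only meant to show the idea of folding the separate lines objects into their locations; they do not reflect the actual metastore schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Line:
    function_id: str
    line: int


# Normalized layout (hypothetical): lines live under their own key per location.
locations: Dict[str, int] = {"loc-1": 0x1000}  # location ID -> address
location_lines: Dict[str, List[Line]] = {"loc-1": [Line("fn-1", 42)]}


@dataclass
class Location:
    # Denormalized layout (hypothetical): the lines travel with the location.
    address: int
    lines: List[Line] = field(default_factory=list)


def denormalize(
    locations: Dict[str, int], location_lines: Dict[str, List[Line]]
) -> Dict[str, Location]:
    """Fold the separate lines objects into their locations."""
    return {
        loc_id: Location(address, list(location_lines.get(loc_id, [])))
        for loc_id, address in locations.items()
    }


merged = denormalize(locations, location_lines)
objects_before = len(locations) + len(location_lines)
objects_after = len(merged)
print(objects_before, objects_after)
```

Each symbolized location then costs one stored object instead of two, at the price of larger location values.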

brancz, Aug 01 '22 13:08