Metastore key/value distribution
I just did a small analysis of a Parca server's metastore that was running on a relatively small Kubernetes cluster for about 24 hours with lots of deployments happening.
| Type | Value |
|---|---|
| Stacktraces | 3 209 744 |
| Locations | 2 268 828 |
| LocationLines | 1 972 027 |
| Unsymbolized | 296 791 |
| Functions | 41 655 |
| Mappings | 144 |
| All | 7 789 189 |
With a database size of ~8GB, each object averages ~1.027 kB.
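The average is easy to reproduce from the table. This is a minimal sketch, assuming "~8GB" means 8 × 10^9 bytes (the exact size wasn't recorded); the helper names are made up for illustration:

```python
# Key counts copied from the table above.
KEY_COUNTS = {
    "Stacktraces": 3_209_744,
    "Locations": 2_268_828,
    "LocationLines": 1_972_027,
    "Unsymbolized": 296_791,
    "Functions": 41_655,
    "Mappings": 144,
}

# Assumption: "~8GB" is read as 8 * 10**9 bytes; the real size is only approximate.
DB_SIZE_BYTES = 8 * 10**9


def average_object_size(counts, db_size_bytes):
    """Average bytes per key, treating every key as one object."""
    return db_size_bytes / sum(counts.values())


def shares(counts):
    """Fraction of all keys that each type accounts for."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(f"total keys: {sum(KEY_COUNTS.values())}")
    print(f"average size: {average_object_size(KEY_COUNTS, DB_SIZE_BYTES):.0f} bytes")
    for name, share in shares(KEY_COUNTS).items():
        print(f"{name:<14} {share:6.2%}")
```

Note that this is an average over all key types; it says nothing about which type actually takes up the space.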
I keep going back and forth in my head whether there is anything to change here, but I think the numbers and their distribution were interesting so I wanted to share.
A couple of noteworthy observations:
- The number of mappings is tiny, and they're really the only ones we care about in terms of strong consistency. We should easily be able to cache them heavily.
- Functions are roughly 50x fewer than locations (41 655 vs. 2 268 828). While I expected there to be fewer, the order of magnitude was somewhat surprising to me.
The number of stacktraces is not surprising, since they are essentially the unique combinations of locations, but perhaps that's a reason to abandon them one day.
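To illustrate the caching idea for mappings: with only 144 of them, a plain in-memory read-through cache would hold the whole set. This is a rough sketch, not Parca's code; `MappingCache` and the `fetch` callback are hypothetical, and it assumes a mapping never changes once written, so a cached copy cannot go stale:

```python
from typing import Callable, Dict, Hashable


class MappingCache:
    """Read-through cache for mappings (hypothetical, not a Parca API).

    Assumption: mappings are immutable once stored, so no invalidation is needed.
    """

    def __init__(self, fetch: Callable[[Hashable], dict]):
        self._fetch = fetch                      # strongly consistent read from the metastore
        self._entries: Dict[Hashable, dict] = {}
        self.misses = 0                          # how often we had to hit the metastore

    def get(self, key: Hashable) -> dict:
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._fetch(key)
        return self._entries[key]


if __name__ == "__main__":
    # Stand-in for the metastore: a dict lookup keyed by a made-up mapping key.
    store = {"mapping-1": {"file": "/usr/lib/libc.so.6"}}
    cache = MappingCache(store.__getitem__)
    cache.get("mapping-1")
    cache.get("mapping-1")
    print(cache.misses)
```

If mappings can in fact be updated in place, this would need invalidation or a TTL, which is exactly where the strong-consistency concern comes back.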
Something that I think would be interesting to analyze further:
- How many of the locations have no mapping, and therefore can never be symbolized and are probably the result of a fault in data collection. If this number is fairly high (>5%), it likely makes the number of stacktraces explode, and storing these locations probably provides little value.
After a quick analysis, it appears that 1 373 068 (~60% of all locations; 17% of all keys) of those locations don't even have mappings. This makes me think that Parca Server should probably replace these locations with a single generic "unknown" location.
> After a quick analysis, it appears that 1 373 068 of those locations don't even have mappings. This makes me think that Parca Server should probably replace these locations with a single generic "unknown" location.
👍 👍 👍 Totally. Without a mapping, we can't do anything for those.
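A rough sketch of what collapsing unmapped locations could look like. None of this is Parca's actual data model; `Location`, `UNKNOWN_LOCATION`, and `normalize_stacktrace` are hypothetical names, and a missing mapping is modelled as `mapping_id=None`:

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Location:
    address: int
    mapping_id: Optional[str]  # None means the location has no mapping


# Single shared placeholder for every location that cannot be symbolized.
UNKNOWN_LOCATION = Location(address=0, mapping_id=None)


def normalize_stacktrace(locations: Sequence[Location]) -> Tuple[Location, ...]:
    """Replace each unmapped location with the shared placeholder."""
    return tuple(
        loc if loc.mapping_id is not None else UNKNOWN_LOCATION
        for loc in locations
    )


if __name__ == "__main__":
    # Two stacks that differ only in an unmapped (garbage) address.
    a = [Location(0x1000, "libc"), Location(0xDEAD, None)]
    b = [Location(0x1000, "libc"), Location(0xBEEF, None)]
    unique_before = {tuple(a), tuple(b)}
    unique_after = {normalize_stacktrace(a), normalize_stacktrace(b)}
    print(len(unique_before), len(unique_after))  # prints "2 1"
```

The point of the example is the second effect: stacktraces that only differed in their unmapped frames become identical, so the stacktrace count should drop along with the location count.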
Based on this data, I'm also going to have a look at denormalizing LocationLines: instead of storing them as separate objects, I'm going to include them directly in Location objects.
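Roughly what that denormalization means in terms of keys, as a sketch only. The `Line` and `Location` shapes and the `denormalize` helper are assumptions for illustration, not the real metastore schema:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Line:
    function_id: str
    line: int


@dataclass(frozen=True)
class Location:
    address: int
    mapping_id: Optional[str]
    lines: Tuple[Line, ...] = ()  # embedded instead of stored under a separate key


def denormalize(
    locations: Dict[str, Location],
    location_lines: Dict[str, List[Line]],
) -> Dict[str, Location]:
    """Fold each location's lines into the location object itself."""
    return {
        loc_id: Location(loc.address, loc.mapping_id, tuple(location_lines.get(loc_id, ())))
        for loc_id, loc in locations.items()
    }


if __name__ == "__main__":
    locations = {"loc-1": Location(0x1000, "libc"), "loc-2": Location(0x2000, "app")}
    location_lines = {"loc-1": [Line("fn-malloc", 42)]}
    keys_before = len(locations) + len(location_lines)
    merged = denormalize(locations, location_lines)
    print(keys_before, len(merged))  # prints "3 2"
```

In the numbers above this would remove the ~1.97M LocationLines keys entirely, at the cost of larger Location values.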