parca icon indicating copy to clipboard operation
parca copied to clipboard

feat: add metrics for debuginfo

Open ksankeerth opened this issue 2 years ago • 7 comments

Hi Parca team, I tried to fix #1275. Not sure I have covered everything mentioned in #1275. Please check and let me know if anything else to be added in this PR.

I have added metrics for..

  1. errors and attempts for uploading debuginfo
  2. errors and attempts for validation
  3. time taken for uploading debuginfo, updating metdata, and, exists check

I'm not sure what exactly I have to do for below metrics. Will you be able to guide me?

Debuginfo Client metrics (will be exposed on Agent)

Debuginfod Metrics

Fixes #1275

Signed-off-by: shankeerthan-kasilingam [email protected]

ksankeerth avatar Sep 12 '22 04:09 ksankeerth

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 12 '22 04:09 CLAassistant

Looking good. Thanks for handling this.

I think we can be more consistent with the variable naming.

And as far as I can tell, we don't increment error metrics every error returns. Let's make sure we cover everything.

You might also want to consider extracting validation logic to the helper function fro ease of instrumenting.

Thanks for reviewing the PR. I'll check the comments and update the PR

ksankeerth avatar Sep 14 '22 02:09 ksankeerth

And as far as I can tell, we don't increment error metrics every error returns.

Just some clarifications needed on this(sorry if it's a dump question). https://github.com/parca-dev/parca/blob/573a48dc7860ba24e219e3ff327ea4b799ed56ae/pkg/debuginfo/store.go#L85

https://github.com/parca-dev/parca/blob/573a48dc7860ba24e219e3ff327ea4b799ed56ae/pkg/debuginfo/store.go#L88

Two error metrics are used here. I increase the error metrics with reason before return if it's related to the metrics we measure. Is this correct? or in any places did i increase error metrics irrelevantly? (may be this location https://github.com/parca-dev/parca/blob/573a48dc7860ba24e219e3ff327ea4b799ed56ae/pkg/debuginfo/store.go#L321)

ksankeerth avatar Sep 14 '22 03:09 ksankeerth

You might also want to consider extracting validation logic to the helper function fro ease of instrumenting

If I'm going to extract validation logic to a function, Can I extract the below code inside a function in store.go? https://github.com/parca-dev/parca/blob/573a48dc7860ba24e219e3ff327ea4b799ed56ae/pkg/debuginfo/store.go#L302-L327

ksankeerth avatar Sep 14 '22 03:09 ksankeerth

@ksankeerth, I think we can get rid of additional validation attempts and error metrics. And add additional reason ("validation") to the actual action metrics (e.g upload).

kakkoyun avatar Sep 14 '22 08:09 kakkoyun

You might also want to consider extracting validation logic to the helper function fro ease of instrumenting

If I'm going to extract validation logic to a function, Can I extract the below code inside a function in store.go?

https://github.com/parca-dev/parca/blob/573a48dc7860ba24e219e3ff327ea4b799ed56ae/pkg/debuginfo/store.go#L302-L327

You can extract it. I would prefer the metadata update logic outside the function on the call site, though.

kakkoyun avatar Sep 14 '22 08:09 kakkoyun

@kakkoyun I have done following changes

  1. changed the name of metrics as per your suggestions.
  2. removed metrics for validationAttempts and validationErrors (since metrics were removed, i didn't extract metadata update logic to a separate function)
  3. metadataUpdateDuration was moved to metadata.go and used metaData.UploadFinishedAt and metaData.UploadStartedAt for the metrics. Seems race condition can happen https://github.com/parca-dev/parca/blob/aa3772c86fff44015b79668917944a259d4f5223/pkg/debuginfo/metadata.go#L179-L182 . I didn't update metadataUpdateDuration metrics in this case.

Please check the new changes.Thanks

ksankeerth avatar Sep 15 '22 05:09 ksankeerth