
Add VCS metrics from Github receiver

Open · christophe-kamphaus-jemmic opened this pull request 1 year ago • 7 comments

Fixes #1372

Changes

This PR adds the VCS metrics from the GitHub Receiver to Semantic Conventions. These metrics are based on https://github.com/adrielp/opentelemetry-collector-contrib/blob/903eec5382352e4d663fadd2dc7f1cc1e37f62fd/receiver/githubreceiver/documentation.md

Merge requirement checklist

TODO

  • [x] Update docs/cicd/cicd-metrics.md after receiving changes from https://github.com/open-telemetry/semantic-conventions/pull/1411

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: christophe-kamphaus-jemmic / name: Christophe Kamphaus


@christophe-kamphaus-jemmic - please sign the CLA if you haven't already.

adrielp commented on Sep 05 '24

@lmolkova @christophe-kamphaus-jemmic @adrielp - Regarding events vs. metrics a few thoughts:

Overall: I think they can be complementary. We can define both, but we should ensure metrics can be derived from events.

  • How do we get the "initial state" if we only capture lifecycle events? We either need to define a snapshot, or we can use these metrics as that initial state.
  • I'm not that familiar with GH APIs these days, but how does event capture work? It could be that, implementation-wise, it's cheaper to report metrics and poll for changes / update metrics in a single stateful system. I haven't looked at the implementation in the collector.

Overall, I think Liudmila is right about scale and flexibility if we can fire events. I don't see this as a blocker, but I'd like to understand the possibilities better.
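The events-to-metrics derivation and the "initial state" question above can be sketched as follows. This is a minimal illustration; the event names and the snapshot mechanism are hypothetical, not anything defined by this PR:

```python
# Hypothetical lifecycle events; the names are illustrative only.
events = [
    {"name": "vcs.change.created", "ref": "feat-1"},
    {"name": "vcs.change.created", "ref": "feat-2"},
    {"name": "vcs.change.merged",  "ref": "feat-1"},
]

def derive_open_change_count(events, initial_open=0):
    """Fold lifecycle events into a running 'open changes' count.

    `initial_open` stands in for the snapshot needed to seed the state,
    since events alone cannot convey PRs that existed before capture began.
    """
    open_count = initial_open
    series = []
    for e in events:
        if e["name"] == "vcs.change.created":
            open_count += 1
        elif e["name"] in ("vcs.change.merged", "vcs.change.abandoned"):
            open_count -= 1
        series.append(open_count)
    return series

print(derive_open_change_count(events, initial_open=3))  # [4, 5, 4]
```

Without the `initial_open` snapshot, the derived series would undercount by however many PRs were already open, which is exactly the gap a metrics-based initial state could fill.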

jsuereth commented on Sep 16 '24

SemConv meeting notes:

Review statistics provided by Github (Pulse / Insights). Is there overlap with the metrics defined here?

What is the minimum set of recommended metrics / attributes that gives good observability for VCS?

Metrics provided by Github (screenshots)


Correspondence between the Github statistics and the OTel SemConv VCS metrics defined in this PR:

| Github statistic | Otel SemConv VCS metric | Notes |
|---|---|---|
| Number of forks | | |
| Number of stars | | |
| Number of open issues | | |
| Number of open PRs | vcs.repository.change.count | |
| Commit activity over time on main branch | | |
| Number of closed issues | | |
| Number of merged PRs | vcs.repository.change.count | |
| Number of unresolved conversations | | |
| # of commits by contributor to main branch | | |
| Commit activity over time by contributor on main branch | | |
| Lines added by contributor to main branch | | |
| Lines removed by contributor from main branch | | |
| Lines added over time on main branch | | |
| Lines removed over time on main branch | | |
| Commits ahead for branch vs main | vcs.repository.ref.revisions_ahead | |
| Commits behind for branch vs main | vcs.repository.ref.revisions_behind | |
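As an illustration of the two ref metrics in the table, here is a toy sketch of computing ahead/behind counts over a commit graph. The graph and function names are made up for this example; a real receiver would query the VCS instead (e.g. via something like `git rev-list --left-right --count`):

```python
# Toy commit graph: commit -> list of parent commits. Purely illustrative.
parents = {
    "m1": [], "m2": ["m1"], "m3": ["m2"],  # main: m1 <- m2 <- m3
    "b1": ["m2"], "b2": ["b1"],            # branch forked at m2
}

def ancestors(commit, parents):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def revisions_ahead_behind(ref_head, base_head, parents):
    ref_set = ancestors(ref_head, parents)
    base_set = ancestors(base_head, parents)
    ahead = len(ref_set - base_set)    # cf. vcs.repository.ref.revisions_ahead
    behind = len(base_set - ref_set)   # cf. vcs.repository.ref.revisions_behind
    return ahead, behind

print(revisions_ahead_behind("b2", "m3", parents))  # (2, 1)
```

The branch head `b2` has two commits (`b1`, `b2`) not reachable from `m3`, and main has one commit (`m3`) not reachable from `b2`, hence ahead=2, behind=1.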

Should we adapt the defined VCS metrics to cover some of these Github statistics and/or add additional metrics?

I'm a bit concerned about the inclusion of metrics using PII, which would allow organizations to track the performance of individuals. While organizations can easily implement such tracking tools (GH itself offers them to some extent), I believe we shouldn't make it easy (or encourage that behavior) in this context.

What if we remove # of commits by contributor, Commit activity over time by contributor, Lines added by contributor, and Lines removed by contributor at this stage? I believe this is a very sensitive topic, and we should have a dedicated and focused conversation about it if we really want to include them.
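A hypothetical sketch of the kind of attribute scrubbing this concern implies. The `vcs.contributor.*` keys are invented for illustration and are not attributes defined by this PR:

```python
# Hypothetical contributor-identifying attribute keys (not in this PR).
SENSITIVE_KEYS = {"vcs.contributor.name", "vcs.contributor.email"}

def scrub_attributes(attributes, sensitive=frozenset(SENSITIVE_KEYS)):
    """Drop contributor-identifying attributes from a metric data point."""
    return {k: v for k, v in attributes.items() if k not in sensitive}

point = {
    "vcs.repository.url.full": "https://github.com/acme/repo",
    "vcs.contributor.email": "dev@example.com",
}
print(scrub_attributes(point))  # {'vcs.repository.url.full': 'https://github.com/acme/repo'}
```

Leaving per-contributor attributes out of the convention entirely, as proposed above, avoids relying on every pipeline to apply a filter like this correctly.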

Elfo404 commented on Sep 19 '24

Re:

Review statistics provided by Github (Pulse / Insights). Is there overlap with the metrics defined here? ....

I'm not a huge fan of GitHub's insights, and most folks I've talked to aren't either. They don't get much, if any, value out of those insights, because they don't give enough information to ask questions and analyze. Additionally, I have concerns about including PII (contributor emails) in the metrics. Time after time I've seen companies misuse this type of data to pass judgment on engineers incorrectly. I feel so strongly about this that I wrote an RFC 2119-style specification on observing human systems.

My take is that we should approach these metrics (and I think we already are) from the perspective of what can come from any VCS, not just what comes from GitHub. I think the metrics we have closely align with dora.dev's capabilities. Additionally, these metrics could be considered foundational to aggregation-style metrics in other systems, such as CHAOSS.

adrielp commented on Sep 19 '24

I have tried to summarize the discussion we had last Monday in the Semantic Convention SIG as best I can here below:

Summary of Semantic Convention SIG meeting 2024-10-14

Per the OpenTelemetry metrics data model (https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md), histograms aggregate multiple events as they occur and, at each reporting time, return the aggregated distribution.

In this PR we do not do that: if we reported the metrics in question as histograms, we would sample all data points (i.e. PR counts and duration per change state) at each reporting time, and the counts could decrease (e.g. when a PR moves from one state to another). We would be sampling the entire distribution at each reporting time, so cumulative counts would be incorrect, since every PR would be counted as a new event on each report.

Gauges best correspond to the data model of these metrics: they represent the data accurately, with no aggregation. A drawback is that the number of gauges scales with the number of PRs instead of a single histogram aggregating that data, i.e. we have N gauges instead of one gauge histogram. There may also be a usability issue in transforming multiple gauges into a histogram (e.g. in Prometheus / Grafana): for histograms, Prometheus offers several functions (e.g. percentiles such as the 99th and 50th, averages, counts, sums) and supports heatmaps; with N gauges we lose that usability.

Let's consider usability (e.g. existing dashboards, demo recordings) @adrielp.

Conclusion by @jsuereth: stick with gauges for now and sort out the gauge histogram data model later -> create a SemConv CICD issue and link to the existing gauge histogram issues in the spec. @lmolkova: add a comment somewhere in the Markdown or YAML saying that the instrument type might change and that resolving this is a prerequisite to stability; for now it's good.

Dashboard by @adrielp using gauge data: https://grafana.com/grafana/dashboards/20976-engineering-effectiveness-metrics/
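To make the gauge-vs-histogram point concrete, here is a minimal simulation (the field names and reporting function are illustrative, not from the PR) showing that per-PR gauge points can decrease or disappear as PRs change state, which a cumulative histogram cannot represent:

```python
def report_change_duration_gauges(prs, now):
    """Emit one gauge point per PR: (pr_id, state, seconds in that state).

    Unlike a cumulative histogram, re-sampling the same PR at the next
    reporting time does not inflate any count, and a PR's point for a
    given state simply stops being reported when it leaves that state.
    """
    return [(pr["id"], pr["state"], now - pr["state_since"]) for pr in prs]

prs = [
    {"id": 1, "state": "open", "state_since": 100},
    {"id": 2, "state": "open", "state_since": 250},
]
print(report_change_duration_gauges(prs, now=400))
# [(1, 'open', 300), (2, 'open', 150)]

# PR 2 merges: at the next scrape, the 'open' series for PR 2 is gone
# and a 'merged' series starts, so the per-state count went down by one.
prs[1]["state"], prs[1]["state_since"] = "merged", 400
print(report_change_duration_gauges(prs, now=500))
# [(1, 'open', 400), (2, 'merged', 100)]
```

This is the "N gauges instead of one gauge histogram" trade-off from the summary: each PR contributes its own time series, and any histogram view has to be assembled from them on the query side.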

I've written a quick document that discusses the historical context of these metrics, how to query them on the frontend, the dashboard, and how to build the histograms from the aggregations. I'll open the document for comments & feedback. Right now it's a Google doc because that was the easiest place for me to collect my thoughts & I didn't feel like opening yet another issue 🤣

Here's the google doc.

There's also an older Zoom recording (starting at 5:35) of me demoing these metrics (and DORA) in the OTel Demo SIG meeting. Lots of iterations have happened since then, so you'll certainly see differences in the dashboards, but hopefully that provides additional context.

adrielp commented on Oct 25 '24

This looks awesome! 🚀

lmolkova commented on Nov 08 '24