semantic-conventions
Add VCS metrics from Github receiver
Fixes #1372
Changes
This PR adds the VCS metrics from the GitHub Receiver to Semantic Conventions. These metrics are based on https://github.com/adrielp/opentelemetry-collector-contrib/blob/903eec5382352e4d663fadd2dc7f1cc1e37f62fd/receiver/githubreceiver/documentation.md
Merge requirement checklist
- [x] CONTRIBUTING.md guidelines followed.
- [x] Sign CLA
- [x] Change log entry added, according to the guidelines in When to add a changelog entry.
  - If your PR does not need a change log, start the PR title with `[chore]`
- [x] schema-next.yaml updated with changes to existing conventions.
TODO
- [x] Update `docs/cicd/cicd-metrics.md` after receiving changes from https://github.com/open-telemetry/semantic-conventions/pull/1411
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: christophe-kamphaus-jemmic / name: Christophe Kamphaus (a5eff73d664e95959f1574b021e37bbed98bfcd1, 163e4989ad094234c55a266fb9c4e736e9f34a4b, 41543cdc9d4d76f9327d8689acfacc86c70c637a, 4d8e3a9c75fa66e4a22952c98c9087fed6b2592a, ef91dfa616bd776ac951df1827429b027a82735b, 30c8fdb6f0d5e2778f14c6fc6527df86b1722b63, e174ef3572be947aa46d638168c8c287e6e32536, cfcdaf1ac3a1887af58cdcadb40863d660b64a82, 347b15b0db0a0451f0ae696d949b7d4537819e87, 700412af4b8293559d312afd44ee939b32d4374b, 5a9ffe7ef997110472448ddf519289cd0a34a85f, c232041cf70354695878b9f6bab8a83000934e98, aab9f90b5944da890db2c82a00871a0d1430bf44, f71231217f6607fb0d7a1a212b37b578ce922c9d, d2ce62f554bb4f6eef08014de2ad10455d568462, 2da96a53e779912c7371c76184d18d4a268a3a37, 8c2e8a757b7203e04c3fe63bc1a705bafb0f19ef, afaa1f5bffd0d548e50ddb664eff5175723d2879, 592483f1530ad57334cf9057db1734349482834b, 9bd4a958315ba58893b88efd597e1d1fb2dac087, 344eeedbeda8a77ef258cc305947cb48edc4e4ea, a148a4e86d5ffa54c9298d4c243084e180b22535, b5c4660461765e2828d59476bdfe55f95147673c, 44a1c81ff731894b44361c9e7ae633ed1bd7a208, 15fe1fb0854c8b267aecd03714c2ff42e6668e8f, fc3096e06ef2b25f2aedbf2cd09150a8273c9f55, c5d68eea53e132e42b51fbd0fee44442c7c64cbd, 84b2b6a7aabf447e6a24c8b551cc22450f90efd8, 7d114d36e930b1ebe9f6e204d7834065460882fe, 238c76460c4abfe536ae2645957230f6fe18e30a, 513ae8b23c8a541deb3c213db8a3bf83b9568a39, 3e122e80b9cd93fa52c6dede9dd56dd54366ea90, dafdad7974193bb2f6a909546b294b068265c7bc, 7f3525112ca8388a53772e37129ac3eacf4dcd9a, b92cc0855c1993c17c58cd7928277393b5c72cac, 536a2a7defc49a81e598b228848258ff5201e94f, 137921c319020018384fcd8c7a87229b8a7236ad, b1d8ddc6cff74a46ab74b78c581bbf58949e1e7c, 0e02a47e9717f7449aaadeeb1e72fef8342a9010, 1a899341a7e1d2fdb4d48701209e0b197bb89799, 8f8bb73943ce328f172f402b0683b45412073c2f, 7107911ff15fea32aef80a53e97cb10923ed0f20, df9e0e4a911da127315179cfa42bf892cd53137e, 5d5a56b2776f2da0323add3f92c5740917f0204c, 46d61949fd7be630c6d7d08962157438e217ed49, 
74363d822f18663f19a7e586d91e66a8af66b897, ee9af4a8565ca1663e01e2d9c1aa9b9448dcac51, cfffb4f8cd4f40023255624e6a4ce56dcfcb0d74, 4a88a15e35a67e529d71db42907c35d3ebd039ed, 879f9f1ceaf0f76c5a193ce4423dfd7b334f7b1d, f00fceddac8e18a52dfc1d31a9758c2d66df4d37, d5493673dbdd840ae09238922e803ba1272e3354, ae7d3808107502a48b4cd6e42ad01f304ec80728, e63b2cfd377688c3f0a9054636fb52d973998a7a, 90b4625c48bea7c152626db0afd384a2ce44e7b3)
- ❌ - login: @christophe-kamphaus-jemmic / name: Christophe Kamphaus . The commit (44a1c81, 4d8e3a9, 592483f, ef91dfa, 238c764, 9bd4a95, 8c2e8a7, f712312, 700412a) is not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please submit a support request ticket.
@christophe-kamphaus-jemmic - please sign the CLA if you haven't already.
@lmolkova @christophe-kamphaus-jemmic @adrielp - Regarding events vs. metrics a few thoughts:
Overall: I think they can be complementary. We can define both, we should ensure metrics can be derived from events.
- How do we get "initial state" if we only capture lifecycle events? We either need to define a snapshot, or we can use these metrics as that initial state.
- I'm not as familiar with GH APIs now, but how does event capture work? It could be that, implementation-wise, it's cheaper to report metrics and poll for changes / update metrics in a single stateful system. I haven't looked at the implementation in the collector.
Overall - I think Liudmila is right about scale and flexibility if we can fire events. I don't see this as a blocker, but I'd like to understand the possibilities better.
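To make the first point above concrete, here's a minimal sketch (plain Python, with hypothetical event and snapshot shapes; not the collector's actual implementation) of deriving the per-state change counts from lifecycle events, given an initial snapshot as the starting state:

```python
# Hypothetical sketch: folding lifecycle events into per-state counts.
# The event shape ({"from": ..., "to": ...}) is an assumption for illustration.
def apply_events(initial_counts, events):
    """Derive current per-state counts from a snapshot plus lifecycle events."""
    counts = dict(initial_counts)
    for ev in events:  # each event moves one change between states
        counts[ev["from"]] -= 1
        counts[ev["to"]] = counts.get(ev["to"], 0) + 1
    return counts

# Initial state (e.g. from a metric scrape), then one PR gets merged:
snapshot = {"open": 3, "merged": 5}
events = [{"from": "open", "to": "merged"}]
assert apply_events(snapshot, events) == {"open": 2, "merged": 6}
```

Without the snapshot there is no way to recover absolute counts from events alone, which is why the two signals can be complementary.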
SemConv meeting notes:
Review statistics provided by Github (Pulse / Insights). Is there overlap with the metrics defined here?
What is the minimum set of recommended metrics / attributes that gives good observability for VCS?
> Review statistics provided by Github (Pulse / Insights). Is there overlap with the metrics defined here?
Metrics provided by Github (screenshots)
Correspondence between Github statistics and the OTel SemConv VCS metrics defined in this PR.
| Github statistic | Otel SemConv VCS metric | Notes |
|---|---|---|
| Number of forks | | |
| Number of stars | | |
| Number of open issues | | |
| Number of open PRs | vcs.repository.change.count | |
| Commit activity over time | | on main branch |
| Number of closed issues | | |
| Number of merged PRs | vcs.repository.change.count | |
| Number of unresolved conversations | | |
| # of commits by contributor | | to main branch |
| Commit activity over time by contributor | | on main branch |
| Lines added by contributor | | to main branch |
| Lines removed by contributor | | from main branch |
| Lines added over time | | on main branch |
| Lines removed over time | | on main branch |
| Commits ahead for branch vs main | vcs.repository.ref.revisions_ahead | |
| Commits behind for branch vs main | vcs.repository.ref.revisions_behind | |
Should we adapt the defined VCS metrics to cover some of these Github statistics and/or add additional metrics?
I'm a bit concerned about the inclusion of metrics using PII, which would allow organizations to track the performance of individuals. While organizations can easily implement such tracking tools (GH itself offers them to some extent), I believe we shouldn't make it easy (or encourage that behavior) in this context.
What if we remove # of commits by contributor, Commit activity over time by contributor, Lines added by contributor, and Lines removed by contributor at this stage? I believe this is a very sensitive topic, and we should have a dedicated and focused conversation around it if we really want to include them.
Re:
> Review statistics provided by Github (Pulse / Insights). Is there overlap with the metrics defined here? …
I'm not a huge fan of GitHub's insights, and most folks I've talked to aren't either. They don't get much, if any, value out of those insights, as they don't give enough information to ask questions and analyse. Additionally, I have concerns about including PII (contributor emails) in the metrics. Time after time I've seen companies misuse this type of data to incorrectly pass judgement on engineers. I feel so strongly about this that I wrote an RFC 2119-style specification on observing human systems.
My take is that we should approach these metrics (and I think we are already) from the perspective of what can come from a VCS, not from what comes from GitHub only. I think the metrics we have closely align with dora.dev's capabilities. Additionally, I think these metrics could be considered as foundational to aggregation style metrics in other series CHAOSS.
I have tried to summarize the discussion we had last Monday in the Semantic Convention SIG as best I can here below:
Summary of Semantic Convention SIG meeting 2024-10-14
Per the metrics data model (https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md), histograms aggregate multiple events as they occur and, at each reporting time, return the aggregated distribution.
In this PR we do not do that: if we reported the metrics in question as histograms, we would sample all data points (i.e. PR counts and duration per change state) at each reporting time, and the counts could decrease (e.g. if a PR moves from one state to another). We would sample the entire distribution at each reporting time, and cumulative counts would be incorrect since we would count all PRs as new events on each report.
Gauges best correspond to the data model of these metrics: they represent the data accurately (with no aggregation). A drawback is that the number of gauges scales with the number of PRs instead of a single histogram aggregating that data, i.e. we have N gauges instead of one gauge histogram. There may also be a usability issue in transforming multiple gauges into a histogram (e.g. in Prometheus / Grafana): for histograms, Prometheus provides several functions (e.g. percentiles such as the 99th or 50th, averages, counts, sums), and we can build heatmaps from them. With N gauges we don't get that usability.
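The gauge semantics in question can be sketched in plain Python (made-up data, no OTel SDK): at each reporting time we sample the current number of PRs per change state, and a value can go down when a PR changes state, which a cumulative sum instrument cannot represent.

```python
# Hypothetical sketch of gauge-style sampling of per-state PR counts.
from collections import Counter

def sample_state_counts(prs):
    """Gauge-style observation: the current count of PRs per change state."""
    return Counter(pr["state"] for pr in prs)

prs = [
    {"id": 1, "state": "open"},
    {"id": 2, "state": "open"},
    {"id": 3, "state": "merged"},
]
t0 = sample_state_counts(prs)   # first scrape: open=2, merged=1

prs[0]["state"] = "merged"      # PR 1 is merged before the next scrape
t1 = sample_state_counts(prs)   # second scrape: open=1, merged=2 ("open" went down)

assert t0["open"] == 2 and t1["open"] == 1
```

Treating each scrape's values as new histogram observations would double-count PRs across reporting intervals, which is the cumulative-count problem described above.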
Let's consider usability (e.g. existing dashboards, demo recordings) @adrielp.
Conclusion by @jsuereth: stick with gauges for now and sort out the gauge histogram data model later -> create a SemConv CICD issue and link to the existing gauge histogram issues in the spec. @lmolkova: add a comment somewhere in the Markdown or YAML saying that the instrument type might change, and that resolving this will be a prerequisite to stability; for now it's good.
Dashboard by @adrielp using gauge data: https://grafana.com/grafana/dashboards/20976-engineering-effectiveness-metrics/
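As a rough illustration of the note @lmolkova suggested, here is a hypothetical sketch in the semconv YAML shape: the group id, brief, unit, and instrument below are assumptions for illustration, not the PR's actual definition.

```yaml
# Hypothetical sketch only; not the PR's actual metric definition.
groups:
  - id: metric.vcs.repository.change.count
    type: metric
    metric_name: vcs.repository.change.count
    brief: The number of changes (pull requests) in a repository, grouped by state.
    instrument: gauge
    unit: "{change}"
    note: >
      The instrument type may change in the future (for example, to a gauge
      histogram once the data model supports it); resolving this is a
      prerequisite for declaring this metric stable.
```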
I've written a quick document that discusses the historical context of these metrics, how to query them on the frontend, the dashboard, and how to build the histograms from the aggregations. I'll open the document for comments & feedback. Right now it's a Google Doc because that was the easiest place for me to collect my thoughts & I wasn't feeling like opening yet another issue 🤣
Here's the google doc.
There's also an older Zoom recording (starts at 5:35) of me demoing these metrics (and DORA) in the OTel Demo SIG meeting. Lots of iterations have happened since then, so you'll certainly see differences in the dashboards. But hopefully that'll provide additional information.
This looks awesome! 🚀