velocity
"Number of pull requests" data appears inaccurate/misleading
In the velocity reports, we report "The y-axis is the total number of pull requests and issues".
From the query, this is determined by the total amount of PullRequestEvents: https://github.com/cncf/velocity/blob/8e1d1c189b65e2544fae7aec43c6381f9e4b4d82/BigQuery/velocity_cncf.sql#L19C18-L19C34.
In my opinion, a PullRequestEvent does not map 1:1 to "a PR" the way a person would interpret a count of PRs. There are two reasonable definitions (merged PRs or opened PRs, with merged PRs strongly preferred), and the query counts neither.
Per the docs: 'The action that was performed. Can be one of opened, edited, closed, reopened, assigned, unassigned, review_requested, review_request_removed, labeled, unlabeled, and synchronize.' In practice, however, I only found three of these actions in the data. Looking at a single day across GitHub:
1916 reopened
168129 closed
193220 opened
Even without the other possible events, we at least appear to be double counting PRs?
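To put a number on the double counting, here is a rough sketch using only the three action counts quoted above. It assumes those three actions are the only ones present for that day, and the variable names are mine, not anything from the velocity query.

```python
# Action counts for PullRequestEvents on a single day, as quoted above.
action_counts = {
    "reopened": 1916,
    "closed": 168129,
    "opened": 193220,
}

# What a plain count of PullRequestEvents would report for that day.
total_events = sum(action_counts.values())

# What a reader would probably expect from "number of opened PRs".
opened_prs = action_counts["opened"]

# How much larger the event count is than the opened-PR count.
ratio = total_events / opened_prs

print(f"events: {total_events}, opened PRs: {opened_prs}, ratio: {ratio:.2f}")
```

On these figures the event count is close to twice the number of opened PRs, which is what I mean by double counting.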
Hi, I can take a look on October 13th at the earliest; I'll be at KubeCon next week.
Actually, I've checked the charts, and we count PR/issue activities there, not just opened PRs. This is consistent even when we take data from non-GitHub projects (there we count activities on bugs, emails, etc.). cc @caniszczyk on what to do:
1. Keep it as-is (it was decided that way years back), but go through all docs and generated charts and state explicitly that we count PR/issue activities.
2. Update the code to actually count PRs/issues (unique PR/issue IDs across all activities) everywhere, so from now on we will have different stats than previous reports.
This needs a decision: either (1) or (2).
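For anyone weighing the two options, a minimal sketch of the difference, assuming a simplified event record. The `Event` type, its field names, and the sample rows are made up for illustration and are not the schema the velocity queries use.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    # Hypothetical, simplified record; not the actual velocity/BigQuery schema.
    repo: str     # repository the PR or issue belongs to
    number: int   # PR/issue number within that repository
    action: str   # e.g. "opened", "closed", "reopened"


def count_activities(events: Iterable[Event]) -> int:
    """First option: every activity on a PR/issue counts once."""
    return sum(1 for _ in events)


def count_unique_items(events: Iterable[Event]) -> int:
    """Second option: each PR/issue counts once, however many activities it has."""
    return len({(e.repo, e.number) for e in events})


# Made-up sample: three distinct PRs/issues, seven activities in total.
sample = [
    Event("org/a", 1, "opened"),
    Event("org/a", 1, "closed"),
    Event("org/a", 1, "reopened"),
    Event("org/a", 1, "closed"),
    Event("org/a", 2, "opened"),
    Event("org/b", 1, "opened"),
    Event("org/b", 1, "closed"),
]

print(count_activities(sample), count_unique_items(sample))
```

The key is `(repo, number)` rather than `number` alone, since numbers repeat across repositories. Whichever option is chosen, the gap between the two counts grows with how often items are closed and reopened.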
This still needs a decision, so I'll keep this open.
Hi @lukaszgryglicki, thanks for looking into this and proposing potential options. My vote is 2, as it is what "number of PRs" really means :)
OK, but this needs approval, and new reports will then look different than they did before. Let's wait for a decision.
/cc @caniszczyk who I assume is the decider
I have to admit the numbers in the 2024 report for OTel confused me a little, because they didn't match what I could see on https://opentelemetry.devstats.cncf.io/. I think if it were rephrased to pull request + issue "activity", it would be clearer, without having to change the actual numbers (as mentioned above, that would alter reports).