don't re-triage all issues annually

Open BenTheElder opened this issue 1 year ago • 6 comments

We were discussing in the api-machinery triage meeting that there are some issues fitting a pattern like:

help-wanted, lifecycle/frozen, triage/accepted

Which are:

  • valid known issues
  • lacking sufficient staffing to address
  • contain context on why the issue has not been resolved

Re-triaging them annually doesn't really add information; if anything it buries the actual discussion under bot comments and /triage accepted comments, and closing them buries the context on why they haven't been resolved.

I think we should stop re-triaging frozen issues, or at minimum stop re-triaging issues that are both frozen and marked help-wanted. (IMHO any frozen issue should be exempt, but even the narrower option would still help.)

Issues that are frozen + help-wanted are a searchable way to say "yes we know about this, but someone will have to step up and solve it, here is the context".

For example: https://github.com/kubernetes/kubernetes/issues/104607. This issue is complicated to fix, but in the meantime the behavior remains confusing and should be documented.

Currently the bot will annually remove triage/accepted from all issues, even ones carrying this set of labels, which effectively say "this is known and we need help".
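For context, my understanding is that the annual pass is driven by a periodic Prow job running the test-infra commenter tool against a GitHub search query. A minimal sketch of that shape (the job name, image tag, query, and durations below are illustrative assumptions, not copied from the real config):

```yaml
# Illustrative sketch only; names, query, and durations are assumptions,
# not the actual sig-contribex-k8s-triage-robot.yaml contents.
periodics:
  - name: ci-k8s-triage-robot-retriage   # hypothetical job name
    interval: 24h
    decorate: true
    spec:
      containers:
        - image: gcr.io/k8s-staging-test-infra/commenter:latest  # tag illustrative
          args:
            # Matches every accepted issue; nothing here exempts
            # lifecycle/frozen or help-wanted issues.
            - --query=org:kubernetes is:issue is:open label:triage/accepted
            - --updated=8760h   # ~1 year of inactivity (illustrative value)
            - --token=/etc/token/bot-github-token
            # The real message also carries boilerplate text; the key part
            # is the prow command that drops the label.
            - --comment=/remove-triage accepted
            - --ceiling=10
            - --confirm
```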

I strongly believe closing https://github.com/kubernetes/kubernetes/issues/104607 is unhelpful and further buries this confusing behavior, but I also understand that none of us currently have the time to resolve it. So for now we've just re-triaged it again.

/sig contributor-experience

BenTheElder avatar Jul 11 '24 20:07 BenTheElder

I agree. I love the re-triage effort, but if someone has taken the (rather extraordinary) step of freezing an issue, it's probably not the sort of thing that we need to re-triage. I think we have a signal-to-noise problem in a few places, and this is one of them.

thockin avatar Jul 12 '24 16:07 thockin

+100. We get a batch of these, time warped in from the past, each week. The frozen ones just create noise.

jpbetz avatar Jul 14 '24 21:07 jpbetz

The idea of retriaging was to make sure the issue is not obsolete, and reprioritize as needed. If it's creating too much noise as is, maybe we should consider increasing the interval rather than removing it entirely?

tallclair avatar Jul 17 '24 17:07 tallclair

> The idea of retriaging was to make sure the issue is not obsolete, and reprioritize as needed. If it's creating too much noise as is, maybe we should consider increasing the interval rather than removing it entirely?

But frozen issues are intentionally opted into long-term tracking to prevent auto-close. If people want to opt in to checking whether old issues are still valid, that's great, but automatically re-triaging frozen issues seems excessive.

We shouldn't be closing frozen issues without high confidence, and unless they result in a valid closure these comments are just noise that buries the real, non-automated discussion. GitHub does not, and never has, handled issues with a large volume of comments well; annually adding at least two comments that carry no information beyond "yes, we didn't decide to close it again" seems pointless.

If we want to re-validate issues, we can look through older frozen issues without robot comments; anyone can just query for label:lifecycle/frozen label:triage/accepted and start from the last page of results without adding noise to the issues.
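For example, a search along these lines (the org: qualifier and sort order are illustrative additions on my part) lists the least-recently-updated frozen, accepted issues first:

```
is:issue is:open org:kubernetes label:lifecycle/frozen label:triage/accepted sort:updated-asc
```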

EDIT: Leaving them open has very little cost and almost no downsides. We're not really running into any situation where the number of open issues is a constraint, are we?

BenTheElder avatar Jul 17 '24 19:07 BenTheElder

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 15 '24 19:10 k8s-triage-robot

/remove-lifecycle stale

cc @kubernetes/sig-contributor-experience (previously raised in slack)

BenTheElder avatar Oct 15 '24 22:10 BenTheElder

Hi @kubernetes/sig-contributor-experience, what would it take to discuss potentially acting on this recommendation? The configuration changes proposed are small, and I think they will incrementally improve the triage robot's signal-to-noise and avoid wasted time.

BenTheElder avatar Nov 18 '24 07:11 BenTheElder

Per @palnabarun: cc @kubernetes/sig-contributor-experience-leads 😅

xref: https://github.com/kubernetes/community/issues/8047

BenTheElder avatar Nov 21 '24 16:11 BenTheElder

+1 from me.

jberkus avatar Nov 22 '24 19:11 jberkus

gentle nudge @kubernetes/sig-contributor-experience-leads

I've been participating in SIG API Machinery triage for a while now, and aside from https://github.com/kubernetes/test-infra/pull/34255 I've noticed a lot of difficult-to-resolve issues pointlessly getting a yearly robot comment that buries the conversation. These issues are not something we should just close, though, because they do impact users and continue to be rediscovered, and when we just close them we lose the previous efforts to root-cause them.

As a good example: https://github.com/kubernetes/kubernetes/issues/78946 is not trivial to resolve, but it is a subtle bug that users frequently encounter and have to work around. Keeping this sort of issue open and frozen, without re-triage, would save us time to spend on the issues we can actually get to. (There are others linked above.)

BenTheElder avatar Feb 10 '25 20:02 BenTheElder

+1 From @kubernetes/sig-contributor-experience-leads

As this is a change that affects the whole project, we would like to send out a notification on dev@ to inform the community and seek lazy consensus for a week on this topic.

mfahlandt avatar Feb 11 '25 00:02 mfahlandt

(apologies for the extremely late response. It just occurred to me, so I thought I’d check / ask for feedback)

Since all the examples we've talked about so far are from the k/k repo, would it work if we disable the current ci-k8s-triage-robot-* jobs[1] for the kubernetes/kubernetes repo and create separate retriage rules just for it, as proposed above?

This way, we won’t change the "retriage cycles" for other kubernetes-* orgs and repos, and keep the cadence the community last agreed upon for the retriage cycle [2].

[1] https://github.com/kubernetes/test-infra/blob/e180e9fe4a2fbfec38f4be10718b773a74405188/config/jobs/kubernetes/sig-k8s-infra/trusted/sig-contribex-k8s-triage-robot.yaml#L605-L752 [2] https://groups.google.com/a/kubernetes.io/g/dev/c/GjAn5qLwA64/m/t3JmfGu3AgAJ
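To make that concrete, a rough sketch of how the scoping might look at the query level (purely illustrative; the real jobs in [1] are structured differently and the exact queries would need review):

```yaml
# Illustrative only: carve kubernetes/kubernetes out of the org-wide
# re-triage query...
- --query=org:kubernetes -repo:kubernetes/kubernetes is:issue is:open label:triage/accepted
# ...and give k/k its own periodic with whatever exemptions we settle on,
# e.g. skipping frozen issues:
- --query=repo:kubernetes/kubernetes is:issue is:open label:triage/accepted -label:lifecycle/frozen
```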

Priyankasaggu11929 avatar Feb 11 '25 13:02 Priyankasaggu11929

> (apologies for the extremely late response. It just occurred to me, so I thought I’d check / ask for feedback)
>
> Since all the examples we've talked about so far are from the k/k repo, would it work if we disable the current ci-k8s-triage-robot-* jobs[1] for the kubernetes/kubernetes repo and create separate retriage rules just for it, as proposed above?
>
> This way, we won’t change the "retriage cycles" for other kubernetes-* orgs and repos, and keep the cadence the community last agreed upon for the retriage cycle [2].

I think not removing triage/accepted from lifecycle/frozen and help-wanted issues makes sense for all repos. I'd like to draft the PR, send this to dev@kubernetes.io, and then if there are objections we can iterate. I think that makes more sense than preemptively narrowing it and increasing the complexity.

Currently the only deviation we have between orgs/repos is that some repos are opted out of the robot. For repos that haven't disabled the robot, the behavior is pretty uniform. Unless we hear specific objections/concerns I think we should keep it that way.

I haven't seen a counterpoint anywhere yet where it's genuinely helpful to re-triage an issue that is either requesting help or explicitly frozen out of the lifecycle, but maybe we'll know differently when we forward the proposal to dev@.
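Concretely, the shape of the change I have in mind is just adding exclusions to the existing re-triage query, roughly like this (illustrative; the exact label names and query live in the real config):

```yaml
# Illustrative sketch: exempt frozen and help-wanted issues across all repos
# by adding negative label filters to the existing re-triage query.
- --query=org:kubernetes is:issue is:open label:triage/accepted -label:lifecycle/frozen -label:"help wanted"
```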

BenTheElder avatar Feb 11 '25 18:02 BenTheElder

> I think not removing triage/accepted from lifecycle/frozen and help-wanted issues makes sense for all repos. I'd like to draft the PR, send this to dev@kubernetes.io, and then if there are objections we can iterate. I think that makes more sense than preemptively narrowing it and increasing the complexity.

@BenTheElder - ack, and thanks for the draft PR and for sending the notification over to dev@.

> Currently the only deviation we have between orgs/repos is that some repos are opted out of the robot. For repos that haven't disabled the robot, the behavior is pretty uniform. Unless we hear specific objections/concerns I think we should keep it that way.

> I haven't seen a counterpoint anywhere yet where it's genuinely helpful to re-triage an issue that is either requesting help or explicitly frozen out of the lifecycle, but maybe we'll know differently when we forward the proposal to dev@.

Ack, and agree with seeking objections/concerns on the mailing list notification.

I've gone through the various discussion threads we have on this topic across k/k, k/test-infra, and I’m strongly +1 on a change that would help improve the k/k issue triaging situation.

Priyankasaggu11929 avatar Feb 12 '25 08:02 Priyankasaggu11929

If it helps, as a Cluster API / controller-runtime maintainer I'm also in favor of this change. For me the current configuration just leads to a lot of unnecessary toil across a lot of repos.

sbueringer avatar Feb 12 '25 10:02 sbueringer

PR draft is held at https://github.com/kubernetes/test-infra/pull/34321

Working on the dev@kubernetes.io request for feedback now.

BenTheElder avatar Feb 12 '25 19:02 BenTheElder

dev@ notified here: https://groups.google.com/a/kubernetes.io/g/dev/c/lbBYa4jA6xk

BenTheElder avatar Feb 12 '25 19:02 BenTheElder

Implemented in https://github.com/kubernetes/test-infra/pull/34321

BenTheElder avatar Feb 27 '25 00:02 BenTheElder

> We were discussing in the api-machinery triage meeting that there are some issues fitting a pattern like:
>
> help-wanted, lifecycle/frozen, triage/accepted
>
> Which are:
>
> • valid known issues
> • lacking sufficient staffing to address
> • contain context on why the issue has not been resolved
>
> Re-triaging them annually doesn't really add information; if anything it buries the actual discussion under bot comments and /triage accepted comments, and closing them buries the context on why they haven't been resolved.
>
> I think we should stop re-triaging frozen issues, or at minimum stop re-triaging issues that are both frozen and marked help-wanted. (IMHO any frozen issue should be exempt, but even the narrower option would still help.)

I am -0 on this; I'd do the opposite, kind of.

If it's frozen and not help-wanted, and triage/accepted, leave it accepted. If it's help-wanted we should do something to ensure that either:

  • we remove the help-wanted marker
  • we have validated that the issue still meets the criteria for help-wanted

For example, if the issue description points to code that has since moved to a new file, we shouldn't leave it as a help-wanted issue. We can leave it frozen and accepted though.

Maybe we should add another label to indicate that the issue ought to get reviewed. Not re-triaged, but "needs-description-review" or something. I can, uh, file an issue about adding that - if folks like the idea.

sftim avatar Mar 06 '25 13:03 sftim

> For example, if the issue description points to code that has since moved to a new file, we shouldn't leave it as a help-wanted issue. We can leave it frozen and accepted though.

Um, that sort of detail is NOT a requirement for help-wanted. I don't think this is a reasonable thing to require. People taking up the issue can reach out for help and we can update it then.

This sort of "the code moved" situation can happen the same day the issue is filed; there's no guarantee even without auto-commenting on the issues that we've updated code links, and developers contributing will have to follow the changes in git / PRs / ...

Frozen+help-wanted implies nobody is taking it up, but it's a valid bug to track. The whole point of that state is that we don't have the bandwidth to fix it, but we acknowledge that the issue isn't going anywhere.

Also, note that removing the label doesn't necessarily cause anyone to do anything, but spamming the issues does make it more difficult for anyone trying to participate to find the real conversation (also, GitHub does not work well when threads get long).

> Maybe we should add another label to indicate that the issue ought to get reviewed. Not re-triaged, but "needs-description-review" or something.

If someone wants to optionally go dig through these old issues to re-check them, the label for that in this case is lifecycle/frozen?

Auto-commenting periodically adds a lot of irrelevant comments burying the real conversation and costs us limited github API quota.

BenTheElder avatar Mar 06 '25 20:03 BenTheElder

I think there are a lot of use cases that can easily be covered with triage-party without producing a lot of toil and noise for everyone.

sbueringer avatar Mar 07 '25 05:03 sbueringer

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For me, a misleading or winding-trail-of-details issue description is too high a barrier to entry, and as maintainers we should either:

  • manage this well
  • redefine help-wanted

Let's not auto-comment, though.

sftim avatar Mar 07 '25 09:03 sftim