core-workflow

Improve finding issues without a pull request to work on

Open nilleb opened this issue 4 months ago • 16 comments

As a newbie to the CPython repository, I find it quite frustrating to look for issues that do not have an associated pull request.

The linked:pr filter has limited coverage and requires manual intervention (i.e., a "closes" or "fixes" magic keyword in the PR description). Moreover, a linked PR closes the associated issue on merge, and sometimes the issue should be kept open (because of backward-compatibility PRs, and other reasons that I cannot yet grasp).

The GitHub API exposes another concept (cross references) and these appear in the history of an issue. This is why I first implemented a script to list all the issues and the associated PRs. The output looks like this. But this is not a scalable solution: the GitHub API could suffer if all the newbies were to use it, and the script needs to be executed often to keep the picture up to date with reality.

Given that the cross-references are essentially events, why don't we add a GitHub workflow that

  • adds a has-pr label when a PR is created, referencing an issue in its title or in its description
  • adds a has-pr label when a comment is added to the PR mentioning an issue
  • removes the has-pr label when the PR is closed without being merged
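
The three rules above can be sketched as a pair of small pure functions. This is a minimal sketch, not the workflow from the nilleb/cpython-stub repository: the names `referenced_issues` and `label_action`, the event names, and the reference pattern (`#123` or `gh-123`) are all assumptions made for illustration.

```python
import re

# Assumed reference syntax: "#123" or "gh-123" (hypothetical; the real
# workflow may recognize other forms).
ISSUE_REF = re.compile(r"(?:#|\bgh-)(\d+)\b", re.IGNORECASE)


def referenced_issues(*texts):
    """Return the sorted issue numbers mentioned in any of the given texts."""
    found = set()
    for text in texts:
        if text:
            found.update(int(n) for n in ISSUE_REF.findall(text))
    return sorted(found)


def label_action(event, *, merged=False):
    """Map a (hypothetical) event name to 'add', 'remove', or None."""
    if event in ("pr_opened", "pr_commented"):
        return "add"
    if event == "pr_closed" and not merged:
        return "remove"
    return None
```

A caller would pass the PR title and description (or a comment body) to `referenced_issues` and apply the result of `label_action` to each issue found. What should happen when several PRs reference the same issue is left open here.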

I have a sample GitHub Actions workflow available on the nilleb/cpython-stub repository

A discussion has been opened on Discourse.

nilleb avatar Sep 05 '25 09:09 nilleb

This issue should be labeled with infra (not sure about type: feature, this is why I didn't select that template - sorry).

nilleb avatar Sep 05 '25 09:09 nilleb

(Let's move this to https://github.com/python/core-workflow)

hugovk avatar Sep 05 '25 09:09 hugovk

Linking the branch containing the workflow, in case.

nilleb avatar Sep 05 '25 10:09 nilleb

See also #540

nineteendo avatar Sep 05 '25 10:09 nineteendo

Thanks a lot for sharing the link to #540!!!

I don't agree with https://github.com/python/core-workflow/issues/540#issuecomment-2143911590 because it does not match reality.

Apart from that, the proposed workflow matches what ezio-melotti described one year ago, and it automatically removes the label from the issue when the PR is closed, as proposed by terryjreedy.

Can we give it a try? 😁

nilleb avatar Sep 05 '25 10:09 nilleb

The workflow is quite complicated; maybe it would be better to implement this in Bedevere instead?

StanFromIreland avatar Sep 05 '25 14:09 StanFromIreland

Yeah, if we do it, Bedevere is the way. First let's evaluate the if :)

hugovk avatar Sep 05 '25 14:09 hugovk

If we don't implement this in CPython/Bedevere, you can write a simple script to iterate over issues and check their descriptions for a "Linked PRs" section.
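
As a rough sketch of that check, the parsing step could look like the code below. It assumes that issue descriptions carry a "Linked PRs" heading followed by bullet lines such as `* gh-123`; that layout is an assumption, not something confirmed in this thread, and `linked_prs` is a hypothetical helper. Fetching the issues from the GitHub API is left out.

```python
import re

# Assumed layout of the section (not confirmed by this thread):
#   ### Linked PRs
#   * gh-123
HEADING = re.compile(r"^#+\s*Linked PRs\s*$", re.IGNORECASE)
ITEM = re.compile(r"^[*-]\s*gh-(\d+)\s*$", re.IGNORECASE)


def linked_prs(body):
    """Return PR numbers listed under a 'Linked PRs' heading in an issue body."""
    numbers = []
    in_section = False
    for line in (body or "").splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            in_section = True
        elif in_section:
            item = ITEM.match(stripped)
            if item:
                numbers.append(int(item.group(1)))
            elif stripped and not stripped.startswith("<!--"):
                # Any other non-blank, non-comment line ends the section.
                in_section = False
    return numbers
```

An issue for which `linked_prs` returns an empty list would then be a candidate for "no pull request yet".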

StanFromIreland avatar Sep 05 '25 15:09 StanFromIreland

The script is here: https://gist.github.com/nilleb/112f2dc55c14c9ed3b0809a1801d48fb, I implemented it before the workflow. I mentioned it in the description, without linking it, because it will stress the GitHub API (at scale) way more than the proposed workflow.

nilleb avatar Sep 05 '25 15:09 nilleb

What is the process to discuss the "if"?

I have the feeling that the discussion on #540 was stuck about one year ago. I would prefer not to let it just die.

If the workflow has to be implemented in Python, OK, let's discuss. My primary concern about a python/bedevere implementation is the triggers of the three jobs. The rest is trivial (because it's just code).

Triggers:

nilleb avatar Sep 05 '25 15:09 nilleb

Ah, I missed that! Anyhow, you could create a dashboard for all? If many people were to use it, I’m sure GH would still handle it, they get many more requests every second.

StanFromIreland avatar Sep 05 '25 15:09 StanFromIreland

I am sorry, but I do not see what is wrong with the workflow or the Bedevere update. A dashboard will never be up to date if it is disconnected from the event source (i.e., the GitHub events related to pull requests and pull request comments).

Here what is at stake is simplifying the access to contribution for any developer.

nilleb avatar Sep 05 '25 15:09 nilleb

"Here what is at stake is simplifying the access to contribution for any developer."

I am unconvinced that we need to do this. Marking issues as already having PRs could be misleading for larger tracking issues, or where an old PR has been abandoned or closed. Generally, there is not a 1:1 relationship between issues and PRs.

We try and mark good issues with e.g. the "easy" label. Finding that an issue already has a PR can be a great learning opportunity, or just a signal to look at a different issue.

A

AA-Turner avatar Sep 05 '25 15:09 AA-Turner

The proposed workflow removes the label for abandoned or closed pull requests; and a label is not the same as the "linked pr" concept.

nilleb avatar Sep 05 '25 15:09 nilleb

For the love of statistics: as of today, we have 64 easy issues without an associated pull request. 27 of them are code issues (the others are docs). Among these, only 1 is less than 5 years old.

nilleb avatar Sep 05 '25 16:09 nilleb

My concern with a has-PR label is that it may discourage issue and PR triage, both of which are just as important as submitting PRs.

When I look at an issue in my area of expertise, I'm not wondering if it has a PR. If it doesn't, I may or may not end up writing a fix if I find the problem; however, if it does have a PR attached, I go to review it. That second case is an essential part of the development lifecycle, and is definitely vital for contributors aspiring to become core developers (because usually the first step is to join our triage team). There are also plenty of cases where an incorrect issue has an attached PR, and a has-PR label may implicitly signal that the issue is valid.

Basically, I'm saying that issues with PRs are just as important to look at. There's more to contribution than just authoring commits.


Stepping back a bit, I'd be more comfortable with having "stages" for issues, similar to how we handle PRs. It could go like this:

  1. The issue needs to be triaged (with an awaiting triage label, for example).
  2. The issue needs a PR (e.g., an awaiting PR label).
  3. The linked PR needs a review (awaiting PR review).
  4. PR(s) merged, issue should be closed (awaiting closure)
  5. Or optionally, some more work needs to be done (DO-NOT-CLOSE or something like that).
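
To make the stages above concrete, here is a minimal sketch that models them as an enumeration with allowed transitions. The label strings follow the examples in the list; the `Stage`, `TRANSITIONS`, and `can_move` names and the exact set of transitions are assumptions for illustration, not an agreed design.

```python
from enum import Enum


class Stage(Enum):
    AWAITING_TRIAGE = "awaiting triage"
    AWAITING_PR = "awaiting PR"
    AWAITING_PR_REVIEW = "awaiting PR review"
    AWAITING_CLOSURE = "awaiting closure"
    DO_NOT_CLOSE = "DO-NOT-CLOSE"


# Assumed transitions (hypothetical): each stage maps to the stages that may follow it.
TRANSITIONS = {
    Stage.AWAITING_TRIAGE: {Stage.AWAITING_PR},
    Stage.AWAITING_PR: {Stage.AWAITING_PR_REVIEW},
    Stage.AWAITING_PR_REVIEW: {
        Stage.AWAITING_PR,       # PR closed without merge, a new one is needed
        Stage.AWAITING_CLOSURE,  # PR(s) merged
        Stage.DO_NOT_CLOSE,      # PR merged, but more work remains
    },
    Stage.AWAITING_CLOSURE: set(),
    Stage.DO_NOT_CLOSE: {Stage.AWAITING_PR},
}


def can_move(current, target):
    """Return True if an issue may go from `current` to `target`."""
    return target in TRANSITIONS[current]
```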

I'm thinking out loud, though.

ZeroIntensity avatar Sep 06 '25 15:09 ZeroIntensity