mbedtls
mbedtls copied to clipboard
Test PR [DO NOT MERGE]
Description
Ths is pr used to test github API and checklist fields. Do not review.
Proposal 1
PR checklist
Please tick as appropriate and edit the reasons (e.g.: "backport: not needed because this is a new feature")
- [ ] changelog provided, ~~not required~~, ~~covered~~ . Link --> #XYZ
- [ ] backport provided, ~~not required~~, ~~is backport~~ Link --> #XYZ
- [ ] tests provided, ~~not required~~, ~~covered~~ Link --> #XYZ
Proposal 2
PR checklist
Please tick as appropriate and edit the reasons (e.g.: "backport: not needed because this is a new feature")
Changelog
- [ ] Provided
- [ ] Not required
- [ ] Covered by
Backports
- [ ] Provided
- [ ] Not required
- [ ] Is backport of
Tests
- [ ] Provided
- [ ] Not required
- [ ] Covered by
Please refer to the contributing guidelines, especially the checklist for PR contributors.
Proposal 3 - Labels
One of changelog-provided
or changelog-not-required
. Both labels should be the same colour.
One of tests-provided
or tests-exempted
. Both labels should be the same colour.
One of backport-2.28-required
or backport-2.28-not-required
. Both labels should be the same colour. Link to backport will need to be handled separately.
One of backport-3.6-required
or backport-3.6-not-required
Both labels should be the same colour. Link to backport will need to be handled separately.
Justifications should go into the main description, not in comments (since comments get folded up). Use of labels allows simpler machine parsing for straightforward cases. Notes in the description can be used to cover more complicated cases (hopefully fairly rare)
One thing about "backports", especially in the presence of more than one LTS branch: I'm not sure it's the right concept. The checkboxes only make sense as they are on the main (development) PR, but are looking weird on PRs against 3.6 or 2.28. Also, some fixes are going to be 2.28-only or 3.6-only or "just 3.6 and 2.28 but not development" (for example we removed the buggy thing in 4.0). And some PRs are going to start as 2.28-only but while at it you fix a bug that happens to also apply to development, and now it's more about forward-porting than back-porting.
I'd like to suggest that instead we just have one bit by active branch that says "do we need it for that branch". Concretely:
With proposal 1
- [ ] changelog provided, ~~not required~~, ~~covered~~ . Link --> #XYZ
- [ ] dev PR provided, ~~not required~~ Link --> #XYZ
- [ ] 3.6 PR provided, ~~not required~~ Link --> #XYZ
- [ ] 2.28 PR** provided, ~~not required~~ Link --> #XYZ
- [ ] tests provided, ~~not required~~, ~~covered~~ Link --> #XYZ
(Then on the dev PR, "Link -> #XYZ" can link to the present PR on the "dev PR" line, etc. Alternatively, instead of "provided / not required", we have "provided / not require / this PR".)
Now if we have a bugfix that only applies to 3.6 and 2.28 then both PRs will have "3.6 provided" and "2.28 provided" but "dev not required".
With proposal 2
Instead of having a "backport" section we have 3 sections: dev / 3.6 / 2.28 each with same checkboxes. As with proposal 1, we either add a new line "this PR" or use the existing line "provided" and add a link to self. (Or don't add a link and make the script smart enough to deduce it from the PR's target branch.)
With proposal 3
Instead of having two pairs of "backports-x.yz-(not-)required" labels, we have 3 pairs: "{dev,3.6,2.28}-(not-)required".
One question we should also consider, is how well the various schemes work when we introduce a new LTS branch (or remove an old LTS branch), considering there is typically a dozens or PRs in flight at the time we make that change.
(I haven't given it much thought yet, but labels seem attractive here, as I think it's easier to mass-apply or mass-remove them than to mass-edit PR descriptions, especially as there will always be cases where people didn't respect the format.)
I'd like to suggest that instead we just have one bit by active branch that says "do we need it for that branch".
OpenSSL has a way of doing this with labels. A PR will list the branches it targets. If a PR can be cleanly merged into more than one branch then the same PR can target those branches. If there are conflicts, or only parts are needed on one branch, then separate PRs are required for each branch (and the label indicates which branch the PR targets)
To sum up my personal experiece.
We need to convey the following information in a way that is simple for humans to fill, and easy for scripts to parse.
- Has the entry been checked by the mainters before merging? -> checkbox
- Is that entry required, or not?
- If it is required or covered where is it? ( needed for linking purposes )
With that in mind I am more in favour in a variant of Proposal one.
**changelog**
provided
,~~not required~~
,~~covered~~
. Justification or comments.Link --> #XYZ
The contributor will only need to remove strikeout comments ~~
, fill his comments to the justification and if needed refer to the PR linking it.
This is also easier to scale/maintain since we just add new entries e.g for LTS 3.6 and the script parsing it will just use the same regex mathching to an extra line.
I am against labels for two reasons:
- Mostly, because as a gatekeeper, I want to know why a changelog/backport is not required, and what additional test steps might be required. Labels don't convey the why, so the information needs to be duplicated in the description. Duplicated information is unreliable unless it's checked mechanically. If the tooling can check the duplicated information, it can parse the information in the description, so having the labels does not simplify anything.
- Also because these labels would not be useful in searches, and we have enough label visual noise as it is.
Whatever the presentation is, I want it to encourage people to fill in the information. That means it needs to be reasonably short and straightforward. The more filler there is, the more people just don't read it at all. For this reason, I prefer proposal 1 to proposal 2. I agree with Manuel that we should have lines for 2.28/3.6/development though: sometimes people don't link the main PR from the backport!
Labels don't convey the why, so the information needs to be duplicated in the description.
Arguably, the description in this case would add information and not duplicate it. I agree that the consistency is easier to maintain if all this information is in the same place. It is not ideal, but would not be a blocker for me.
we have enough label visual noise as it is.
That was my main concern too, and I was thinking that maybe we could categorise the labels (project management, PR lifecycle, release script/gatekeeper) and assign a colour to each category. All labels in that category would use the same colour (optionally shade differences perhaps to give further visual cues). I would hope that could help the visual noise situation. What do you think?
We have been trying to get prose description working with little success over several iterations.
We have not tried labels for this once, let alone tried to improve their use after seeing how they work (or don’t).
If we have to carefully describe a process for how to use and format prose in the description because it’s not intuitively obviously, then it’s not working, since (a) external contributors won’t follow it; and (b) it will take up some of our cognitive capacity to track and use, which would be better spend on other things.
For changelog and tests, we want to know whether they are required and present; required but not present; or not required. It’s not clear with the check boxes whether they are to give the “required” status, or just that someone has verified the information. With labels (a) they only give the “required” status (and there’s no striking out or required form of words in a description to give this information and (b) the CI can check unequivocally that (i) one and only one of a required set of labels is needed and (ii) that there is a changelog entry if required, and changes in the PR under tests/
if required.
For backport information, yes, this is harder with labels, but the indication becomes obvious (backport-2.28
and/or backport-3.6
label(s). We could require backports to be listed in the description, and to have a title containing the main PR’s title together with the word “backport” to be able to find them by script. (This would be met by the usual [Backport 2.28] Main PR title
or Backport 3.6: Main PR title
forms that are already most commonly used)
And obviously, any corner cases can be addressed by comments in the PR, as they are already. It’s not the corner cases we are concerned with here, but the bulk of the hundreds of PRs in a release.
I'm suggesting labels because they are used in the OpenSSL project and it's been working for them. The description field has been used by Mbed TLS for a while now, and the fact that we're still having this discussion means that it's not working for us.
All labels in that category would use the same colour (optionally shade differences perhaps to give further visual cues)
Yes, using colour to group similar labels makes lots of sense. I think shade information would just create confusion, as long as a quick scan of the text of the label is sufficient to distinguish different labels of the same colour (i.e. having two labels in the same category that only differ by one letter but have different meaning would be a fail)
We have not tried labels for this once, let alone tried to improve their use after seeing how they work (or don’t).
We have tried labels, and they didn't work very well, and I don't see a proposal here that's a clear improvement.
For changelog entries, we used to have a needs-changelog
label. It was often out of date, so as a gatekeeper, the label was mostly useless: the gatekeeper and the release preparer would have to have to double check whether the label had been correctly omitted. That's why we decided to remove it, and just treat a missing changelog as a generic “changes requested” + needs-work
. We didn't also have a changelog-not-needed
label, but I don't see how having that would be an improvement: there's still as much risk that the label has been applied by mistake (or more commonly, was applied because the PR initially didn't need a changelog and we forgot to remove it after doing an incidental bug fix as part of an internal refactoring).
For backports, we still have needs-backport
. It's there to distinguish “approved” from “ready to merge”. It's somewhat useful for gatekeeping, but there are enough omissions that gatekeepers do need to double check whether a backport is needed.
In any case, the choice isn't between labels and description, it's between description alone and description+labels. Labels alone are not satisfactory because we need to know the reason for the choice. In what way does a changelog-related label help, given that the information is in the description as well?
The description field has been used by Mbed TLS for a while now, and the fact that we're still having this discussion means that it's not working for us.
The description field was an ad-hoc incremental improvement and wasn't trying to solve the problem we wish to address here. It is an improvement to what was before and works better than not having any structured approach and from this perspective it is a success.
The way I see it, we have the following steps here:
- The pr author provides his claims (has/has elsewhere/doesn't need X) and justification
- The reviewers verify and potentially challenge them
- The gatekeeper checks the presence of the claims (I am not sure if verifying correctness validating the justification is or should be at this step)
- The release script consumes the claims
As far as I can tell, for steps 1 and 2 we definitely need something more than just labels. We potentially could use labels for signalling the claims to the gatekeeper, but depending on how we define the gatekeeper role that might not be enough. Even when it is enough, based on Gilles's experience it might not be very useful. Therefore I think at these steps Proposal 1 or 2 could work, and my preference would be 1.
For steps 3-4 we could work with either of the proposals. However, it seems easier to consume labels from the script than parsing descriptions. Currently I would favour this option as I can only see minimal to no downsides
- the visual pollution should be minimal (Open SSL's labels look pretty neat, we won't be able to reach that level because we use labels for project management too, but we should be able to come pretty close)
- there is no duplicated information: the description has the claims and the justification, the label conveys the fact that the gatekeeper performed the checks for that item (whether that means merely checking the presence of the claim or more)
it seems easier to consume labels from the script than parsing descriptions.
I can only agree with this if we assume that the labels are consistent with the description. But I do not think this assumption is warranted. In practice, when we've had labels, they were often inaccurate. So I would want the script to check the description against the labels, and if it does that, I don't see how it could be simpler than not having labels.
I can only agree with this if we assume that the labels are consistent with the description.
Straightforward alternative to labels here is either:
- nothing, no addition to Proposal 1
- Proposal 1 + a text token signalling the gatekeeper's validation
Capturing the gatekeeper check status supposed to give some level of protection against the "initially didn't need a changelog and we forgot" type of mistake: gives some level of confidence that the description is up to date as the gatekeeper checks should take place right before merge.
In case 1 we don't have the consistency problem, but don't have the above assurance either. In case 2 we have the same consistency problem, whether we use text tokens or labels for logging the gatekeeper's validation steps explicitly.
@gilles-peskine-arm based on this I would think that your preference is Proposal 1 with Manuel's improvement but no tracking of gatekeeper validation. Is that correct?
the label conveys the fact that the gatekeeper performed the checks for that item
Ah, ok, my bad, in my earlier messages I was focusing on labels as set by the author or reviewers, and conveying information to both the gatekeeper and the release preparer. My objections do not apply to labels set by the gatekeeper: the gatekeeper sees the final stage of the pull request, so we don't have the problem of forgetting to update after a content change or after a discussion.
As a signal from the gatekeeper to the release preparer, we don't need the same information. Tests are not relevant: if there's any manual testing to be done, it's up to the reviewers to do it and to the gatekeeper to check that it's been done. Changelog-present and backport-present are safe defaults. But we can require the gatekeeper to explicitly indicate when something looks suspicious to the release preparer but is actually safe:
- “from gatekeeper to releaser: no changelog needed”
- “from gatekeeper to releaser: expect a grouped changedlog later”
- “from gatekeeper to releaser: it's ok to not have this on branch B”
I have no objection to using labels for this if it's convenient. Those labels would only be used on merged pull requests, so they aren't even noisy.
Just to clarify what the agreed consensus is.? Proposal 1 and:
- A new label that will be indicate that the gatekeeper has confirmed everything is ok?
- Fine grained labels for various actions like
no changelog needed
?
We still need to define what the per entry checkbox means.
- If it is meant for the gatekeper to indicate that everything is OK then labels would duplicate information
- If its meant to say that everything is there from the developer doing the work, then a label will be needed.
But I agree with Gilles, that we should keep it as simple as possible. Github does not make it easy. We are parsing the text of the PR as a single body of text line by line. Also I am not certain if they have added access to the comments, but in the earlier versions of the gh cli you could only get the body.
If we want information to be linked to each other in the body of the PR it should be in the same line for parsing. Having a label needs 2.8 backports
and a needs 3.6 backport
, but then in the body we mention two links, first the 3.6 and then the 2.28, associating the label to the appropriate linking will be harder. ( We are regex matching the text for https:// links ) to extract all links from the body.
If we are happy to reserve the checkboxes for the gatekeeper signal, then we don't need the labels.
Unrelated question: what would do the script with the links to the backport PRs? (I mean why does it need them?)
Unrelated question: what would do the script with the links to the backport PRs? (I mean why does it need them?
The script tries to create two pool of PR's one for development and one for 2.28. Then it looks into the comments and tries to link them together since github does not do that.
The original aim I assume is to make certain that there are no pr's included in a release while the backports are not included in the LTS branches.
Curenty the matching is very error prone, so we add everything in a report and manually confirm it as the release lead. If we have more deterministic messaging the script could raise a warning.
Just to confirm the proposed checkboxes should look like
Please remove add/remove strikethrough ~~
as appropriate, add any text or justification if required and any relevant links to the end of the line.
- [ ] changelog provided, ~~not required~~, ~~covered~~ .(Comments), (http or # hash-link)
- [ ] backport 2.28 provided, ~~not required~~, ~~is backport~~ .(Comments) , (http or # hash-link)
- [ ] backport 3.6 provided, ~~not required~~, ~~is backport~~ .(Comments) , (http or # hash-link)
- [ ] tests provided, ~~not required~~, ~~covered~~ .(Comments) , (http or # hash-link)
The Gatekeeper shall verify the completeness of justifications, and tick to confirm everything is in place.
A new label that will be indicate that the gatekeeper has confirmed everything is ok?
I don't understand: “the gatekeeper has confirmed everything is ok” is already conveyed by the fact that the gatekeeper pressed the merge button. I thought the point was to convey extra information, e.g. to distinguish “no changelog entry because the changes don't require a changelog entry” from “there is a changelog entry”.
Fine grained labels for various actions like no changelog needed ?
I thought that was the idea, yes.
but in the earlier versions of the gh cli you could only get the body.
Why do we care about the gh cli? I'd expect that we'd rely on the Python bindings, since Python is our preferred scripting language.
I don't understand: “the gatekeeper has confirmed everything is ok” is already conveyed by the fact that the gatekeeper pressed the merge button. I thought the point was to convey extra information, e.g. to distinguish “no changelog entry because the changes don't require a changelog entry” from “there is a changelog entry”.
In practise that is rarely the case. Things are being merged for various reasons, the developer is checking the tickboxes, in some other cases the maintainer is.
In the last two big releases we had >50 prs which have been merged and the checkboxes were in an inconsistent state.
Also as discussed there is a two channels of communication that we are trying to convey.
- One is from the PR owner to the gatekeeper.
- One is from the Gatekeeper to the release maker.
Why do we care about the gh cli? I'd expect that we'd rely on the Python bindings, since Python is our preferred scripting language.
Implementing a full blown graphql github parser in Python is a significant task and honestly a bit out of scope. While now we use their official API to extract the information required such as PR body, or labels, or status.
What we effectively need is the least effort and easier to maintain way to generate a report of everything coming into a release and their state, aiming to help release leads with cathing inconsistencies.
Why do we care about the gh cli? I'd expect that we'd rely on the Python bindings, since Python is our preferred scripting language.
Implementing a full blown graphql github parser in Python is a significant task and honestly a bit out of scope. While now we use their official API to extract the information required such as PR body, or labels, or status.
I'm completely lost. Where does “Implementing a full blown graphql github parser in Python” come from? I guess someone had to do it, but why would be redo it? Writing Python code with import github
is a lot easier than figuring out the quirks and limitations of the gh cli and parsing its output. I've written several scripts using PyGitHub and found it easy to use.
Apologies I was not very clear
PyGithub is using said python bindings to issue the graphQL querries for the information. I am more than happy to ditch github cli, if the equivalent functionality can be achieved in PyGitHub. But that is outside of the scope of this discussion and can be discussed when we review the release helper scripts.
The point is that we need to agree with the structure of the checkboxes, and the information we are capturing now, in order to use it heading towards the 4.0 release.
In practise that is rarely the case. Things are being merged for various reasons, the developer is checking the tickboxes, in some other cases the maintainer is.
The reason for that is that we didn't have a process/common understanding how to use the checkboxes. If we now agree to reserve ticking the boxes to the gatekeeper, there is no need for any other explicit signal.
Also as discussed there is a two channels of communication that we are trying to convey.
One is from the PR owner to the gatekeeper. One is from the Gatekeeper to the release maker.
PR Owner to gatekeeper: text next to the checkboxes Gatekeeper to release script: checkboxes + text next to them
I think this approach should work reasonably well with the text proposed here: https://github.com/Mbed-TLS/mbedtls/pull/9008#issuecomment-2075059035
If we now agree to reserve ticking the boxes to the gatekeeper, there is no need for any other explicit signal.
We'll have PR submitters sometimes check the checkboxes when they seem to apply. But if the process calls for gatekeepers to update the checkboxes, then we can expect the checkboxes to be accurate at the time of merging. It's possible but unlikely that someone will edit the checkboxes later.
But anyway, what information would the checkboxes convey? It's not clear to me how a boolean for each of changelog/branch/tests maps to information for the releaser:
- changelog has three possibilities: included, not needed, will come later.
- branch is either none or a PR link.
- tests is irrelevant for the releaser.
But anyway, what information would the checkboxes convey?
The absence of a tick would convey that the gatekeeper forgot to tick it which makes it very likely that they didn't do the check for that particular element. This works best if only the gatekeeper is allowed to set the checkboxes.
tests is irrelevant for the releaser.
We can omit the checkboxes for any line that is irrelevant for the release script.
If there are no objections, I suggest we move forward with the modified proposal 1:
Please edit the reasons (e.g.: "backport: not needed because this is a new feature") and leave ticking the checkboxes for the gatekeeper.
- [ ] changelog provided, ~~not required~~, ~~covered~~ . Link --> #XYZ
- [ ] dev PR provided, ~~not required~~ Link --> #XYZ
- [ ] 3.6 PR provided, ~~not required~~ Link --> #XYZ
- [ ] 2.28 PR provided, ~~not required~~ Link --> #XYZ
- tests provided, ~~not required~~, ~~covered~~ Link --> #XYZ
- changelog provided, ~not required~, ~covered~ . Link --> #XYZ
The only thing I would add is a commentary and an example eg. Since this is a new system we need to to make as clear as possible how to complete it. We could remove the example in the future.
Please remove add/remove strikethrough ~~ as appropriate, add any text or justification if required and any relevant links to the end of the line.
- [ ] changelog provided, ~~not required~~, ~~covered~~ . (Comments). (Link: http://or # hash-link)
- [ ] dev PR provided, ~~not required~~ .(Comments). (Link: http://or # hash-link)
- [ ] 3.6 PR provided, ~~not required~~ . (Comments). (Link: http://or # hash-link)
- [ ] 2.28 PR provided, ~~not required~~ . (Comments). (Link: http://or # hash-link)
- [ ] tests provided, ~~not required~~, ~~covered~~ . (Comments). (Link: http://or # hash-link)
The Gatekeeper shall verify the completeness of justifications, and tick to confirm everything is in place.
Reference Example.
- [x] changelog ~~provided~~, ~~not required~~, covered . Changelog is included in another PR. Link: #XYZ
I would add is a commentary and an example
Experimentally, the more text there is, the more people completely ignore the whole thing. Every word counts. The #1 problem isn't people who don't understand what to do, it's TLDR.