OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

[PROPOSAL] Author CHANGELOG in each PR instead of collecting them in the last days before a release

Open dblock opened this issue 3 years ago • 30 comments

What kind of business use case are you trying to solve? What are your requirements?

Release notes are of varying quality.

For example, https://github.com/opensearch-project/OpenSearch/pull/2489 is completely unreadable.

What is the problem? What is preventing you from meeting the requirements?

  • Release notes aren't always super useful and may lack context.
  • Release notes are only available at release time, and not during development, so it's unclear what has changed.
  • Each plugin team is responsible for release notes, and the person assembling the release notes may not be the best one to write about a specific change, so YMMV.

What are you proposing? What do you suggest we do to solve the problem or improve the existing situation?

  1. Replace authoring release notes at the end before a release with authoring release notes in each PR.
  2. Require that release notes be updated via something like danger-changelog with every change.
  3. Automate collection of release notes across all projects. https://github.com/opensearch-project/opensearch-build/issues/438 https://github.com/opensearch-project/opensearch-build/issues/698

dblock avatar Jan 07 '22 19:01 dblock

cc: @elfisher

dblock avatar Jan 07 '22 19:01 dblock

I see how this addresses Release notes are only available at release time, which I like, but I'm failing to understand how it address the other two points.

It seems like the output of danger-changelog is not too dissimilar to the existing release notes (last good example - subsequent 1.2.x are all log4j fixes), except without the context provided by the categories.

I will add that the release notes for a release do go through a human spot check. Often times that spot check will catch things like incorrect terminology, typos, or inconsistencies. Does the script driven approach allow for this?

stockholmux avatar Jan 07 '22 19:01 stockholmux

It seems like the output of danger-changelog is not too dissimilar to the existing release notes (last good example - subsequent 1.2.x are all log4j fixes), except without the context provided by the categories.

danger-changelog just checks that the PR includes a well-formatted entry for the changelog, it doesn't actually write it. The change I am proposing is that humans are required to author the changelog line and it gets reviewed as part of the PR

I will add that the release notes for a release do go through a human spot check. Often times that spot check will catch things like incorrect terminology, typos, or inconsistencies. Does the script driven approach allow for this?

It doens't. I am proposing to remove any script-driven approach completely and move the authoring of release notes/changelog to the PRs, sorry if this wasn't clear. I've edited the issue and removed the CHANGELOG part, it just muddies the waters

dblock avatar Jan 07 '22 19:01 dblock

Would there be an expectation as part of the code review to review the release notes in the PR as well? That might help with quality.

elfisher avatar Jan 07 '22 21:01 elfisher

Would there be an expectation as part of the code review to review the release notes in the PR as well? That might help with quality.

Yes.

dblock avatar Jan 10 '22 18:01 dblock

How about a format of the CHANGELOG that allows for cross checked against commits since last release? This allows maintainers or contributors to add entries as needed or independently clean up the notes. This avoids adding a barrier to contributions by adding another criteria on the PR to merge.

As part of health of the repo a tool can verify how many 'undocumented' items are not in the CHANGELOG. This can be integrated into the release process during the post-code-freeze - prerelease time it can be required that maintainers drive that number to zero. This allows for time to group similar changes and have dialogs about terminology/presentation.

peternied avatar Apr 22 '22 18:04 peternied

@peternied Why would we need that if every PR must have a CHANGELOG entry?

dblock avatar Apr 22 '22 18:04 dblock

@dblock You are right, I am not thinking of chronological account of the changes in a codebase.

I want a reference of Most important -> Least important changes since last release. Here is a contrived example of what I'd prefer vs changelog.


Release Clatu Verata Nictu

Breaking

  • Removed indexing operators because they are not thread safe (#1337)

Bug Fixes

  • Thread.current() doesn't crash when called twice in a row (#4567)

Features

  • Added new lambda expressions, map(...), reduce(...), sum(...). (#44, #45, #46, #4834)

vs

Release Clatu Verata Nictu

  • Refactor sum to use reduce instead of its own implementation (#4834)
  • Thread.current() doesn't crash when called twice in a row (#4567)
  • Removed indexing operators because they are not thread safe (#1337)
  • Added new lambda expression, reduce(...), (#46)
  • Added lambda expressions, sum(...). (#45)
  • Added new lambda expressions, map(...), (#44)

peternied avatar Apr 22 '22 20:04 peternied

@peternied We would adopt a standard, such as https://keepachangelog.com/en/1.0.0/

dblock avatar Apr 22 '22 21:04 dblock

@peternied We would adopt a standard, such as https://keepachangelog.com/en/1.0.0/

That standard format looks great, I am sold.

Types of changes

  • Added for new features.
  • Changed for changes in existing functionality.
  • Deprecated for soon-to-be removed features.
  • Removed for now removed features.
  • Fixed for any bug fixes.
  • Security in case of vulnerabilities.

peternied avatar Apr 22 '22 21:04 peternied

Thanks @dblock for the proposal. Another key area would be to standardize the various buckets or groups created for the release notes entries such as Enhancements, Bug fixes, Maintenance, Refactoring, Infrastructure etc. These are pretty insightful for readers to extract focussed information. Some of the recent release notes have these :

  • https://github.com/opensearch-project/security/pull/1772/files
  • https://github.com/opensearch-project/alerting/pull/426/files

Do we expect the CHANGELOG to have these groups pre-defined, that will allow authors/reviewers to add/review the CHANGELOG entry, added under the right group, along with every PR?

getsaurabh02 avatar Apr 25 '22 23:04 getsaurabh02

Do we expect the CHANGELOG to have these groups pre-defined, that will allow authors/reviewers to add/review the CHANGELOG entry, added under the right group, along with every PR?

Yes, I expect those to be exactly from the list in https://keepachangelog.com/en/1.0.0/, at least to start.

dblock avatar Apr 26 '22 12:04 dblock

Require that release notes be updated via something like danger-changelog with every change.

I'm guessing a "change" here could consist of multiple commits, as described by keepachangelog.org:

The purpose of a commit is to document a step in the evolution of the source code. Some projects clean up commits, some don't. The purpose of a changelog entry is to document the noteworthy difference, often across multiple commits, to communicate them clearly to end users.

In the case of multiple commits that gradually build up to a full feature, @dblock is there a good mechanism we can put in place to make sure the changelog is reliably updated?

Alternatively, we could implements Conventional Commits as a precommit hook. That would give us a mechanism, but it still requires a changelog to be built and curated at release time. Thoughts?

kartg avatar Apr 26 '22 19:04 kartg

@kartg I think we do simple: every PR requires a CHANGELOG update. Humans ensure quality in code reviews. Multiple commits can be merged in CHANGELOG as we make progress.

dblock avatar Apr 26 '22 20:04 dblock

Works for me. In that case, we should define what makes a good Changelog entry - it shouldn't end up being just a repeat of the commit message.

kartg avatar May 03 '22 17:05 kartg

@kartg @Poojita-Raj @dblock @getsaurabh02 @elfisher @stockholmux I've opened https://github.com/opensearch-project/security/pull/1821 - would love your thoughts on how this looks

peternied avatar May 03 '22 21:05 peternied

@kartg I think we do simple: every PR requires a CHANGELOG update. Humans ensure quality in code reviews. Multiple commits can be merged in CHANGELOG as we make progress.

from personal experience (we use this format in-house) i can tell you that, while this is a nice idea, it does result in a lot of merge conflicts (as they're all congesting on the same lines in CHANGELOG.md). i don't have a good solution for this though.

i do however 100% vote to follow Keep a Changelog for the actual changelog! i've added some comments to the draft

rursprung avatar May 05 '22 10:05 rursprung

There've been the release drafter to generate draft release notes capturing all merged PRs. The release notes are generated by CI pipeline on each PR merge and avaialble for publishing releases. Do we still need this manually documented changelog?

This is a good idea that makes changelog available before release! It'd be perfect if it's done automatially instead of manually. It'd be very easy to forget to update CHANGELOG.md on each PR especially for new contributors who are not familiar with our develpment process. In comparison, the release drafter run in CI automatically documents the changes, which makes more sense.

cliu123 avatar May 18 '22 20:05 cliu123

@cliu123 I think the automation you propose would produce complete unreadable garbage right now between backport PRs, DCO messages, etc, at least on OpenSearch. So it would solve having release notes continuously available, but it wouldn't help making these notes useful. I could be wrong though, seems pretty cheap to try and enable it on this repo, want to try it? also how does it work across 3 ongoing releases in various branches?

dblock avatar May 18 '22 21:05 dblock

After a dive into the discussion and release-drafter, it looks like release-drafter does not directly support multiple drafts on multiple branches. There is a workaround available which can possibly support multiple draft notes. The caveat here is that we need a config for every version/branch in our case which is not scalable. We will have to either put in efforts to automate this using the script which could run with the version bump script but can still cause problems due to misconfigured target_commitish and the usage of GitHub UI for release publish.

In the case we decide to fix all the issues with release-drafter or if we decide to move forward with a changelog based approach, it would be a great addition to start labelling PRs to classify them into appropriate buckets. I have opened a PR for adding a label checker to PRs - #3618

I am happy to chat and would love to move this discussion forward.

kotwanikunal avatar Jun 17 '22 17:06 kotwanikunal

@kotwanikunal This proposal just says that the CHANGELOG would be edited by humans with every PR, thus you don't have to build any automation at all on branches to collect release notes, solves the problem you describe above

dblock avatar Jun 17 '22 19:06 dblock

@kotwanikunal This proposal just says that the CHANGELOG would be edited by humans with every PR, thus you don't have to build any automation at all on branches to collect release notes, solves the problem you describe above

@cliu123 I think the automation you propose would produce complete unreadable garbage right now between backport PRs, DCO messages, etc, at least on OpenSearch. So it would solve having release notes continuously available, but it wouldn't help making these notes useful. I could be wrong though, seems pretty cheap to try and enable it on this repo, want to try it? also how does it work across 3 ongoing releases in various branches?

@dblock I was referring to the mechanism used by plugins mentioned in the previous comment. The release-drafter approach does not make it any easier for us in terms of maintenance. Currently its difficult to compile these notes as they are and as a path forward, I was proposing labelling PRs to make it easier to classify even with the CHANGELOG. What do you think about that?

kotwanikunal avatar Jun 17 '22 20:06 kotwanikunal

I don't see the point in labelling PRs if the said PR must also include a 1-line change in the appropriate section of the CHANGELOG in progress. Am I missing a scenario where the label would be useful?

dblock avatar Jun 17 '22 23:06 dblock

The proposal I had was something on the lines of -

  • Add label checker for improving the current scenario (even manually generating the notes for that matter)
  • Parallely/Later work on generating CHANGELOG for PRs in a format where the changelog generation itself is automated by a GHA which looks at the label and adds the changes (PR title/link) to the right location in the changelog

I know the proposal originally required the requester to add the details to the CHANGELOG manually but I was trying to ease out that experience by going a bit further. With the original proposal, it will work more on the lines of an enforcer action than a generator. That should be an easy thing to add in if we are planning to go forward that

kotwanikunal avatar Jun 18 '22 00:06 kotwanikunal

I know the proposal originally required the requester to add the details to the CHANGELOG manually but I was trying to ease out that experience by going a bit further. With the original proposal, it will work more on the lines of an enforcer action than a generator. That should be an easy thing to add in if we are planning to go forward that

I think these two things are fairly separate. In my experience requiring humans to author a 1-liner in CHANGELOG is not a big deal, but yields the best outcome: readable changes that are attached to every PR and that exist at all times during development. Everything else I think is optional/nice to have/unnecessary.

dblock avatar Jun 20 '22 18:06 dblock

Just wanted to document how I generated 2.1.0 release notes, so that this process might help in automation or for next release. Steps:

* `git checkout origin/2.1`
* `git log --pretty=format:"* %s" --since=3-23-2022 > release-notes/opensearch.release-notes-2.1.0.md`
* Manually organize file sections
  * Add Date / Version section (copy + modify from previous release)
  * Enhancements/Bug fixes/Maintenance
* Replace pull request ids with full urls via regex
  * Find: `\(#(\d+)\)$`
  * Replace: `([#$1]([https://github.com/opensearch-project/opensearch/pull/$1))`](https://github.com/opensearch-project/opensearch/pull/$1))%60)
* Commit with these steps / modified with additional details

-> In second line of above command block, when using --since the date should be selected carefully. You should select the date when opensearch version bumped from MajorVersion.x to new version, example (2.x version bump to 2.1)

-> I had to manually go through each commit and sort it out to one of the sections mentioned here.

-> Next I had to check which commits are already present in previous release version and remove them from current release notes manually. This step can be done before sorting out each commit to appropriate section.

->Make sure no breaking changes are included if it is a minor release versions.

Rishikesh1159 avatar Jul 08 '22 23:07 Rishikesh1159

@dblock I have worked on adding the workflow as a part of this PR - #4085 The solution as it is currently implemented in the PR will not solve for backported PRs which will need additional work. (Maybe something on the lines of https://github.com/dangoslen/dependabot-changelog-helper) This however does solve for dependabot changes automatically and enforcing PRs for a manual changelog. Before we move ahead with the additional effort for backporting, I wanted to open this up for further discussion.

kotwanikunal avatar Aug 02 '22 16:08 kotwanikunal

#4085 - change has been merged in and backported to 2.x as well. Keeping the issue open to address any gaps that come up during the transition.

kotwanikunal avatar Aug 24 '22 22:08 kotwanikunal

The changes have started showing results for the major unreleased branch and dependabot PRs. There are still some gaps around backporting that I have noticed and also called out as a part of #4085 -

  • There are frequent merge conflicts due to the sequence mismatch in backporting leading to a manual intervention
  • There is no explicit entry for the change in the version to which the PR is backported to leading to an incomplete changelog for that version

A possible solution for the above problem is -

  • Drop the changelog file from https://github.com/VachaShah/backport/blob/main/src/backport.ts#L114-L119 and retry if the issue is with the CHANGELOG.md file
  • Fork dangoslen/dependabot-changelog-helper plugin and adapt it for backport wherein we pull the original PR. locate the changed line for CHANGELOG.md, add an entry under the version for the backported branch

Also, there is another discussion around missing process for feature branches here - #4477

@dblock

kotwanikunal avatar Sep 12 '22 17:09 kotwanikunal

Thanks, I think fixing backport to properly merge changelog is the way to go.

dblock avatar Sep 13 '22 19:09 dblock