sympy icon indicating copy to clipboard operation
sympy copied to clipboard

Move to a different release notes system

Open asmeurer opened this issue 1 year ago • 44 comments

We should consider moving from sympy-bot to something that uses GitHub actions for the release notes. The bot is currently having issues https://github.com/sympy/sympy-bot/issues/105, and GitHub actions would be harder to maintain. The main downside is that writing the notes as a file in the repo would be harder, but it might also make people actually write better notes, so that could be a good thing.

asmeurer avatar May 19 '23 20:05 asmeurer

If there can be a quick fix for the existing release note bot then that would be good. With other CI problems it can be possible to temporarily disable a CI check but that does not work for the release notes because we would lose all the data for what should be in the release notes.

In general though I am definitely in favour of moving to a different system. The current system involves a bunch of things that are difficult to automate during the release process. I also don't like the fact that the release notes can be edited without going through the normal review process and that release notes are not as well reviewed as the other content of PRs. I also would prefer that the release notes be a part of the docs and I think that amalgamating the release notes for a release should be a PR that gets reviewed by as many people as possible.

In https://github.com/sympy/sympy-bot/issues/105#issuecomment-1554557961 @skieffer suggested the possibility of towncrier which I think could work. In https://github.com/sympy/sympy-bot/issues/105#issuecomment-1555191811 @asmeurer suggested looking at the system that NumPy uses.

oscarbenjamin avatar May 19 '23 21:05 oscarbenjamin

I also would prefer that the release notes be a part of the docs

I'd like to see them as part of the docs as well as being shipped with our source tarballs.

moorepants avatar May 20 '23 06:05 moorepants

being shipped with our source tarballs.

The doc sources are included in the sdists. I guess what you mean though is that the changelog or release notes should be more prominent in the sdist than just being under doc/src somewhere.

Currently the contents of the sdist are like:

sympy-1.12
├── AUTHORS
├── bin
├── data
├── doc
├── examples
├── isympy.py
├── LICENSE
├── PKG-INFO
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── sympy
└── sympy.egg-info

Maybe there should just be a "changelog" directory. The docs could then be built using that but it would be visible at top-level in the sdist.

For NumPy there seem to be two separate things: the changelog which lists all PRs and the release notes which list only changes of potential relevance for downstream: https://numpy.org/doc/stable/release/1.24.0-notes.html https://github.com/numpy/numpy/blob/main/doc/changelog/1.24.0-changelog.rst

It looks like the changelog is generated from git using this script: https://github.com/numpy/numpy/blob/main/tools/changelog.py

The release notes part is made using towncrier: https://github.com/numpy/numpy/tree/main/doc/release/upcoming_changes

I suggest we add a changelog directory at top-level in the repo and under it the structure would be like:

changelog/
├── 1.10
│   ├── changelog.md
│   └── release-notes.md
├── 1.11
│   ├── changelog.md
│   └── release-notes.md
├── 1.12
│   ├── changelog.md
│   └── release-notes.md
└── 1.13
    └── notes
        ├── 12345.polys.md
        ├── 12567.solvers.md
        └── 12890.core.md

Then release notes (if needed) would be added in a PR with an issue/PR number and a submodule in the name. The towncrier script would be used before the release to merge those into a release-notes.md and a separate script could list all PRs in the changelog.md. Since that change would come through as a PR before release it gives a chance for the release notes to be reviewed again in context. The docs would include these pages somewhere.

Since there would also be a changelog it might not be as necessary for most PRs to include a release note although there should be better discipline about having good PR titles since those are what would show in the changelog. Currently there is a requirement to at least add NO ENTRY if there should not be a release note but I think that it might be fiddly to check this.

If we are agreed about this as an overall structure then I can put the basic parts together to unblock CI. We would need to think about exactly how to incorporate this into the docs but that could be done later. For now what is needed is some way to collect release notes from PRs because currently all new PRs will be blocked on this.

oscarbenjamin avatar May 22 '23 11:05 oscarbenjamin

└── 1.13 └── notes ├── 12345.polys.md ├── 12567.solvers.md └── 12890.core.md

Actually it would be better to have these in something like changelog/master rather than changelog/1.13 so that they do not need to be changed for a long running PR after a release is made.

oscarbenjamin avatar May 22 '23 13:05 oscarbenjamin

The doc sources are included in the sdists. I guess what you mean though is that the changelog or release notes should be more prominent in the sdist than just being under doc/src somewhere.

I was under the impression that a copy of the release notes are not present in our sdists. Maybe it is? Or it has been added since I last checked.

My suggestion is that we simply include release notes in the sdist, whether they are prominent or not, I have no opinion.

moorepants avatar May 22 '23 16:05 moorepants

We should just include them in the docs. That's the most natural place for them to live. In general, we should move all "official" stuff that's currently in the wiki to the docs. This has already been done for the development guides (there might be a couple of things that are still left to move; I need to do a full runthrough). If the release notes are in the docs, then people can just navigate to https://docs.sympy.org/latest/release-notes.html (or whatever) from the docs site to read them. The docs will also get better indexing from search engines than the wiki or any other source. And if the notes are in the docs, it makes it trivial to reference documentation from the notes, which makes them much more readable, and also to reference release notes from the documentation (that will be rarer but it could be useful).

The wiki dates back to the earliest days of SymPy when there was no actual documentation and Ondřej just set up a MediaWiki server (this was later moved to the GitHub wiki). While there are still some projects today which exclusively use wikis for documentation, the best documentation is not in wikis. Wikis like the GitHub wiki are a poor choice for documentation because they lack important features like navigation, search and cross references, "official" pages are intermixed with "scratchpad" pages, and edits to wikis can be done by anyone and mostly go through unreviewed.

We just need to make sure that this doesn't make overall development process doesn't become too complicated. Ideally, the current make html should just work, regardless of what state the release notes files are in. For creating the notes themselves, it should be something very simple, like just creating a new file in some directory with some simple structure. We can cleanup the files into a single file whenever we do a release, but that script should also be called automatically by make html so that building the dev docs works.

Also, a minor nitpick from the NumPy method: I don't like that their release notes files contain the PR number in the file name. That means that you can't write the release notes entry until after you open your PR. But you should be in the habit of always writing entries, so they should be written and committed alongside your changes, before you even open a PR. So I would omit the PR number from the filename (also, the PR number can be inferred automatically from the git history, so this is also just unnecessary work for contributors).

asmeurer avatar Jun 02 '23 22:06 asmeurer

That means that you can't write the release notes entry until after you open your PR.

I agree that that's a drawback to this system. Some projects use the issue number (when there is one) instead of the PR number.

One way to semi-automate it would be to check for presence of the release notes file as the very first CI test. When the file is absent, then all other tests are canceled, and the error message could include a GitHub URL to create the file with the correct name.

Still, this means that every PR fails its first run, which is a little annoying.

skieffer avatar Jun 03 '23 11:06 skieffer

the PR number can be inferred automatically from the git history

That's an interesting idea. If it was guaranteed that every merged PR would have the standardized commit message of the form

    Merge pull request #NNNNN from USER/BRANCH-NAME

    Etc.

then the rule for naming the release notes file in your PR could be to use USER.BRANCH-NAME.feature.rst (or replace feature with bug etc.). A script could then search git log for the merge commits, rename the files to now use the PR number in place of USER.BRANCH-NAME, and then finally run towncrier.

skieffer avatar Jun 03 '23 11:06 skieffer

Basically every PR merge commit has that commit message, unless the person who merges it manually changes it to not be that way (which no one does). Even if that were the case we could figure it out using the GitHub API.

BTW that point isn't just that you can't make the file without making the PR first. It's also annoying and error prone to lookup the PR number to put it in the file name (same with an issue number). This is stuff that can be figured out by the scripts automatically, so we shouldn't make people do it. The whole process should be as easy as possible: just create a file with any (unique) name, containing a simple structure. The actual metadata like the PR number and the authors can be constructed automatically, just like @sympy-bot does.

asmeurer avatar Jun 03 '23 21:06 asmeurer

I would really like the relevant information for the release notes to be in the diff that gets merged so that it is visible to everyone including both PR author and reviewer. We can of course make something that will help someone to write the release note in the expected format like a script that can be run as:

$ bin/make_release_note.py
Enter GitHub username for the release note: asmeurer
Branch name is fix-lambdify
Enter sympy submodule (e.g. solvers, simplify, ...): utilities
Enter type of change (feat, bug, doc, ...): bug
Created doc/release_notes/master/asmeurer.fix-lambdify-1.txt
Contents of file:
-----------------------------------------------------
Author: @asmeurer
Submodule: utilities
Type: bug
Note:

<Please replace this line with an explanation of the change
written in markdown for the release notes in the docs>
------------------------------------------------------
Please edit doc/release_notes/master/asmeurer.fix-lambdify-1.txt
to add the explanation for the release note.

I suggest for now that we do just that so that at least PRs can be merged without losing the information needed to reconstruct release notes later.

Currently the bot also notes who merged the PR but I guess there is no way to get that information until after it is merged.

oscarbenjamin avatar Jun 07 '23 13:06 oscarbenjamin

Maybe just this:

import sys
import subprocess
import argparse
import pathlib


template = """\
Author: {username}
Submodule: {submodule}
Type: {change_type}
Note:

<Please replace this line with an explanation of the change
written in markdown for the release notes in the docs>
"""


def main(*args):
    """
    Example:

    $ python bin/make_release_note.py
    Enter GitHub username for the release note: asmeurer
    Branch name is fix-lambdify
    Enter sympy submodule (e.g. solvers, simplify, ...): utilities
    Enter type of change (feat, bug, doc, ...): bug
    Created doc/release_notes/master/asmeurer.fix-lambdify-1.txt
    Contents of file:
    -----------------------------------------------------
    Author: @asmeurer
    Submodule: utilities
    Type: bug
    Note:

    <Please replace this line with an explanation of the change
    written in markdown for the release notes in the docs>
    ------------------------------------------------------
    Please edit doc/release_notes/master/asmeurer.fix-lambdify-1.txt
    to add the explanation for the release note.
    """
    username = input("Enter GitHub username for the release note: ")
    branch = subprocess.check_output(["git", "branch", "--show-current"]).decode().strip()
    print("Branch name is", branch)
    submodule = input("Enter sympy submodule (e.g. solvers, simplify, ...): ")
    change_type = input("Enter type of change (feat, bug, doc, ...): ")

    filename = "{}.{}.{}.txt".format(username, branch, change_type)
    filepath = pathlib.Path("doc") / "release_notes" / "master" / filename
    contents = template.format(
        username=username,
        submodule=submodule,
        change_type=change_type)
    )

    with open(filename, "w") as f:
        f.write(contents)

    print("Created", filename)
    print("Contents of file:")
    print("-" * 50)
    with open(filename) as f:
        print(f.read().strip())
    print("-" * 50)
    print("Please edit {} to add the explanation for the release note.".format(filename))


if __name__ == '__main__':
    main(*sys.argv[1:])

oscarbenjamin avatar Jun 07 '23 13:06 oscarbenjamin

To ensure that filenames are easy to parse later, I'd suggest adding a check that the branch name not contain any dots:

branch = subprocess.check_output(["git", "branch", "--show-current"]).decode().strip()
if '.' in branch:
    print("Please use a branch name without dot (.) characters.")
    sys.exit(1)
print("Branch name is", branch)

I can't recall seeing anyone use dots in a topic branch name (release branch, yes), but they are allowed.

As for valid chars in github usernames, I found it remarkably difficult to find docs on this. This project claims to have determined (via form validation messages on the "Join GitHub" page!) that it's limited to alphanumerics and hyphens.

skieffer avatar Jun 07 '23 14:06 skieffer

To ensure that filenames are easy to parse later

Then again,

parts = filename.split('.')
username = parts[0]
change_type = parts[-2]
branch = '.'.join(parts[1:-2])

so maybe it doesn't matter.

skieffer avatar Jun 07 '23 14:06 skieffer

But yet again, what about slashes in branch names?

Maybe it would be good to require that they use just alphanumerics and hyphens?

Or do percent encoding, in order not to have to bother the contributor.

skieffer avatar Jun 07 '23 15:06 skieffer

Currently the bot also notes who merged the PR but I guess there is no way to get that information until after it is merged.

No it doesn't? https://github.com/sympy/sympy/wiki/Release-Notes-for-1.12 And I don't think that info really needs to be there either.

It does include multiple authors when someone else pushes something to the PR, and it also includes the person who opened the PR if they happened to not actually push any commits. It does this using the GitHub API but for this script it would be simpler to use the git history (especially considering we already have a well maintained .mailmap).

To ensure that filenames are easy to parse later, I'd suggest adding a check that the branch name not contain any dots:

I don't think the filename even needs to be based on the branch name. It's a good default for a script that generates it for you (and that script can just remove things like dots and slashes). But in general, whatever builds the notes should just gather all the files together, and the filenames would not be part of the final notes. It doesn't matter for reviewing either because you're just going to review whatever file is new in the PR.

asmeurer avatar Jun 07 '23 20:06 asmeurer

Currently the bot also notes who merged the PR No it doesn't?

No it doesn't... I guess I misremembered that.

But in general, whatever builds the notes should just gather all the files together, and the filenames would not be part of the final notes.

Exactly, the filenames just need to be unique. We don't need to use the branch name but I imagined that it would usually give a vaguely meaningful unique identifier.

oscarbenjamin avatar Jun 07 '23 21:06 oscarbenjamin

t would be simpler to use the git history (especially considering we already have a well maintained .mailmap).

The git history uses different metadata from what is currently used by the bot. The bot uses GitHub usernames but the .mailmap file uses the names and email addresses that are in the commit metadata. The .mailmap file maps all commits uniquely to a single name/email for the purposes of the git metadata which then means that every commit corresponds to a single line in the AUTHORS file. There is no mapping from the git metadata to GitHub usernames or vice versa though.

I don't know if it is intentional that the release notes use GitHub usernames (maybe it was just easier for the bot to get those?). It would be easier just to use git metadata for the release notes since it is already in the commits and the .mailmap already resolves ambiguities. If we imagine a future in which there would be a need to move away from GitHub then it would be better if long term records like the release notes were not tied to GitHub metadata.

I also wonder if it could be possible to combine the release notes file with the need to update .mailmap to make it easier for new contributors somehow because the .mailmap is something that many contributors struggle with. I think this last point is something to hold off on for now though because we just need to get something working for the release notes ASAP.

oscarbenjamin avatar Jun 07 '23 21:06 oscarbenjamin

I did that in the bot so that it could cross-link to people's GitHub profiles. That isn't that important if it's hard to do.

asmeurer avatar Jun 07 '23 21:06 asmeurer

I did that in the bot so that it could cross-link to people's GitHub profiles. That isn't that important if it's hard to do.

It's not hard to do if we ask for the information to be provided as I suggested above:

 username = input("Enter GitHub username for the release note: ")

If we want to infer it automatically then it is harder.

Even using the git metadata is not really "automatic". It would be automatic if all contributors set up their git/GitHub metadata and we just used it directly without question. We don't just use it though and instead require all git metadata to be listed in the mailmap. Seeing the ways that new contributors get stuck with the mailmap check in CI shows that many contributors do not provide the metadata in the commits in such a way that we could just use it automatically as suitable for the AUTHORS file.

oscarbenjamin avatar Jun 07 '23 22:06 oscarbenjamin

We should do whatever is easiest. We don't even have to include the author information at all if we don't want to. The NumPy notes do not, for instance https://numpy.org/doc/stable/release/1.24.0-notes.html. That info is available in the linked PR.

asmeurer avatar Jun 07 '23 22:06 asmeurer

Okay, there are a lot of different methods being considered here. For a separate project that I maintain, I am considering using percent-encoded branch names in the filenames, and using git log to transform these into PR numbers, by seeking commit messages of the form, Merge pull request #NNNNN from USER/BRANCH-NAME. If you want to use that system, I've written a script for it, and share it here in case it's useful.

#!/usr/bin/env python

"""
Rename files in the release notes directory, changing names like

    USER.BRANCH.bug.txt

to instead be of the form

    12345.bug.txt

where the number denotes the PR in which this branch was merged.
"""

import argparse
import pathlib
import subprocess
import sys
import urllib.parse


def rename_files(dry_run=False):
    notes_dir = pathlib.Path("doc") / "release_notes" / "master"
    if not notes_dir.exists():
        print(f'Release notes directory `{notes_dir}` does not exist.')
        sys.exit(1)

    filepaths = list(notes_dir.iterdir())
    print(f'Found {len(filepaths)} release notes files.')

    no_match = []
    match_count = 0
    for filepath in filepaths:
        filename = filepath.name
        username, pct_enc_branch, change_type, ext = filename.split('.')
        branch = urllib.parse.unquote(pct_enc_branch)
        regex = rf'Merge pull request #\d+ from {username}/{branch}$'
        cmd = f'git log -P --format=%s --max-count=1 --grep="{regex}"'

        commit_title = subprocess.check_output(cmd, shell=True).decode().strip()
        if not commit_title:
            no_match.append(filename)
            continue
        else:
            match_count += 1

        PR = commit_title.split()[3][1:]
        newname = f'{PR}.{change_type}.{ext}'
        cmd = f'mv {notes_dir/filename} {notes_dir/newname}'
        print(cmd)
        if not dry_run:
            subprocess.run(cmd.split())

    print(f'{"Would rename" if dry_run else "Renamed"} {match_count} files.')
    if no_match:
        print(f'Could not find merge commit for {len(no_match)} files:')
        for filename in no_match:
            print('  ', filename)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Rename release notes files.')
    parser.add_argument('-d', '--dry', help='do a dry run', action='store_true')
    args = parser.parse_args()
    rename_files(dry_run=args.dry)

skieffer avatar Jun 08 '23 14:06 skieffer

That is useful and I think that could work.

We would need to ensure before the PR is merged that the filename is correct for the given username and branch name.

In GitHub actions the GitHub username is available as github.event.pull_request.user.login. Possibly the branch name is github.ref_name. We could pass those to a script that would validate the given file.

Do we want to require the submodule name as part of the file like oscarbenjamin.lambdify_fix.utilities.txt?

Then if there are release notes for multiple submodules they just need to have separate files?

Do we need the equivalent of requiring a contributor to explicitly add NO ENTRY? I guess it could be oscarbenjamin.typo_fix.no_entry.txt?

The thing that needs to be agreed here before we can fix this is what precisely we ask contributors to do and when exactly are CI checks expected to fail either because the release notes are missing or incorrect.

Maybe the rule from the CI check could be

  • EITHER there is exactly one release note for the given username and branch name and it is for the no_entry submodule and just says NO ENTRY.
  • OR there are one or more release note files for different submodules.

Then also what should the content of the file be? Should it just be an unordered markdown list like:

- A bug in solve was fixed.
- Also solve was made faster for some things.

Do we need to validate that the markdown is syntactically correct? (as long as the text content is reasonable it is probably not difficult to fix any markdown syntax later).

Alternatively we ask contributors to just create a file called user.branch.txt and inside they can add headings like now with:

* solvers
    * Fixed a bug

Or otherwise the file just contains the text NO ENTRY. This is potentially easier for contributors since it is not multiple files but validating that the contents of the release note file match the expected pattern is a little harder.

oscarbenjamin avatar Jun 08 '23 16:06 oscarbenjamin

This is potentially easier for contributors since it is not multiple files

I guess I'd lean toward whatever makes things easiest for contributors, and I would agree with you that that would probably mean forming just a single release notes file. I think the interactive script that generates the file is great.

As for processing the contents of the file, if that's to be written in the form

* solvers
    * Fixed a bug

can existing code from sympy-bot be used here?

skieffer avatar Jun 08 '23 20:06 skieffer

In GitHub actions the GitHub username is available as github.event.pull_request.user.login. Possibly the branch name is github.ref_name. We could pass those to a script that would validate the given file.

Do we want to require the submodule name as part of the file like oscarbenjamin.lambdify_fix.utilities.txt?

Again, I don't think any of those should be required to be part of the filename. The filename should just be something unique, and not be part of the final notes. The username can be inferred. You're already suggesting pulling it from the API, so why not just use it? Or we can just use the real name from git. The submodule can be part of the markdown. A short name like lambdify_fix is useful for the filename but doesn't really add anything to the final release notes.

can existing code from sympy-bot be used here?

Here's all the relevant code https://github.com/sympy/sympy-bot/blob/master/sympy_bot/changelog.py. The list of submodules is at https://github.com/sympy/sympy-bot/blob/master/sympy_bot/submodules.txt, but we could also just pull them directly from the source.

asmeurer avatar Jun 09 '23 21:06 asmeurer

Again, I don't think any of those should be required to be part of the filename. The filename should just be something unique, and not be part of the final notes. The username can be inferred. You're already suggesting pulling it from the API, so why not just use it?

There are two separate stages here:

  1. When a contributor submits a PR the CI checks will run and can validate whether the release note data is in the intended format.
  2. Later on someone can run a script (like the one shown by @skieffer above) that will amalgamate the previously merged release note files into a release note page for the docs.

I suggested that at stage 1 we can infer the github username and the branch name in a CI script in order to validate that that information is correctly encoded in the files that are added/changed by the PR. However at stage 2 we no longer have the username and branch name unless they were previously encoded in the actual diff of the PR somehow. The information can be in the filename or it can be in the content of the file but it needs to be somehow in the file tree because none of the other git metadata stores the GitHub username.

Or we can just use the real name from git.

I'm not sure that it is easy to get that from the release note file and git history. For example we would still want to link to at least the PR if not an associated issue and I don't immediately see how we would get that from the git history (without using @skieffer's approach that requires knowing both GitHub username and branch name which requires that those are both present somehow in the file tree).

oscarbenjamin avatar Jun 09 '23 21:06 oscarbenjamin

Let's get this in perspective: currently SymPy development is largely blocked on this issue. The blocker here is how exactly we infer some data rather than asking that the contributor provide the data. There is a very simple solution to this though: just ask the contributor to provide the data.

There are two sides to this as with all data because there is one side (the contributor) who provides the data and another (the release scripts) that consumes the data. When I am in the position of being the consumer of the data I would much prefer that the data was explicitly provided rather than needing to try to infer the data from imperfect or indirect sources. When I am in the position of being the producer of the data I would much prefer to provide the data explicitly so that I know what data is being provided. On either side I prefer the data to be explicit rather than inferred.

We can provide tooling to help generate the data and to validate that it is in the intended form. To me that would seem more helpful than a system that tries to infer data in a way that I don't understand (as a user of that system).

I can send a PR that requires users to explicitly provide the data along with scripts that help to generate and validate that data. If anyone prefers that the data be inferred then can they please outline a full workable system for doing that along with how CI checks and everything else would work?

oscarbenjamin avatar Jun 09 '23 22:06 oscarbenjamin

Let's get this in perspective: currently SymPy development is largely blocked on this issue. The blocker here is how exactly we infer some data rather than asking that the contributor provide the data. There is a very simple solution to this though: just ask the contributor to provide the data.

The blocker here is someone writing the script to do this. I don't have the time right now to do it.

asmeurer avatar Jun 19 '23 18:06 asmeurer

Since the bot is still broken, and it seems no one has had the time to fix it or move to a new system, I've moved everything back to Heroku from Render. This is currently costing me $5 a month. We could do this indefinitely (although if we do, we should move the billing to our SymPy NumFOCUS account), but the things discussed here seem to have some advantages so it's probably still worth pursuing them.

asmeurer avatar Jun 19 '23 19:06 asmeurer