bot for refreezing

Open minrk opened this issue 6 years ago • 19 comments

Proposed change

Every once in a while, we want to rebuild the base environments. This is done by running the freeze.py script in the conda buildpack.

Alternative options

keep manually refreezing

Who would use this feature?

Maintainers of repo2docker and folks contributing updates to the base environments who don't want to run refreeze themselves.

One way is to refreeze periodically to keep things up to date, as is done with henchbot. Another is to allow pull requests to update the environment spec or other files and then post a comment like @r2d-bot please refreeze.

How much effort will adding it take?

Not sure! Hopefully not too much. Requires writing a bot that talks to the GitHub API. It would need to:

  1. respond to a trigger somehow (e.g. a @refreeze-bot please refreeze comment to update a PR or a new please refreeze Issue to trigger a new PR)
  2. make a commit in an existing pull request and/or open a new pull request
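
The trigger in step 1 could be recognised with something like the sketch below. It is only an illustration: the bot handle is hypothetical (the thread floats several names), and `is_refreeze_request` is a made-up helper, not part of repo2docker or of any existing bot.

```python
import re

# Hypothetical bot handle; this issue mentions several candidates
# (@r2d-bot, @refreeze-bot), none of which is settled.
BOT_HANDLE = "refreeze-bot"

# Match "@<handle> please refreeze" anywhere in a comment, case-insensitively.
# The lookbehind rejects text such as "someone@refreeze-bot".
TRIGGER = re.compile(
    rf"(?<![\w-])@{re.escape(BOT_HANDLE)}\s+please\s+refreeze\b",
    re.IGNORECASE,
)


def is_refreeze_request(comment_body: str) -> bool:
    """Return True if the comment text asks the bot to refreeze."""
    return TRIGGER.search(comment_body) is not None
```

A bot would call this on the body of each new comment and ignore everything that does not match.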

Who can do this work?

  • Interest in / knowledge of the GitHub API
  • Interest in / knowledge of writing bots and deploying them somewhere (on heroku, etc.)

References:

  • henchbot which does pull requests for updates to repo2docker and binderhub for mybinder.org
  • conda-forge-admin bot which does things like add commits to pull requests to run common commands like @conda-forge-admin please rerender

minrk avatar Jul 25 '19 07:07 minrk

I'd be happy to take this up, I've been wanting to get more into repo2docker but hadn't found a good entry point yet.

I don't see a "refreeze.py", just "freeze.py" in the conda and legacy buildpacks. Am I missing something?

Also, does this depend on builds existing outside of the state of the GitHub repo? One of the reasons henchbot just runs every hour is that the images may not have made it to Docker Hub yet, so it checks there specifically.

henchc avatar Jul 25 '19 09:07 henchc

I think it is a typo. freeze.py is what we use to (re)freeze the environments.

I think it uses a miniconda docker image so we shouldn't have the problem where we need to wait for an image to build.

betatim avatar Jul 25 '19 11:07 betatim

Am I missing something?

@betatim is right. I updated the description: We only want to run freeze.py in the conda buildpack. This will regenerate the eight environment.pyX.Y[.frozen].yml files.

does this depend on builds existing outside of the state of the GitHub repo?

Depends, but mostly no, in that there's nothing outside to check. There's no outside event to "check if things are up-to-date", since the result depends on the current versions of all the packages. If we're doing periodic checks, though, we should maybe perform a diff of the resulting yaml files (the parsed data, not the bytes on disk), because the files will change with the Generated on <DATE> comment even if the content doesn't change.
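
A rough sketch of that "did anything really change" check follows. To stay within the standard library it compares the non-comment lines instead of the parsed YAML, which is a simpler stand-in for what is described above; loading both files with a YAML parser and comparing the resulting data would be closer to the idea. The function names and the sample file contents in the usage note are invented for illustration.

```python
from typing import List


def significant_lines(text: str) -> List[str]:
    """Return the lines that carry content.

    Blank lines and whole-line '#' comments (such as a "Generated on"
    header) are dropped, and trailing whitespace is ignored.
    """
    kept = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(raw.rstrip())
    return kept


def content_changed(old_text: str, new_text: str) -> bool:
    """Return True if two frozen files differ in more than comments."""
    return significant_lines(old_text) != significant_lines(new_text)
```

With this, two files that differ only in their date comment compare as unchanged, so a periodic job could skip opening a PR in that case.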

Unlike mybinder.org, we don't want this to be constantly updating (a PR for every tiny change would be overkill). Rather, it should prompt us to keep things up-to-date every one, two, or even four weeks.

More useful, I think, will be the bot adding the freeze commit to an existing PR, e.g. one that bumps a dependency in environment.yml, letting the bot do the freezing step. See the two commits in https://github.com/jupyter/repo2docker/pull/354. For the workflow I have in mind, see this PR on conda-forge where there's one commit by hand changing the input, then a second commit from a bot to regenerate derivative files.

minrk avatar Jul 30 '19 05:07 minrk

Thanks @minrk ! I will start some work on it this week.

henchc avatar Aug 12 '19 00:08 henchc

So I've been working on this when I get the chance. The bot can run the freeze and create the PR; I'm still working on the PR text.

My next question: does anyone know how we can run this in an environment that has the Docker it requires? Can Heroku do this? @betatim?

henchc avatar Sep 08 '19 20:09 henchc

I don't think you can access a Docker daemon on Heroku :-/

betatim avatar Sep 10 '19 05:09 betatim

Is this still open? If so, I would love to give it a go with some GitHub Actions stuff.

trallard avatar Jun 08 '20 13:06 trallard

I think it is!

betatim avatar Jun 08 '20 15:06 betatim

Cool, so I got a decent start on this - been trying on a repo of my own as I am testing with the PRs and comments.

As I am working on it, this is the workflow I have come up with based on @minrk's description/references:

  1. Someone submits a PR to repo2docker (changing environment.yml for the conda buildpack)
  2. Someone uses a trigger phrase in the PR comments to refreeze based on the changes above. This only works if the path repo2docker/buildpacks/conda has committed changes (😂 currently using @bot winter-is-coming)
  3. Bot commits the newly generated files (or rather updated from the freeze) on top of the PR
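
The path condition in step 2 might be checked along these lines. This is a sketch under assumptions: `touches_conda_buildpack` is a hypothetical helper, and it expects the caller to have already collected the PR's changed file paths (for example from the GitHub API) as repository-relative strings.

```python
from pathlib import PurePosixPath

# Directory named in the workflow above.
CONDA_BUILDPACK_DIR = PurePosixPath("repo2docker/buildpacks/conda")


def touches_conda_buildpack(changed_paths) -> bool:
    """Return True if any changed file lives under the conda buildpack.

    `changed_paths` is an iterable of repository-relative paths using
    forward slashes, e.g. "repo2docker/buildpacks/conda/environment.yml".
    """
    return any(
        CONDA_BUILDPACK_DIR in PurePosixPath(path).parents
        for path in changed_paths
    )
```

Comparing path components instead of string prefixes avoids false matches on sibling directories whose names merely start with "conda".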

I do have some questions:

  • For step 2 do you want to allow anyone to be able to run the command?
  • Also, do you want the bot to add a comment with the outcome of the freeze or something similar?
  • Do you have any preference regarding the format or message for the bot?

trallard avatar Jun 10 '20 10:06 trallard

For step 2 do you want to allow anyone to be able to run the command?

Can it be anyone with write access to the PR branch (i.e. author + repo maintainers)?
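
One possible way to approximate "author + repo maintainers" from an issue_comment webhook payload is sketched below. It relies on the `author_association` field that GitHub includes on comments; treating OWNER, MEMBER and COLLABORATOR as "has write access" is an assumption made for this sketch, not a guarantee, and `may_request_refreeze` is a hypothetical name.

```python
# Associations treated as "maintainer" here. This is an approximation:
# the association does not strictly prove write access to the branch.
MAINTAINER_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def may_request_refreeze(payload: dict) -> bool:
    """Decide whether the commenter may trigger a refreeze.

    `payload` is the parsed JSON body of an issue_comment event.
    The PR author is always allowed; otherwise the commenter must have
    one of the associations listed above.
    """
    comment = payload["comment"]
    commenter = comment["user"]["login"]
    pr_author = payload["issue"]["user"]["login"]
    if commenter == pr_author:
        return True
    return comment.get("author_association") in MAINTAINER_ASSOCIATIONS
```

A stricter bot could instead query the collaborator permission level through the GitHub API before acting.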

Also, do you want the bot to add a comment with the outcome of the freeze or something similar?

Maybe only on fail with error messages? At the start, at least, a confirmation message that it has received the request and begun the process might be good, but I suspect we won't want this message once it's running reliably.

Do you have any preference regarding the format or message for the bot?

How about @bot please refreeze?

minrk avatar Jun 16 '20 10:06 minrk

In https://github.com/jupyterhub/repo2docker/pull/958 (WIP) I've got an update.py script that automatically updates one or more dependencies in environment.yml to the latest on conda-forge. I've been playing around with GitHub workflows to try and automate the freeze.py (mostly for my personal curiosity), but if it's useful feel free to take the script.

manics avatar Sep 18 '20 16:09 manics

Right, I was working on this, and then it got forgotten as I went on annual leave and have been swamped with JupyterCon stuff.

Anyway, I have a working MVP (had to create a separate repo to enable actions though). Example here. The one piece missing is pushing the refrozen yaml files to the PR head instead of master, which I can do on Friday as it's my meeting-free day.

Would it then be suitable to adopt @manics' update.py for the dependencies update?

trallard avatar Sep 22 '20 16:09 trallard

Since your bot is almost ready let's keep it as it is, and after it's merged I'll look at adding my script to it.

manics avatar Sep 24 '20 17:09 manics

righto - will finish this and then we can see how to integrate both 😄

trallard avatar Sep 25 '20 17:09 trallard

@trallard How are you getting on? I just discovered this GitHub Action: https://github.com/marketplace/actions/create-pull-request. If it works as described, you just need to generate the updated files and it should automatically take care of opening the PR.

manics avatar Nov 02 '20 14:11 manics

Sorry folks, I was swamped with JupyterCon tasks and then with work.

I was having trouble finding the correct reference for the PR and allowing GitHub Actions to add a commit to the PR itself, and could not for the life of me find anything in the docs or the GitHub Actions developer forum. I finally found a way around this and should have some time this week to finish off and send the finalised PR. I am so sorry for taking this long, but my workload has been a nightmare this year.

trallard avatar Nov 02 '20 16:11 trallard

No need to apologise! I only commented because I just discovered that action, and thought it might be useful if you hadn't reached that stage 😀.

manics avatar Nov 02 '20 19:11 manics

For anyone who is looking at picking this up, https://peterevans.dev/posts/github-actions-how-to-automate-code-formatting-in-pull-requests/ seems to show pushing git changes after a run. I think the default token has push access, but it's a little unclear whether it would have access to push to a PR from within a run on that PR. Seems like the answer is "probably". But if it's a cron action, it should have push access.

I was thinking maybe a comment-trigger might be useful, too (e.g. @github-actions please refreeze).

minrk avatar Feb 16 '21 12:02 minrk

I don't think the token has push access to a PR unless the PR is from a branch in the same repository rather than a fork, which in general isn't the case. Your suggestion of doing it with cron should be good enough, though you may run into the problem of GitHub workflows not being triggered by a push from the default token. That means either manually closing/reopening a refreeze PR to force the CI run, or using a secret token instead of the default.
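
A bot could apply that rule of thumb before attempting a push, roughly as sketched below. The function encodes only the assumption stated in this comment (same-repository branches are pushable with the default token, fork branches are not); it is not verified against GitHub's actual permission model, and `default_token_can_push` is a hypothetical helper operating on the `pull_request` object from a webhook or API response.

```python
def default_token_can_push(pull_request: dict) -> bool:
    """Guess whether the default workflow token could push to this PR.

    Assumes, per the discussion above, that pushing only works when the
    PR's head branch lives in the same repository as its base branch.
    """
    head_repo = pull_request["head"].get("repo")
    base_repo = pull_request["base"]["repo"]
    if head_repo is None:
        # The head repository can be missing, e.g. if a fork was deleted.
        return False
    return head_repo["full_name"] == base_repo["full_name"]
```

When this returns False, the bot could fall back to posting a comment or opening a separate PR from a branch it controls.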

manics avatar Feb 16 '21 20:02 manics