scala-steward icon indicating copy to clipboard operation
scala-steward copied to clipboard

message to have scala-steward rebase on HEAD

Open johnynek opened this issue 4 years ago • 26 comments

it would be great if you could have a rebase feature. Our CI is stalled on travis, but the main branch now has github as the CI... dependabot will respond to a rebase command. That would be a great feature here too!

johnynek avatar Nov 22 '20 19:11 johnynek

Scala Steward does not (yet) react to messages but there is the config option updatePullRequests = "always" which keeps the PR always in sync with the base branch via merges. Here is a config using this option.

fthomas avatar Nov 22 '20 19:11 fthomas

The Apache Daffodil project has a policy that no merge commits are allowed in our project's history and pull requests should have only one commit before they are merged. We would like Scala Steward to keep its automatic PRs in sync with the base branch via git push --force-with-lease, not via merge commits, since this breaks our policy of only one commit per pull request and no merge commits in history. Could a config option be added to specify that automatic PRs should be updated this way (git fetch origin and git rebase origin/master or simply recreate the same change on top of the latest HEAD), not via merges?

tuxji avatar Mar 16 '21 20:03 tuxji

@tuxji GitHub has a squash merge option. Does that not work?

johnynek avatar Mar 16 '21 20:03 johnynek

Apache Daffodil requires PRs to use GitHub's "rebase and merge" policy, not its "squash and merge" or "create a merge commit" policies. I suppose Daffodil could use "squash and merge" but the whole reason for our "one commit per PR" and "rebase and merge" policy is to ensure our project history has logical and readable git commit messages. GitHub's "squash and merge" policy would concatenate several commits' messages together, resulting in an ugly looking commit message saying

Update jackson-core to 2.12.2
  
Merge branch 'master' into update/jackson-core-2.12.2
  
Update jackson-core to 2.12.2

instead of a single line saying Update jackson-core to 2.12.2. We'd prefer that Scala Steward recreate and repush its automatic PR with only a single commit and single line message that can be merged without any conflicts.

tuxji avatar Mar 16 '21 20:03 tuxji

Sorry for bringing up an old issue.

But this is something I'd also like to see. I can implement it w/o issue if there is an agreement before hand that it's something the maintainer want.

I see two ways of implementing such a feature:

  • as @johnynek suggested, react to a comment that says rebase - dependabot does this
  • update the description of the PR to include a checkbox to ask for a rebase (user has to check the box, and scala-steward will see it on the next pass) - renovate does this

I prefer rebase as it clears the history. I don't like when my merge request grow to be 3-4 merge commits and only one actual relevant commit. Only to change one line.


As a workaround I have a small script to extract the commit scala-steward makes as a patch, reset --hard to the upstream/$default_branch, and then apply the path with -C 0 to disable surrounding line checks. This works 90% of the time but can yield bad PR.

Fixing upstream by reapplying editAlg process seems like a much better solution.

daddykotex avatar Jan 11 '22 19:01 daddykotex

react to a comment that says rebase - dependabot does this

This sounds nice to have, but I assume it may require Scala Steward to be a long-running server app like said in https://github.com/scala-steward-org/scala-steward/issues/2390#issuecomment-992796681 #1877 Maybe webhooks or some mecahnims to trigger Scala Steward on the exact pull request ? 🤔

exoego avatar Feb 09 '22 03:02 exoego

For commands we don't need a server app. We could check for new comments on each invocation.

mzuehlke avatar Feb 09 '22 09:02 mzuehlke

Can we have a green light that this is something worth implementing from your POV @fthomas ?

daddykotex avatar Feb 17 '22 19:02 daddykotex

Can we have a green light that this is something worth implementing from your POV @fthomas ?

In principle I'm fine with changing Scala Steward so that it can react to comments in its pull requests. What is unclear to me is how we can implement it efficiently so that it doesn't do a bunch of API calls per repository if no new comments have been added to the update PRs.

Currently Scala Steward maintains a RepoCache per repository it keeps up-to-date. This cache contains a list of dependencies and the SHA1 of the latest commit in the repository when it extracted that dependencies. When Scala Steward runs and works an a repository it first compares the cached SHA1 with the latest SHA1 of the repository. Only when they differ, it clones it and extract the dependencies again. We do this because extracting dependencies is a costly operation that requires cloning the repository and calling the build tool(s).

If we start checking comments for commands Scala Steward should react to, I guess we would look at every open PR for new comments after the RepoCache has been checked and possibly refreshed. My concern here is that if there are n open update PRs in a repository we need to do at least n API calls to check if there are new comments because there is no single indicator that tells us if there are any new comments since the last time we checked. So if nothing has been changed in a repository and no new comments have been added, Scala Steward would still need to do 1 + n API calls instead of only 1 as it is now for checking the latest SHA1.

I fear that the public Scala Steward instance would hit the GitHub API rate limit in no time if we start doing this.

fthomas avatar Feb 17 '23 20:02 fthomas

I'm a little confused about which part is expensive. If it's the part about responding to comments, could this be implemented using a configuration option instead? I'd love for a PR like Dwolla/async-utils#115 (which I can't merge yet, but am keeping open until we have the time to make it mergable) to only have 1 or 2 commits in the PR, instead of 187 (and counting 😂).

bpholt avatar Jun 08 '23 21:06 bpholt

I'm a little confused about which part is expensive.

Checking every open update PR for new comments.

It would be ideal if we could determine with just one API call (regardless of the number of open PRs) if the total number of comments in a project has changed. If this indicates that a new comment was added, we can then proceed with the expensive part of checking every open update PR for new comments Scala Steward should react to.

fthomas avatar Jun 09 '23 05:06 fthomas

That makes sense. In the meantime, are you open to implementing a rebase-and-force-push workflow behind a configuration option? (i.e. operators or repos could opt-in to replacing the current revert-and-reapply behavior with rebase-and-force-push instead)

I'm happy to open a separate issue to discuss that if it makes more sense, since it's admittedly not what the OP asked for.

bpholt avatar Jun 09 '23 05:06 bpholt

I'm happy to open a separate issue to discuss that if it makes more sense, since it's admittedly not what the OP asked for.

Sounds good to me.

I suspect that it would be a bit more complicated than rebase-and-force-push. More like recreate-update-branch-and-reapply-and-force-push, because we should ensure that Scala Steward's changes are always done of the tip of the base branch. That is the reason we currently do revert-and-merge-and-reapply.

fthomas avatar Jun 09 '23 07:06 fthomas

I'd be happy to give this a shot. IMO, the configuration option is the best option for two reasons:

  1. it's allows users to choose where or not they'd allow scala-steward to rewrite history on it's PR
  2. it is the simplest implementation. as you've outline already, interacting with the bot over comments is much more complex, costly but on top of that, the implementation will differ for every VCS we support (bitbucket, gitea, etc)

My understanding is this:

  1. when going over an update, if a PR is already open, scala-steward check if the base branch has been updated
  2. if it has, it fetches it, then replay the update on the new tip of the base branch (implemented entirely in git commands)

My only questions is: what do we do if the rebase fails? It could happen if the dependency is removed, or a line right next to it has changed and we can't re-apply safely, and possibly other reasons too.

Do we:

  1. close the PR (it's possible that this update never show again because usually scala-steward treats closed PR as a signal to forget that update)
  2. re-do the whole update dance to generate a new commit, if that fails we close, otherwise we push a branch new commit (diff possibly different)

daddykotex avatar Jun 09 '23 12:06 daddykotex

I will chime in. I'm not the original poster, but I would be fine with a configuration option updatePullRequests = "always-rebase" since Scala Steward does not react to messages at this time.

If the rebase fails, @daddykotex suggested extracting the differences as a patch, git reset --hard to the upstream/$default_branch, and then applying the patch with -C 0 to disable surrounding line checks. He said this works 90% of the time but can yield a bad PR. He also suggested that Scala Stewart re-do the whole update dance (reapply the editAlg process) to create a new branch with a new commit.

I would caution that in the Apache Daffodil project, our contributors have occasionally needed to edit the PR branch to make additional changes before merging the PR (maybe a call is now deprecated and needs to be changed, etc.). The first contributor has to wait for a second contributor to approve the edited PR branch before it gets merged, so suppose enough time (2 days or more) has passed for Scala Stewart to notice that a different PR has been merged, rebase the original PR's branch, get a rebase error, and have to create a new commit from scratch. We wouldn't want the changes the first contributor made to the PR branch to be lost without any chance to reapply them a second time, so we would want Scala Stewart to create a new branch, not force update the old branch. You can append something to the end of the old branch name to get a new branch name or something like that. The only time creating a new branch from scratch with the editAlg process should happen is if the automatic rebase fails, so Scala Stewart shouldn't create many more branches over time than it used to do before.

tuxji avatar Jun 09 '23 14:06 tuxji

My understanding is this:

  1. when going over an update, if a PR is already open, scala-steward check if the base branch has been updated

Correct: https://github.com/scala-steward-org/scala-steward/blob/v0.24.0/modules/core/src/main/scala/org/scalasteward/core/nurture/NurtureAlg.scala#L277

  1. if it has, it fetches it, then replay the update on the new tip of the base branch (implemented entirely in git commands)

Correct: https://github.com/scala-steward-org/scala-steward/blob/v0.24.0/modules/core/src/main/scala/org/scalasteward/core/nurture/NurtureAlg.scala#L296

My only questions is: what do we do if the rebase fails?

IMO this should not happen.

It could happen if the dependency is removed,

Scala Steward won't try to update that dependency if it has been removed from the build.

or a line right next to it has changed and we can't re-apply safely, and possibly other reasons too.

This is why we currently do the revert-and-apply-again dance. IMO the rebase option should also do something similar to a revert-and-apply to ensure the update is done of the tip of the base branch (but it does not need to be recorded in the Git history). I could imagine that it works like this:

  1. delete the old update branch
  2. create a new update branch with the same name
  3. apply the update
  4. force push the new branch (which overwrites the old update branch in the remote repository)

It would be an alternative to the current mergeAndApplyAgain with the same signature.

fthomas avatar Jun 26 '23 06:06 fthomas

If we are going to be updating the PR, we should not delete the old branch (if you delete the branch the PR will be closed), so we should just do a hard reset to the default branch and then do a force push.

alejandrohdezma avatar Jun 26 '23 06:06 alejandrohdezma

Apart from that this sounds like an awesome idea, specially when using the grouping feature, the current state of art creates PRs with hundreds of commits, if they're not merged immediately.

alejandrohdezma avatar Jun 26 '23 07:06 alejandrohdezma

Apart from that this sounds like an awesome idea

I agree. Maybe this should then be the default method of updating PRs. I guess most people do not mind if Scala Steward force-pushes to its update PRs.

fthomas avatar Jun 26 '23 07:06 fthomas

If Scala Stewart force-pushes to its update PRs, please consider using git push --force-with-lease instead of git push --force in order to avoid losing any manual updates made by a human contributor. See my earlier comment:

I would caution that in the Apache Daffodil project, our contributors have occasionally needed to update the PR branch manually to make additional changes before merging the PR (maybe a call is now deprecated and needs to be replaced, etc.). The first contributor has to wait a few hours or days for a second contributor to approve the edited PR branch before merging it, so suppose enough time has passed for Scala Stewart to notice that another PR has updated the main branch, rebase the original PR's branch, get a rebase error, and have to create a new commit. We wouldn't want the manual changes the first contributor made to the PR branch to be lost without any chance to reapply them a second time, so we would want Scala Stewart to carry those changes forward or abort the force-push (git push --force-with-lease) and delegate updating the PR to the contributor in the case when the contributor has updated the original PR branch manually.

tuxji avatar Jun 26 '23 21:06 tuxji

If Scala Stewart force-pushes to its update PRs, please consider using git push --force-with-lease instead of git push --force in order to avoid losing any manual updates made by a human contributor.

Scala Steward won't update a PR that has commits by other authors: https://github.com/scala-steward-org/scala-steward/blob/v0.24.0/modules/core/src/main/scala/org/scalasteward/core/nurture/NurtureAlg.scala#L281-L283

fthomas avatar Jun 27 '23 05:06 fthomas

As Frank said, from the perspective of Scala Steward, it would not make sense to rebase, but to just redo the update. The simplest way to implement is probably by checking out the branch, git reset --hard to the base branch, do the usual update steps, and then force-push. So the config option should not have rebase in its name.

Currently we have

updatePullRequests = "always" | "on-conflicts" | "never"

I don't think this should be a new value for that key. The current setting answers "when" not "how." Those are two orthogonal things.

I think it should be a new key. Perhaps in theory "when" and "strategy" (however named) would be subkeys of something that organizes them together, since they seem related. But I'm not sure how that fits in with existing naming and grouping of settings. And of course that would require dealing with backwards compatibility etc.

As for the OP's request of a general rebase command, that's something more suited to, and indeed supported by, Mergify.

nafg avatar Jul 30 '23 05:07 nafg

We could also deprecate the current updatePullRequests responses (or just leave them as fallbacks) and replace them here to support this:

updatePullRequests = "update-always" | "update-on-conflicts" | "rebase-always" | "rebase-on-conflicts" | "ignore"

alejandrohdezma avatar Jul 30 '23 05:07 alejandrohdezma

We could also deprecate the current updatePullRequests responses (or just leave them as fallbacks) and replace them here to support this:

updatePullRequests = "update-always" | "update-on-conflicts" | "rebase-always" | "rebase-on-conflicts" | "ignore"

What happens if more strategies are added later?

And again, there's no rebasing.

Scala Steward runs on a repo in one pass. It collects all the possible dependency updates and if there is no PR it creates one, but this is about what happens if there is already a PR for an update, and there are separate questions that can be asked.

  1. Under what conditions should a PR be updated? This is the existing setting. It balances the need to be up to date with the base branch with the noise level that can create. So the conditions are either "new commits on the base branch," "conflicting commits on the base branch," or "never."
  2. What action should Scala Steward take to bring the PR up to date with the base branch? The current strategy is to merge the base branch in. I don't know why but it first does a revert. The proposed strategy IIUC is to reset the branch with only fresh commits. This is not a rebase because the old Scala Steward commits are not cherry picked, but discarded and generated again.

I can think of a lot more than 2 strategies.

  1. The current strategy, which I'm very confused by. It first pushes a revert, then a merge, then regenerates the update.
  2. The proposed strategy, which is what Renovate does. It makes the branch be exactly as if the PR had first been created now. It starts from the base branch's tip and only contains the Scala Steward update commits.
  3. A quieter merge strategy. No revert. The merge is done with strategy flags to ensure conflicts are always resolved to the base branch's version. Then the update steps are done again, and only committed if they generated new changes.
  4. A variant of the previous one, except that the updates are amended into the merge commit. This way there is only a single merge commit, that after the fact is guaranteed to resolve the conflict correctly -- everything follows the base branch except for the changes of the update which are preserved.

I can also think of another orthogonal setting to configure PR updates: whether to reuse the PR for new versions of the dependency. So, it opens a PR updating lib X to 2.0 before it's merged 2.1 is released, Scala Steward could update the same branch / PR.

Admittedly, these settings aren't 100% orthogonal. If updatePullRequest=never then the other settings are moot. And I'm not sure if reusing a PR for new versions makes sense with a Merge strategy.

Also maybe updating a branch with new versions is a separate topic from updating it with the base branch, with its own When and Strategy settings.

nafg avatar Jul 30 '23 06:07 nafg

IMO we could just replace the current revert-and-apply strategy with the new reset-and-apply-and-force-push strategy. I don't think anyone would miss the current strategy and we wouldn't have to change the possible updatePullRequests values.

fthomas avatar Aug 01 '23 17:08 fthomas

Further discussion about changing the method how update branches are updated (which started with https://github.com/scala-steward-org/scala-steward/issues/1761#issuecomment-1584014610) should happen in https://github.com/scala-steward-org/scala-steward/issues/3218.

fthomas avatar Nov 22 '23 13:11 fthomas