scala-steward icon indicating copy to clipboard operation
scala-steward copied to clipboard

Only create distinct changesets

Open blast-hardcheese opened this issue 4 years ago • 6 comments

Attempting to resolve #1995

Problem

Currently, when multiple artifacts share the same version number pinned by the same val binding, we end up seeing multiple PRs with the exact same changeset

Examples

  • guardrail-dev/guardrail#1012
  • guardrail-dev/guardrail#1013
  • guardrail-dev/guardrail#1014
  • guardrail-dev/guardrail#1015

Strategy in this PR

Attempting to deduplicate before we got to applyUpdate wasn't a viable strategy, due to the various different heuristics that could result in different commits.

Instead, track a Ref[F, List[Branch]] of every branch that we have walked in this run, then immediately after applyUpdate but before pushCommits, ensure that there are no branches that we have touched so far in this run that have the same changeset (as determined by the line count of git diff other-branch).

This will result in a little more IO churn, diffing between every other open and curated branch, but it should have a positive impact on users.

If you've got advice on how to write a more effective test for this new functionality, I'm open to it -- it seemed as though NurtureAlgTest would be the right place to put this, but there's not a lot of structure there to base my attempt on.

Thank you

As always, a tremendous thank you for this fantastic tool.

blast-hardcheese avatar Mar 20 '21 07:03 blast-hardcheese

Codecov Report

Merging #2019 (b1c949d) into master (f68c614) will increase coverage by 0.15%. The diff coverage is 43.24%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2019      +/-   ##
==========================================
+ Coverage   78.58%   78.74%   +0.15%     
==========================================
  Files         131      132       +1     
  Lines        2251     2272      +21     
  Branches       54       47       -7     
==========================================
+ Hits         1769     1789      +20     
- Misses        482      483       +1     
Impacted Files Coverage Δ
...ala/org/scalasteward/core/nurture/NurtureAlg.scala 4.90% <0.00%> (+0.18%) :arrow_up:
...scala/org/scalasteward/core/nurture/ApplyAlg.scala 52.63% <52.63%> (ø)
...la/org/scalasteward/core/application/Context.scala 86.66% <100.00%> (+0.22%) :arrow_up:
...n/scala/org/scalasteward/core/git/FileGitAlg.scala 98.14% <100.00%> (+0.03%) :arrow_up:
...in/scala/org/scalasteward/core/git/GenGitAlg.scala 92.30% <100.00%> (+0.30%) :arrow_up:
.../main/scala/org/scalasteward/core/io/FileAlg.scala 100.00% <100.00%> (+3.12%) :arrow_up:
.../scala/org/scalasteward/core/io/WorkspaceAlg.scala 50.00% <0.00%> (+20.00%) :arrow_up:
.../scala/org/scalasteward/core/data/UpdateData.scala 100.00% <0.00%> (+50.00%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update f68c614...b1c949d. Read the comment docs.

codecov[bot] avatar Mar 20 '21 07:03 codecov[bot]

Codacy failed because I used .get on a Ref[F, A]? Baffling

blast-hardcheese avatar Mar 20 '21 16:03 blast-hardcheese

😅 Gonna think about this one more. If there is something obvious about the project structure I'm missing, I'm all ears, but I'll need to revisit in a few days.

blast-hardcheese avatar Mar 20 '21 18:03 blast-hardcheese

OK, I think codecov/patch is incorrect.

I think the blast radius required to appropriately test this feature ended up being significantly more than initially anticipated, so I humbly submit this for review.

The tests needed to be written against real git running on the real filesystem, which is why I had to largely forego MockContext, but I admit I could have misunderstood something along the way and am happy to make any changes to help increase the overall quality of code in this project while also achieving the new feature I'm attempting to add.

Thanks!

blast-hardcheese avatar Mar 29 '21 18:03 blast-hardcheese

It seems to me that this will only delay subsequent PRs with the same changes to the next run. If Scala Steward creates a PR for an update and that PR is not merged before the next run, Scala Steward will not pass that update to NurtureAlg since there is nothing to do for this update. But the update which has been prevented in the first run is still passed to NurtureAlg so that the previously prevented PR can now be created.

I'm also concerned that there is no principle which PR gets created and which PR is prevented. Currently the first update wins over the others and updates are AFAIK sorted by groupId and artifactId. What if a later update would have resulted in a "more correct" PR?

fthomas avatar Apr 10 '21 15:04 fthomas

What if a later update would have resulted in a "more correct" PR?

Only exactly equivalent changesets are deduplicated, using git itself as the final arbiter instead of trying to guess without modifying the filesystem. We accumulate commits, but only gate the push using this approach.

To your first point, I didn't realize updates were filtered before hitting NurtureAlg. Short of more back channels with distinct data flows, I don't know how to proceed here.

There may be an avenue to pursue, just tracking active branches in the store. This would work in the case you describe, being the initial set of branches to compare against, as well as working in the case of rebasing off master.

What do you think about this?

blast-hardcheese avatar Apr 13 '21 15:04 blast-hardcheese