Only create distinct changesets
Attempting to resolve #1995
Problem
Currently, when multiple artifacts share the same version number pinned by the same val binding, we end up seeing multiple PRs with the exact same changeset
Examples
- guardrail-dev/guardrail#1012
- guardrail-dev/guardrail#1013
- guardrail-dev/guardrail#1014
- guardrail-dev/guardrail#1015
Strategy in this PR
Attempting to deduplicate before we got to applyUpdate wasn't a viable strategy, due to the various different heuristics that could result in different commits.
Instead, track a Ref[F, List[Branch]] of every branch that we have walked in this run, then immediately after applyUpdate but before pushCommits, ensure that there are no branches that we have touched so far in this run that have the same changeset (as determined by the line count of git diff other-branch).
This will result in a little more IO churn, diffing between every other open and curated branch, but it should have a positive impact on users.
If you've got advice on how to write a more effective test for this new functionality, I'm open to it -- it seemed as though NurtureAlgTest would be the right place to put this, but there's not a lot of structure there to base my attempt on.
Thank you
As always, a tremendous thank you for this fantastic tool.
Codecov Report
Merging #2019 (b1c949d) into master (f68c614) will increase coverage by
0.15%. The diff coverage is43.24%.
@@ Coverage Diff @@
## master #2019 +/- ##
==========================================
+ Coverage 78.58% 78.74% +0.15%
==========================================
Files 131 132 +1
Lines 2251 2272 +21
Branches 54 47 -7
==========================================
+ Hits 1769 1789 +20
- Misses 482 483 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...ala/org/scalasteward/core/nurture/NurtureAlg.scala | 4.90% <0.00%> (+0.18%) |
:arrow_up: |
| ...scala/org/scalasteward/core/nurture/ApplyAlg.scala | 52.63% <52.63%> (ø) |
|
| ...la/org/scalasteward/core/application/Context.scala | 86.66% <100.00%> (+0.22%) |
:arrow_up: |
| ...n/scala/org/scalasteward/core/git/FileGitAlg.scala | 98.14% <100.00%> (+0.03%) |
:arrow_up: |
| ...in/scala/org/scalasteward/core/git/GenGitAlg.scala | 92.30% <100.00%> (+0.30%) |
:arrow_up: |
| .../main/scala/org/scalasteward/core/io/FileAlg.scala | 100.00% <100.00%> (+3.12%) |
:arrow_up: |
| .../scala/org/scalasteward/core/io/WorkspaceAlg.scala | 50.00% <0.00%> (+20.00%) |
:arrow_up: |
| .../scala/org/scalasteward/core/data/UpdateData.scala | 100.00% <0.00%> (+50.00%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update f68c614...b1c949d. Read the comment docs.
Codacy failed because I used .get on a Ref[F, A]? Baffling
😅 Gonna think about this one more. If there is something obvious about the project structure I'm missing, I'm all ears, but I'll need to revisit in a few days.
OK, I think codecov/patch is incorrect.
I think the blast radius required to appropriately test this feature ended up being significantly more than initially anticipated, so I humbly submit this for review.
The tests needed to be written against real git running on the real filesystem, which is why I had to largely forego MockContext, but I admit I could have misunderstood something along the way and am happy to make any changes to help increase the overall quality of code in this project while also achieving the new feature I'm attempting to add.
Thanks!
It seems to me that this will only delay subsequent PRs with the same changes to the next run. If Scala Steward creates a PR for an update and that PR is not merged before the next run, Scala Steward will not pass that update to NurtureAlg since there is nothing to do for this update. But the update which has been prevented in the first run is still passed to NurtureAlg so that the previously prevented PR can now be created.
I'm also concerned that there is no principle which PR gets created and which PR is prevented. Currently the first update wins over the others and updates are AFAIK sorted by groupId and artifactId. What if a later update would have resulted in a "more correct" PR?
What if a later update would have resulted in a "more correct" PR?
Only exactly equivalent changesets are deduplicated, using git itself as the final arbiter instead of trying to guess without modifying the filesystem. We accumulate commits, but only gate the push using this approach.
To your first point, I didn't realize updates were filtered before hitting NurtureAlg. Short of more back channels with distinct data flows, I don't know how to proceed here.
There may be an avenue to pursue, just tracking active branches in the store. This would work in the case you describe, being the initial set of branches to compare against, as well as working in the case of rebasing off master.
What do you think about this?