refactor!: introduce new update algorithm based on `git merge`

Open sisp opened this issue 4 months ago • 4 comments

I have introduced a new update algorithm that is centered around the high-level git merge command. It performs a 3-way merge of a fresh project based on the new template version into the current project, with the common ancestor being a fresh project based on the old template version. This is, in a nutshell, what the previous update algorithm was doing using a sequence of low-level Git commands and other hacks. With this refactoring, I'm hoping to simplify the update algorithm implementation, reduce the risk of missed edge cases by relying on a battle-tested high-level command, and pave the way for new opportunities, such as improving merge conflict handling through custom merge drivers like https://mergiraf.org and supporting an update algorithm variant that does not rely on replaying a fresh project based on the old template version (see #1170).

I have marked this PR as a draft because this is a significant (and breaking, see below for more details) change that needs to be thoroughly reviewed and shouldn't be rushed.

According to my research, Copier's update algorithm has a long history, starting with #106 and evolving from a 2-way merge to a 3-way merge, with merge conflicts stored first in .rej files and later also available as inline markers. Many edge cases have shown up along the way, and their fixes contributed to the current complexity.

At some point, I realized that the current algorithm is essentially a 3-way merge implemented with a sequence of low-level commands and hacks. But git merge also performs a 3-way merge with just a single command, provided that the Git graph contains three commits with the appropriate relationship. This graph does not exist naturally in our case, as a Copier template needs to be instantiated before updating a project, and these template instances are not tracked via Git. It is possible, however, to construct this relationship in a synthetic Git graph consisting of the three commits necessary to perform the 3-way merge. git merge can be applied to this synthetic graph, and with some tricks the result looks like a familiar merge.

Thanks to using plain git merge, the delicate Git index manipulation that induced a mid-merge state for merge conflicts (needed especially for visual merge conflicts in IDEs like VS Code) becomes obsolete, aborting a Copier update becomes a simple git merge --abort command, and edge cases like add/add merge conflicts are handled without special treatment. In a way, copier copy is like git clone, and copier update is like git merge, but extended to handle parametrized starters with some extra sugar.
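The three-commit relationship can be sketched roughly as follows. This is a toy illustration and not the code from this PR: the `git`, `commit_file`, and `demo_synthetic_merge` helpers, the `old-render`, `new-render`, and `project` names, and the file contents are all invented for the example, and it assumes the `git` executable is available on `PATH`.

```python
import os
import subprocess
import tempfile
from pathlib import Path

# Fixed identity so commits work without any global Git configuration.
_ENV = {
    **os.environ,
    "GIT_AUTHOR_NAME": "demo",
    "GIT_AUTHOR_EMAIL": "demo@example.com",
    "GIT_COMMITTER_NAME": "demo",
    "GIT_COMMITTER_EMAIL": "demo@example.com",
}

FILLER = ["a", "b", "c", "d", "e"]  # unchanged lines separating the two edits


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` (hypothetical helper, not Copier's API)."""
    result = subprocess.run(
        ["git", "-c", "commit.gpgsign=false", *args],
        cwd=repo, env=_ENV, check=True, capture_output=True, text=True,
    )
    return result.stdout


def commit_file(repo: Path, lines: list, message: str) -> None:
    """Overwrite the single demo file and commit it."""
    (repo / "settings.txt").write_text("\n".join(lines) + "\n")
    git(repo, "add", "-A")
    git(repo, "commit", "-m", message)


def demo_synthetic_merge() -> str:
    """Build the three commits and merge them; returns the merged file content."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")

        # 1. Common ancestor: fresh project based on the OLD template version.
        commit_file(repo, ["project: demo", *FILLER, "template_version: 1"], "old render")
        git(repo, "tag", "old-render")

        # 2. "Ours": the current project, i.e. the old render plus user edits.
        git(repo, "checkout", "-b", "project")
        commit_file(repo, ["project: my-app", *FILLER, "template_version: 1"], "user edits")

        # 3. "Theirs": fresh project based on the NEW template version.
        git(repo, "checkout", "-b", "new-render", "old-render")
        commit_file(repo, ["project: demo", *FILLER, "template_version: 2"], "new render")

        # A single high-level command performs the 3-way merge.
        git(repo, "checkout", "project")
        git(repo, "merge", "--no-edit", "new-render")
        return (repo / "settings.txt").read_text()


if __name__ == "__main__":
    print(demo_synthetic_merge())
```

In this sketch the user's edit and the template's edit touch different parts of the file, so the merge completes without conflicts and keeps both changes.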

This PR introduces a few breaking changes:

  • git merge can only produce inline conflict markers but no .rej files, so I've removed the conflict setting. I suppose that nobody uses .rej files anymore, as inline conflict markers have been the default setting for a while and are familiar to Git users.
  • git merge does not support a configurable context size, so I've removed the context_lines setting.
  • git merge does not support full customization of the conflict marker labels. To compensate for this limitation at least a little, a temporary ref copier/after-updating points to the commit of the fresh project based on the new template version, and its name is used as the conflict marker label for "incoming" changes. The conflict marker label for the current state is HEAD.
  • .gitattributes settings affect Copier's internal git merge call. This may have unexpected side effects that are difficult to anticipate. But it may be exactly what we want, thinking again of custom merge drivers like https://mergiraf.org.
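
To illustrate the conflict marker labels and the abort path mentioned above, here is a small sketch. It is not the PR's implementation: it assumes `git` is on `PATH`, it uses a plain branch named `copier/after-updating` in place of the temporary ref (how the PR actually creates that ref is not shown here), and the helper names and file contents are invented.

```python
from __future__ import annotations

import os
import subprocess
import tempfile
from pathlib import Path

# Fixed identity so commits work without any global Git configuration.
_ENV = {
    **os.environ,
    "GIT_AUTHOR_NAME": "demo",
    "GIT_AUTHOR_EMAIL": "demo@example.com",
    "GIT_COMMITTER_NAME": "demo",
    "GIT_COMMITTER_EMAIL": "demo@example.com",
}


def git(repo: Path, *args: str, check: bool = True) -> subprocess.CompletedProcess:
    """Run a git command inside `repo` (hypothetical helper, not Copier's API)."""
    return subprocess.run(
        ["git", "-c", "commit.gpgsign=false", *args],
        cwd=repo, env=_ENV, check=check, capture_output=True, text=True,
    )


def commit_file(repo: Path, text: str, message: str) -> None:
    """Overwrite the single demo file and commit it."""
    (repo / "settings.txt").write_text(text)
    git(repo, "add", "-A")
    git(repo, "commit", "-m", message)


def demo_conflict_and_abort() -> tuple[str, str]:
    """Return (file content during the conflict, file content after aborting)."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        target = repo / "settings.txt"
        git(repo, "init")

        # Common ancestor: fresh project based on the old template version.
        commit_file(repo, "license: MIT\n", "old render")
        git(repo, "tag", "old-render")

        # Current project: the user changed the same line ...
        git(repo, "checkout", "-b", "project")
        commit_file(repo, "license: Apache-2.0\n", "user edits")

        # ... and so did the new template version (stand-in for the temporary ref).
        git(repo, "checkout", "-b", "copier/after-updating", "old-render")
        commit_file(repo, "license: BSD-3-Clause\n", "new render")

        # Both sides changed the same line, so git merge stops with a conflict.
        git(repo, "checkout", "project")
        merge = git(repo, "merge", "--no-edit", "copier/after-updating", check=False)
        assert merge.returncode != 0, "expected a merge conflict"
        conflicted = target.read_text()

        # Aborting restores the project to its pre-merge state.
        git(repo, "merge", "--abort")
        restored = target.read_text()
        return conflicted, restored


if __name__ == "__main__":
    during, after = demo_conflict_and_abort()
    print(during)
    print(after)
```

The conflicted file carries `HEAD` as the label for the current state and the merged ref's name as the label for the incoming side; after `git merge --abort` the file holds the user's version again.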

If we decide to (eventually) merge this PR, I suggest slipping in deprecation warnings first, releasing them, and waiting a little while before creating a new major release to give Copier users a smoother transition experience.

WDYT, @copier-org/maintainers?

Additionally looping in some contributors who have been active around the update algorithm: @tpluscode @tguillemot @barrywhart @thurse93

sisp avatar Oct 28 '25 13:10 sisp

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 97.20%. Comparing base (de88305) to head (0bd952f).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2376      +/-   ##
==========================================
- Coverage   97.24%   97.20%   -0.05%     
==========================================
  Files          55       55              
  Lines        6247     6156      -91     
==========================================
- Hits         6075     5984      -91     
  Misses        172      172              
Flag Coverage Δ
unittests 97.20% <100.00%> (-0.05%) :arrow_down:

codecov[bot] avatar Oct 28 '25 13:10 codecov[bot]

Thanks for your prompt review and the most positive feedback, @yajo! :bow: :pray:

I've added BREAKING CHANGE trailers to the commit message as you suggested. I've also found a few internal helpers that are now obsolete and removed them. And I've fixed the failing tests on Windows; it turns out that trailing whitespace in filenames is stripped on Windows, so the two previously failing parametrizations don't make much sense there and are now skipped on that platform.

sisp avatar Oct 28 '25 20:10 sisp

Sounds great to me! About 3 years ago, I contributed the PR that initially added support for inline conflict markers, for the same reasons you mention -- it's the default with git. In fact, I had never seen a .rej file before using copier.

No concerns at all about removing the context setting either -- I've never used it.

Simpler code sounds cool -- git can do so many surprising things if you know how to approach it.

barrywhart avatar Oct 28 '25 22:10 barrywhart

Yep, so cool that we can get rid of the index manipulation hacks :clap: :rocket: Didn't fully check the code yet but it looks much cleaner :slightly_smiling_face:

pawamoy avatar Oct 29 '25 00:10 pawamoy