go-git

Detect renamed files in DiffTree

Status: Open. vmarkovtsev opened this issue 7 years ago • 3 comments

Right now go-git does not support the detection of renamed files in the tree comparison algorithm.

We want to implement this support under the following conditions:

  1. It must produce the same results as CGit.
  2. It must run as a post-processing step in DiffTree.

The detection algorithm itself is not complex. It depends on the similarity threshold M (the `-M` option of CGit commands). M ranges from 0 to 100 and is the minimum percentage of identical bytes two files must share to be considered a rename.
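As a rough illustration of the threshold, here is a minimal Go sketch of a similarity score and the M check. It assumes a simplified, line-based measure: the share of bytes in the larger file that belong to lines also present in the other file. It ignores identical bytes inside changed lines, so it is not CGit's exact scoring, and the names `similarity` and `isRename` are hypothetical, not part of go-git's API.

```go
package main

import (
	"bytes"
	"fmt"
)

// similarity returns a 0..100 score: the share of the larger file's bytes
// that belong to lines also found in the other file. This line-based
// approximation is an assumption for illustration, not CGit's scoring.
func similarity(a, b []byte) int {
	maxSize := len(a)
	if len(b) > maxSize {
		maxSize = len(b)
	}
	if maxSize == 0 {
		return 100 // two empty files are trivially identical
	}
	// Multiset of a's lines (each line keeps its trailing newline).
	counts := make(map[string]int)
	for _, line := range bytes.SplitAfter(a, []byte("\n")) {
		if len(line) > 0 {
			counts[string(line)]++
		}
	}
	// Sum the bytes of b's lines that match a not-yet-used line of a.
	common := 0
	for _, line := range bytes.SplitAfter(b, []byte("\n")) {
		if counts[string(line)] > 0 {
			counts[string(line)]--
			common += len(line)
		}
	}
	return common * 100 / maxSize
}

// isRename reports whether two files are similar enough for threshold m.
func isRename(a, b []byte, m int) bool {
	return similarity(a, b) >= m
}

func main() {
	oldFile := []byte("foo\nbar\nbaz\n")
	newFile := []byte("foo\nbar\nqux\n")
	fmt.Println(similarity(oldFile, newFile))   // 66
	fmt.Println(isRename(oldFile, newFile, 50)) // true
}
```

Here two of three 4-byte lines match, so 8 of 12 bytes are shared and the integer score is 66, which passes a threshold of 50 (git's default for `-M`) but not 70.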

  1. Sort the changed files by size.
  2. Apply a sliding window whose size depends on M, so that the size difference between the center and the edge of the window corresponds to (100 - M) percent.
  3. Compare every original file with every new file within that window: count the identical lines in the diff and the identical bytes within the changed lines, then compute the similarity ratio.
  4. Take into account whether the two files share the same basename.
  5. Greedily record rename pairs.
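The steps above could be sketched in Go roughly as follows. This is a hypothetical sketch, not go-git, CGit, or JGit code: `File`, `Rename`, and `detectRenames` are illustrative names, and the content comparison is passed in as a `score` callback. The size window is derived from the bound that two files can share at most as many bytes as the smaller one holds, so a pair can only reach M when `100 * min(size) >= M * max(size)`. Breaking score ties in favor of identical basenames is one possible reading of step 4.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// File is a hypothetical minimal view of a changed tree entry.
type File struct {
	Path string
	Size int
}

// Rename is a detected (From, To) pair with its similarity score.
type Rename struct {
	From, To string
	Score    int
}

// detectRenames sketches steps 1-5. score must return a value in 0..100.
func detectRenames(deleted, added []File, m int, score func(a, b File) int) []Rename {
	// Step 1: sort added files by size so each window is a contiguous range.
	sorted := append([]File(nil), added...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Size < sorted[j].Size })

	var candidates []Rename
	for _, d := range deleted {
		// Step 2: only sizes in [d.Size*m/100, d.Size*100/m] can reach m.
		lo := sort.Search(len(sorted), func(i int) bool { return sorted[i].Size*100 >= d.Size*m })
		for _, a := range sorted[lo:] {
			if d.Size*100 < a.Size*m {
				break // a, and every later file, is too large relative to d
			}
			// Step 3: score each pair inside the window.
			if s := score(d, a); s >= m {
				candidates = append(candidates, Rename{From: d.Path, To: a.Path, Score: s})
			}
		}
	}

	// Step 4: highest score first; on equal scores, prefer identical basenames.
	sort.SliceStable(candidates, func(i, j int) bool {
		ci, cj := candidates[i], candidates[j]
		if ci.Score != cj.Score {
			return ci.Score > cj.Score
		}
		return sameBase(ci) && !sameBase(cj)
	})
	// Step 5: greedily keep pairs whose source and target are both unused.
	usedFrom, usedTo := map[string]bool{}, map[string]bool{}
	var renames []Rename
	for _, c := range candidates {
		if !usedFrom[c.From] && !usedTo[c.To] {
			usedFrom[c.From], usedTo[c.To] = true, true
			renames = append(renames, c)
		}
	}
	return renames
}

func sameBase(r Rename) bool { return path.Base(r.From) == path.Base(r.To) }

func main() {
	deleted := []File{{"old/util.go", 1000}, {"old/big.bin", 50000}}
	added := []File{{"new/util.go", 980}, {"new/helpers.go", 990}}
	// Toy scores standing in for a real content comparison.
	scores := map[[2]string]int{
		{"old/util.go", "new/util.go"}:    90,
		{"old/util.go", "new/helpers.go"}: 90,
	}
	score := func(a, b File) int { return scores[[2]string{a.Path, b.Path}] }
	for _, r := range detectRenames(deleted, added, 50, score) {
		fmt.Printf("%s -> %s (%d%%)\n", r.From, r.To, r.Score)
	}
	// old/util.go -> new/util.go (90%)
}
```

In the usage example, `old/big.bin` is never scored because no added file falls inside its size window, and the tie between the two 90% candidates is resolved by the matching basename.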

Since each file is compared only with the files inside its window, the complexity is close to linear. We could go further and solve a linear assignment problem for better accuracy: several new files can be detected as renames of a single original file, and the greedy choice is not always the best one.
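To illustrate why the greedy choice can be suboptimal, the hypothetical Go sketch below compares greedy pairing with an exhaustive search over one-to-one assignments on a tiny score matrix (rows are deleted files, columns are added files). Maximizing the total score of pairs at or above M is an assumed objective. The exhaustive search is exponential and only meant for illustration; a practical solver would use a polynomial method such as the Hungarian algorithm.

```go
package main

import "fmt"

// greedyAssignment repeatedly takes the highest remaining score >= m whose
// row (deleted file) and column (added file) are both still free.
// pick[i] is the column matched to row i, or -1 if row i stays unmatched.
func greedyAssignment(scores [][]int, m int) (int, []int) {
	pick := make([]int, len(scores))
	for i := range pick {
		pick[i] = -1
	}
	used := map[int]bool{}
	total := 0
	for {
		bi, bj, bs := -1, -1, m-1
		for i, row := range scores {
			if pick[i] != -1 {
				continue
			}
			for j, s := range row {
				if !used[j] && s > bs {
					bi, bj, bs = i, j, s
				}
			}
		}
		if bi < 0 {
			return total, pick
		}
		pick[bi], used[bj], total = bj, true, total+bs
	}
}

// bestAssignment tries every one-to-one matching (exponential, tiny inputs
// only) and returns the one with the largest total score over pairs >= m.
func bestAssignment(scores [][]int, m int) (int, []int) {
	n := len(scores)
	best, bestPick, pick := -1, make([]int, n), make([]int, n)
	used := map[int]bool{}
	var rec func(row, total int)
	rec = func(row, total int) {
		if row == n {
			if total > best {
				best = total
				copy(bestPick, pick)
			}
			return
		}
		pick[row] = -1 // option: leave this deleted file unmatched
		rec(row+1, total)
		for col, s := range scores[row] {
			if !used[col] && s >= m {
				used[col], pick[row] = true, col
				rec(row+1, total+s)
				used[col] = false
			}
		}
	}
	rec(0, 0)
	return best, bestPick
}

func main() {
	scores := [][]int{{90, 85}, {80, 0}}
	g, gp := greedyAssignment(scores, 50)
	b, bp := bestAssignment(scores, 50)
	fmt.Println(g, gp) // 90 [0 -1]
	fmt.Println(b, bp) // 165 [1 0]
}
```

With M = 50, greedy takes the 90 pair first and leaves the second deleted file unmatched, while the exhaustive search finds two renames (85 + 80).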

This algorithm is partially implemented in Hercules.

Link to the algorithm in CGit.

Link to the algorithm in JGit. It does not produce results identical to CGit's.

JGit's implementation is much cleaner and looks better designed overall, so I would suggest using it as the baseline for tests.

@mcuadros @smola

vmarkovtsev, Aug 29 '18

If I need to choose, I will choose CGit, since it is what people expect. But are JGit's results better than CGit's? If not, or if you simply cannot measure it, I would rather go with the CGit implementation.

mcuadros, Sep 04 '18

I am afraid there is no benchmark suite for this, but I can create one.

vmarkovtsev, Sep 05 '18

Is there any update planned on this issue? @vmarkovtsev

fahadhk, Nov 26 '19