
git-filter-repo results in loss of files

Open: cschuber opened this issue 3 years ago • 7 comments

I'm trying to separate FreeBSD's ftpd into its own repo:

git clone https://git.freebsd.org/src.git ftpd
cd ftpd
git filter-repo --subdirectory-filter libexec/ftpd

This removes everything except libexec/ftpd. However, two files, ftpchroot.5 and one other, are removed entirely. The original ftpchroot.5 was deleted years ago, and a new ftpchroot.5 was later added from a different branch (vendor/NetBSD/...). git-filter-repo failed to add the newer ftpchroot.5.

cschuber, Jun 17 '21

git-filter-branch worked using a freshly cloned repo:

git filter-branch --subdirectory-filter libexec/ftpd
git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
rm -r .git/refs/original
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive

cschuber, Jun 17 '21

Another data point: it did work correctly for telnet. Starting from a fresh git clone of src:

git filter-repo --path contrib/telnet --path lib/libtelnet --path libexec/telnetd --path usr.bin/telnet
git reflog expire --all --expire-unreachable=0
git repack -A -d
git prune

I documented everything here: https://wiki.freebsd.org/git-filter

cschuber, Jun 17 '21

It's not that it filtered incorrectly; it exits with an error before completing:

Parsed 363989 commits
New history written in 260.90 seconds; now repacking/cleaning...
Failed to find new id for b'refs/tags/vendor/openzfs/2.0-rc3-gfc5966' (old id was b'7fe0c023de4c5c027c92bb47967c0eb3ae9551b1')

Note here that in the original repository before filtering:

$ git log --format=%n --name-only refs/tags/vendor/openzfs/2.0-rc3-gfc5966 | grep -c libexec/ftpd
0

So there is a tag with no part of libexec/ftpd anywhere in its history, which means the tag should be rewritten to nothing. The code, however, expects it to be rewritten to something, and dies. Investigating...
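Given that diagnosis, it can help to list up front every tag whose history never touches the kept path, since those are the tags that have nothing left after filtering. A minimal sketch, assuming a POSIX shell: the function name tags_without_path is hypothetical, and it simply generalizes the git log | grep -c check above to all tags. It walks history once per tag, so on a repository the size of FreeBSD's src it will be slow.

```shell
# Hypothetical helper: print every tag from which no reachable commit
# touches the given path. Such tags are left empty by a path filter.
tags_without_path() {
    path=$1
    git for-each-ref --format='%(refname)' refs/tags |
    while read -r ref; do
        # rev-list peels annotated tags to commits; it prints nothing
        # when no commit reachable from $ref modifies $path.
        if [ -z "$(git rev-list -n 1 "$ref" -- "$path")" ]; then
            printf '%s\n' "$ref"
        fi
    done
}
```

Run as `tags_without_path libexec/ftpd` inside the unfiltered clone; given the zero count above, refs/tags/vendor/openzfs/2.0-rc3-gfc5966 should appear in its output.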

newren, Jun 17 '21

Where did the comment I posted last night go? It said to ignore the comment above, because I was using a dirty version of git. Anyway, I reproduced the problem last night but need to find some time to debug it.

newren, Jun 17 '21

I'm not 100% sure I reproduced this bug, but I observed something very similar, and I think I came closer to identifying the actual issue. It can be observed with the following:

mkdir tmp; cd tmp
git init
git remote add origin https://git.freebsd.org/src.git
git fetch origin 126780243806a6b6cf345b9b548f68010e921aec
git branch test FETCH_HEAD
git filter-repo --path sys/xen/interface/ --path-rename sys/xen/interface/:xen/include/public/

I hope you're keeping a local copy of the repository, since it is rather large to clone repeatedly. The advantage of fetching only 126780243806a6b6cf345b9b548f68010e921aec is that it omits much of the history, so the issue is easier to track down.

Notably, at 126780243806a6b6cf345b9b548f68010e921aec sys/xen/interface/COPYING is present and should become xen/include/public/COPYING. Alas, it gets lost instead.

There appear to be 8 commits of interest:

  1. 30d1eefe393712ac86e4ac5edd279a7856a48ed2
  2. a972cc523a669c55740f4fd63f16e2ec1a69d83e
  3. 920ba15bf993776c4fbe5808f94e2384a52c56f2
  4. 8d1f54dbaa5ef5168b896f6562c06e9da6b23881
  5. d7f03759d5b1ecfad6b123e6b34d91c81d918a25
  6. fc84e5f31b647de417569f1fa3f3a490e5c4c158
  7. 174eb94d2b193ecd995c3023b6a2e8103f1d0e68
  8. 126780243806a6b6cf345b9b548f68010e921aec

Of these, 5 and 6 are suspicious, since the file is absent in them. It looks like at 4 a portion of the repository was split off into a separate project, and at 7 it rejoined the main project. The sub-project deleted the file, but when it rejoined, the merge kept the file.

Most likely, filter-repo dropped 7 because the merge became degenerate (one branch contributed nothing), but it failed to also drop 5 and 6, which were on the dropped branch.
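To test this hypothesis without fetching the FreeBSD repository, the described topology can be modeled in a scratch repository. This is a hypothetical, simplified sketch: the directory name xen-repro, the file contents, and collapsing commits 1-4 into a single commit are all assumptions, and it does not model the real vendor-branch layout. The main line keeps sys/xen/interface/COPYING, a side branch deletes it (standing in for 5-6), and the merge back (standing in for 7) restores it.

```shell
#!/bin/sh
# Hypothetical model of the history described above; NOT the FreeBSD history.
set -eu

repo=${1:-xen-repro}
if [ -e "$repo" ]; then echo "$repo already exists" >&2; exit 1; fi
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Repro"
git config user.email "repro@example.com"

# Stand-in for commits 1-4: COPYING exists on the main line.
mkdir -p sys/xen/interface
echo "license text" > sys/xen/interface/COPYING
echo "header" > sys/xen/interface/xen.h
git add .
git commit -q -m "import xen headers"

# Stand-in for commits 5-6: the split-off line deletes COPYING.
git checkout -q -b side
git rm -q sys/xen/interface/COPYING
git commit -q -m "side: remove COPYING"
echo "header v2" > sys/xen/interface/xen.h
git commit -q -am "side: update header"

# Main line gets a commit outside the filtered path.
git checkout -q main
echo "unrelated" > README
git add README
git commit -q -m "main: unrelated change"

# Stand-in for commit 7: merge side back but keep COPYING from main.
git merge -q --no-ff --no-commit side
git checkout HEAD -- sys/xen/interface/COPYING
git commit -q -m "merge side back, keeping COPYING"
```

Afterwards, running `git filter-repo --force --path sys/xen/interface/ --path-rename sys/xen/interface/:xen/include/public/` inside xen-repro (`--force` because it is not a fresh clone) and then `git ls-tree -r --name-only main` shows whether xen/include/public/COPYING survived. Whether this small model actually triggers the loss with a given git-filter-repo version has not been verified.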

ehem, Jan 25 '22

I'm confused. Is this still a bug? Should I just manually write out all the --path and --path-rename arguments on the command line?

Erutan409, May 31 '22

Interested to know about any developments on this as well.

gab, Oct 18 '22