git-filter-repo
git-filter-repo results in loss of files
I'm trying to separate FreeBSD's ftpd into its own repo.
git clone https://git.freebsd.org/src.git ftpd
cd ftpd
git filter-repo --subdirectory-filter libexec/ftpd
This removes everything except libexec/ftpd. However, two files, ftpchroot.5 and another, are completely removed. The original ftpchroot.5 was removed years ago, and later a new ftpchroot.5 was added from a different branch (vendor/NetBSD/...). git-filter-repo failed to add the newer ftpchroot.5.
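For anyone trying to confirm the remove-then-re-add history described above, here is a rough sketch that lists every commit, on any ref, that adds or deletes a given path. It should be run in the unfiltered clone. The function name `path_add_delete_history` is made up for illustration, and the example path assumes ftpchroot.5 lives directly under libexec/ftpd, which the post does not state explicitly.

```shell
# Hypothetical helper: show commits on any ref that add (A) or delete (D)
# the given path. Run inside the unfiltered clone.
path_add_delete_history() {
    # --all walks every ref, so vendor branches are included.
    # --full-history keeps commits that default history simplification
    # would otherwise prune when limiting by path.
    # --diff-filter=AD restricts the listing to additions and deletions.
    git log --all --full-history --diff-filter=AD \
        --format='%h %ad %s' --date=short --name-status -- "$1"
}

# Example (path is an assumption, see above):
# path_add_delete_history libexec/ftpd/ftpchroot.5
```

If the file really was deleted and later re-added, the listing should contain two A entries and one D entry for it.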
git-filter-branch worked using a freshly cloned repo:
git filter-branch --subdirectory-filter libexec/ftpd filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
rm -r .git/refs/original
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive
Another datapoint: it did work correctly with telnet. git clone a fresh copy of src, then:
git filter-repo --path contrib/telnet --path lib/libtelnet --path libexec/telnetd --path usr.bin/telnet
git reflog expire --all --expire-unreachable=0
git repack -A -d
git prune
I documented everything here: https://wiki.freebsd.org/git-filter
It's not that it filtered incorrectly, it's that it exits with an error before completing:
Parsed 363989 commits
New history written in 260.90 seconds; now repacking/cleaning...
Failed to find new id for b'refs/tags/vendor/openzfs/2.0-rc3-gfc5966' (old id was b'7fe0c023de4c5c027c92bb47967c0eb3ae9551b1')
Note here that in the original repository before filtering:
$ git log --format=%n --name-only refs/tags/vendor/openzfs/2.0-rc3-gfc5966 | grep -c libexec/ftpd
0
So, there is a tag which does not have any part of libexec/ftpd in its history, so the tag should be rewritten to nothing. The code is expecting it to be rewritten to something, though, and dies. Investigating...
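The single-tag check above can be extended to every tag, to see how many refs are in the same situation. This is only a sketch, assuming a POSIX shell and that it is run inside the unfiltered clone. `refs_without_path` is an illustrative name, not something git or git-filter-repo provides.

```shell
# Hypothetical helper: print every ref under a prefix whose history never
# touches the given path. Run inside the unfiltered clone.
refs_without_path() {
    prefix=$1   # e.g. refs/tags
    path=$2     # e.g. libexec/ftpd
    git for-each-ref --format='%(refname)' "$prefix" |
    while read -r ref; do
        # "git log -1 -- <path>" prints nothing when no commit reachable
        # from the ref changed anything under <path>.
        if [ -z "$(git log -1 --format=%H "$ref" -- "$path")" ]; then
            printf '%s\n' "$ref"
        fi
    done
}

# Example: refs_without_path refs/tags libexec/ftpd
```

On a repository the size of src this runs one `git log` per ref, so it may take a while.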
Where did my comment go last night to ignore the above comment because I was using a dirty version of git? Anyway, I reproduced last night but need to find some time to debug.
I'm not 100% sure I duplicated this bug, but I observed something rather similar. I think I came closer to identifying the actual issue though. This can be observed with the following:
mkdir tmp; cd tmp
git init
git remote add origin https://git.freebsd.org/src.git
git fetch origin 126780243806a6b6cf345b9b548f68010e921aec
git branch test FETCH_HEAD
git filter-repo --path sys/xen/interface/ --path-rename sys/xen/interface/:xen/include/public/
I hope you're keeping a local copy of the repository since that is rather //large// to repeatedly clone. The advantage here is 126780243806a6b6cf345b9b548f68010e921aec omits much of the history so issues can be tracked better.
Notably, at 126780243806a6b6cf345b9b548f68010e921aec sys/xen/interface/COPYING is present, which should turn into xen/include/public/COPYING. Alas, it instead gets lost.
There appear to be 8 commits of interest:
1. 30d1eefe393712ac86e4ac5edd279a7856a48ed2
2. a972cc523a669c55740f4fd63f16e2ec1a69d83e
3. 920ba15bf993776c4fbe5808f94e2384a52c56f2
4. 8d1f54dbaa5ef5168b896f6562c06e9da6b23881
5. d7f03759d5b1ecfad6b123e6b34d91c81d918a25
6. fc84e5f31b647de417569f1fa3f3a490e5c4c158
7. 174eb94d2b193ecd995c3023b6a2e8103f1d0e68
8. 126780243806a6b6cf345b9b548f68010e921aec
Of these, 5-6 are suspicious since the file was absent. It looks like at 4 a portion of the repository was split off into a separate project, but at 7 it rejoined the main project. This sub-project deleted the file, but when it rejoined it was merged without deleting the file.
Likely 7 was dropped by filter-repo due to the merge ending up degenerate (one branch contributed nothing), but filter-repo failed to drop 5-6, which were the dropped branch.
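To see where the file goes missing, one option is to check for it at each of the commits of interest, both before and after filtering. A minimal sketch, assuming the commits are present in the local repository. `report_path_presence` is an illustrative name only.

```shell
# Hypothetical helper: for each commit given, report whether <path> exists
# in that commit's tree.
report_path_presence() {
    path=$1
    shift
    for rev in "$@"; do
        # "git cat-file -e <rev>:<path>" exits 0 only if that object exists.
        if git cat-file -e "$rev:$path" 2>/dev/null; then
            printf '%s present\n' "$rev"
        else
            printf '%s absent\n' "$rev"
        fi
    done
}

# Example with two of the commits from this thread:
# report_path_presence sys/xen/interface/COPYING \
#     d7f03759d5b1ecfad6b123e6b34d91c81d918a25 \
#     126780243806a6b6cf345b9b548f68010e921aec
```

After filtering, the commit IDs change, so the same check would need the rewritten IDs and the renamed path (xen/include/public/COPYING).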
I'm confused. Is this still a bug? Should I just manually write out all the --path and --path-rename arguments on the command line?
Interested to know about any developments on this as well.