git-filter-repo

Fix "Passed but got" error on CJK file names

Open louy2 opened this issue 2 years ago • 1 comments

The filter-repo callback passes a Unicode filename as UTF-8 bytes, but git check-ignore prints the same filename quoted, with each non-ASCII UTF-8 byte written as an octal escape. The name != pathname check therefore fails on CJK filenames. .decode('unicode_escape') treats its input as Latin-1 bytes containing escapes, so it resolves the octal escapes but produces a str whose code points are the original byte values. .encode('latin_1') then turns that str back into the original UTF-8 bytes, which can be compared with the filename passed by the filter-repo callback.
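A minimal sketch of that round trip, for illustration only. The helper name `unquote_git_path`, the sample filename, and the stripping of the surrounding double quotes are assumptions and are not taken from the patch itself.

```python
def unquote_git_path(quoted: bytes) -> bytes:
    """Turn a C-style quoted path, as printed by git, back into raw bytes.

    Hypothetical helper: the quote stripping is an assumption about how
    the quoted path arrives, not a copy of the actual patch.
    """
    # Drop the surrounding double quotes if present.
    if len(quoted) >= 2 and quoted.startswith(b'"') and quoted.endswith(b'"'):
        quoted = quoted[1:-1]
    # unicode_escape resolves \ooo escapes and maps every other byte to the
    # code point of the same value (Latin-1), giving one character per byte.
    as_latin1_str = quoted.decode("unicode_escape")
    # Encoding as Latin-1 maps each code point back to its byte value,
    # which recovers the original UTF-8 byte sequence.
    return as_latin1_str.encode("latin_1")


# Name as the callback would pass it: raw UTF-8 bytes.
name = "中文.txt".encode("utf-8")

# The same name with every non-ASCII byte written as an octal escape.
quoted = b'"\\344\\270\\255\\346\\226\\207.txt"'

pathname = unquote_git_path(quoted)
print(pathname == name)
```

The comparison only succeeds after both steps: the intermediate str is six Latin-1 characters plus `.txt`, not the two CJK characters.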

louy2 avatar Mar 14 '23 11:03 louy2

Thanks, but trouble with parsing special filenames would probably be better avoided by passing the "-z" option to check-ignore. If we do that, we would also need to separate input paths with NUL characters rather than newlines, and split the output on NUL characters rather than newlines. Do you want to give that a shot?
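A rough sketch of what the NUL-separated approach could look like, assuming paths are handled as bytes throughout. The function names `encode_paths`, `parse_ignored`, and `ignored_paths` are hypothetical, and this is not the code that ended up in the repository.

```python
import subprocess


def encode_paths(paths):
    """Build stdin for `git check-ignore -z --stdin`: each path NUL-terminated."""
    return b"".join(p + b"\0" for p in paths)


def parse_ignored(output):
    """Split NUL-terminated output into paths, dropping the empty trailing piece."""
    return [p for p in output.split(b"\0") if p]


def ignored_paths(paths, cwd=None):
    """Return the subset of `paths` (bytes) that git reports as ignored.

    Hypothetical wrapper; requires git and a repository at `cwd`.
    """
    proc = subprocess.run(
        ["git", "check-ignore", "-z", "--stdin"],
        input=encode_paths(paths),
        stdout=subprocess.PIPE,
        cwd=cwd,
    )
    # Exit status 0: at least one path ignored; 1: none ignored.
    # Anything else indicates an error.
    if proc.returncode not in (0, 1):
        raise RuntimeError("git check-ignore failed")
    return parse_ignored(proc.stdout)
```

With -z the paths come back unquoted, so they can be compared with the bytes passed by the callback directly, without any unescaping step.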

newren avatar Mar 28 '23 03:03 newren

I implemented the alternative, passing the -z flag to check-ignore, in commit 2800bcc1007e (clean-ignore: support utf-8 filenames found in .gitignore, 2024-07-02).

newren avatar Jul 03 '24 06:07 newren