
Unicode file names do not check out correctly on Windows

Open garyvdm opened this issue 10 years ago • 6 comments

Steps to reproduce:

dulwich clone https://github.com/garyvdm/git_unicode_files.git
dir git_unicode_files

Expected: one file named À (which is u'\u00c0').

Actual: the file is named Ã€ (which is u'\u00c3\u20ac').

The file name is what you get if you do u'\u00c0'.encode('utf8').decode('mbcs'). mbcs is the default filesystem character encoding used on Windows.
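The mis-decoding can be reproduced on any platform with a short sketch. Note the assumption: the `mbcs` codec exists only on Windows and follows the active ANSI code page, so `cp1252` is used here as a stand-in for a typical Western European setup. The variable names are illustrative and not taken from Dulwich.

```python
# Git stores tree entry names as raw bytes; in the test repository they are UTF-8.
name_in_tree = "\u00c0".encode("utf-8")  # the two bytes 0xC3 0x80

# Assumption: cp1252 stands in for 'mbcs', which is Windows-only and
# depends on the active ANSI code page.
wrong = name_in_tree.decode("cp1252")

# Decoding with the encoding the name was actually written in.
right = name_in_tree.decode("utf-8")

print(ascii(wrong))  # two code points, as in the misnamed file
print(ascii(right))  # the single expected code point
```

Decoding the same two bytes with the wrong codec yields two characters instead of one, which matches the file name reported above.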

The git client handles this correctly. I'll take a look at their source code in the future to try to figure out how they handle this.

garyvdm avatar Jun 21 '14 12:06 garyvdm

This is what msysgit does: https://github.com/msysgit/git/commit/19d1e75d58d772329372d453ead964c813bbc6b6

garyvdm avatar May 07 '15 09:05 garyvdm

Has this been resolved?

jelmer avatar May 23 '15 17:05 jelmer

@garyvdm Has this been resolved?

jelmer avatar May 23 '15 17:05 jelmer

No, not yet.

garyvdm avatar May 23 '15 19:05 garyvdm

I also encountered this problem; see https://github.com/FriendCode/gittle/issues/72

A UnicodeDecodeError is raised when the filename is "article/python2编码问题.md" or otherwise contains Unicode characters.

dulwich/index.py(423) build_index_from_tree()
-> full_path = os.path.join(prefix, entry.path)
(Pdb) pp prefix
u'E:/work/py/kkblog/article_repo/\u54c8\u54c8\\guyskk\\webhooks_test'
(Pdb) pp entry.path
'article/python2\xe7\xbc\x96\xe7\xa0\x81\xe9\x97\xae\xe9\xa2\x98.md'
(Pdb) os.path.join(prefix, entry.path)
*** UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 16: ordinal not in range(128)
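The trace shows a text prefix being joined with a byte-string tree path, which makes Python 2 try an implicit ASCII decode. One way around this, sketched below for Python 3, is to decode the tree path explicitly before joining. `join_tree_path` is a hypothetical helper, not a Dulwich function, and it assumes tree paths are UTF-8, which Git does not guarantee. The prefix uses forward slashes throughout for portability.

```python
import os


def join_tree_path(prefix, entry_path, encoding="utf-8"):
    """Hypothetical helper: join a text prefix with a tree path given as bytes.

    Assumes the tree path is encoded as `encoding` (UTF-8 by default).
    """
    if isinstance(entry_path, bytes):
        # Decode explicitly instead of relying on an implicit ASCII decode.
        entry_path = entry_path.decode(encoding)
    return os.path.join(prefix, entry_path)


# Values taken from the pdb session above.
prefix = "E:/work/py/kkblog/article_repo/\u54c8\u54c8/guyskk/webhooks_test"
entry_path = b"article/python2\xe7\xbc\x96\xe7\xa0\x81\xe9\x97\xae\xe9\xa2\x98.md"

full_path = join_tree_path(prefix, entry_path)
print(ascii(full_path))
```

Both operands are text by the time `os.path.join` runs, so no implicit decode takes place. Whether Dulwich should decode paths or keep everything as bytes is a separate design question that this sketch does not settle.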

My script:

# coding:utf-8
import os

from gittle import Gittle
from giturlparse import parse


def pull_or_clone(dest, repo_url):
    # The local checkout lives at <dest>/<owner>/<repo>.
    p = parse(repo_url)
    user_repo_path = os.path.join(dest, p.owner, p.repo)
    if os.path.exists(user_repo_path):
        # Existing checkout: pull the latest changes.
        repo = Gittle(user_repo_path, origin_uri=repo_url)
        repo.pull()
    else:
        # No checkout yet: clone the repository.
        repo = Gittle.clone(repo_url, user_repo_path)


if __name__ == '__main__':
    dest = u"E:/work/py/kkblog/article_repo/哈哈"
    repo_url = u"https://github.com/guyskk/webhooks_test.git"
    pull_or_clone(dest, repo_url)

guyskk avatar Oct 04 '15 12:10 guyskk

It would be great if somebody could verify whether this still happens with Dulwich 0.20.3. The test suite now passes on Windows, so if it still happens we can probably add a test and a fix for it.

jelmer avatar Jun 21 '20 02:06 jelmer