pygit2
pygit2 copied to clipboard
Python 3 port uses Unicode to represent byte strings
pygit2, when built for Python 3, treats paths as Unicode and will fail if a path isn't decodable as the filesystem encoding. But Git paths are byte strings, not Unicode strings. This includes refs, so repos with branch names containing non-UTF-8 sequences are completely unusable on most systems:
>>> repo.listall_references()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 14: invalid start byte
pygit2's behaviour under Python 2 is correct; listall_references and other APIs returns byte strings as they are in the Git model. I don't see why the behaviour should differ by Python version, as both types exist in both languages. It seems to me that the default low-level API should return byte strings to match the underlying model and handle all cases, and convenience wrappers which return Unicode strings could be added if people actually want them. As it stands, some perfectly valid Git repos are unusable except on Python 2.