dulwich
dulwich copied to clipboard
Bug: Incorrect git config parsing (error when cloning local repository with trailing backslash on Windows)
I ran into a ValueError ("escape character followed by unknown character 102...") when trying to clone a local repository on Windows. The issue turned out to be because of how trailing backslashes interact with how dulwich parses git config files.
Reproduction steps
(I think this affects lots of other operations on things that have been cloned from a local remote on Windows, but I'll show a way to reproduce by cloning, which was how I first encountered the issue.)
Environment: Windows 10, with output of pip list
being
Package Version
---------- ---------
certifi 2022.9.24
dulwich 0.20.50
pip 21.1.1
setuptools 56.0.0
urllib3 1.26.12
- Create a new git repository that we're going to clone. In Powershell:
mkdir C:\source_repo
cd C:\source_repo
git init
git commit --allow-empty -m "Initial commit"
- Try to clone it using dulwich:
python -c "from dulwich import porcelain; porcelain.clone('C:\\source_repo\\', 'C:\\cloned_repo')"
(note the trailing backslash in the path for source_repo
-- without it it works fine).
Expected result
The dulwich.porcelain.clone
command works and clones the repository.
Actual result
We get an exception:
Traceback (most recent call last):
File "C:\Users\hliu\venvs\test_dulwich\lib\site-packages\dulwich\config.py", line 413, in _parse_string
v = _ESCAPE_TABLE[value[i]]
KeyError: 102
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\porcelain.py", line 551, in clone
return client.clone(
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\client.py", line 761, in clone
_import_remote_refs(
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\refs.py", line 1211, in _import_remote_refs
refs_container.import_refs(
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\refs.py", line 187, in import_refs
self.set_if_equals(
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\refs.py", line 871, in set_if_equals
self._log(
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\refs.py", line 128, in _log
self._logger(ref, old_sha, new_sha, committer, timestamp, timezone, message)
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\repo.py", line 1207, in _write_reflog
config = self.get_config_stack()
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\repo.py", line 713, in get_config_stack
backends = [self.get_config()] + StackedConfig.default_backends()
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\repo.py", line 1602, in get_config
return ConfigFile.from_path(path)
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\config.py", line 593, in from_path
ret = cls.from_file(f)
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\config.py", line 583, in from_file
value = _parse_string(continuation)
File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\config.py", line 419, in _parse_string
raise ValueError(
ValueError: escape character followed by unknown character 102 at 16 in bytearray(b'C:\\\\source_repo\\fetch = +refs/heads/*:refs/remotes/origin/*')
Analysis
I believe this is a result of incorrect parsing of the git config file by dulwich.
When you clone the repository like this, the contents of C:\cloned_repo\.git\config
will contain a section like:
[remote "origin"]
url = C:\\source_repo\\
fetch = +refs/heads/*:refs/remotes/origin/*
dulwich (incorrectly) treats the final backslash in url = C:\\source_repo\\
as a line continuation character, so joins the two lines, and then gives an error when trying to parse url = C:\\source_repo\\fetch ...
The code to do this is here, where we simply treat every line ending in backslash as continuing onto the next line. This is incorrect in when the actual value on the line ends in an escape sequence \\
.
Compare to how the git code parses config files here, which correctly parses the line one character at a time, and ends up treating the final backslash in url = C:\\source_repo\\
as "consumed" as part of an escape sequence \\
.