checkout fails: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 28: ordinal not in range(128)
Hello gentlemen,
when I use this code:
import svn.remote
r = svn.remote.RemoteClient(<myUrl>)
r.checkout('/tmp/working')
Then it fails with this exception:
pydev debugger: starting (pid: 7264)
pydev debugger: warning: trying to add breakpoint to file that does not exist: /usr/local/lib/python2.7/dist-packages/svn-0.3.44-py2.7.egg/svn/remote.py (will have no effect)
Traceback (most recent call last):
File "/home/robin/.eclipse/org.eclipse.platform_4.6.2_1747617930_linux_gtk_x86/plugins/org.python.pydev_5.1.2.201606231256/pysrc/pydevd.py", line 1530, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/robin/.eclipse/org.eclipse.platform_4.6.2_1747617930_linux_gtk_x86/plugins/org.python.pydev_5.1.2.201606231256/pysrc/pydevd.py", line 937, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/robin/subversion/appl/pyRfJobQueue/trunk/svntest.py", line 4, in <module>
r.checkout('/tmp/working')
File "/usr/local/lib/python2.7/dist-packages/svn/remote.py", line 20, in checkout
self.run_command('checkout', cmd)
File "/usr/local/lib/python2.7/dist-packages/svn/common.py", line 78, in run_command
return stdout.decode().strip('\n').split('\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 28: ordinal not in range(128)
Using Python 2.7.3 on Ubuntu 12.04 LTS, svn 1.8.8.
I noticed that common.py sets the LANG variable to 'en_US.UTF-8', but my svn command-line client seems to ignore it and use LANGUAGE instead.
The following console output demonstrates this (my system language is normally German):
robin@P-CZC3084WSD:/tmp$ LANG=en_US.UTF-8 svn
Geben Sie »svn help« für weitere Hilfe ein.
robin@P-CZC3084WSD:/tmp$ LANGUAGE=en_US.UTF-8 svn
Type 'svn help' for usage.
So setting LANG to 'en_US.UTF-8' does not give me English svn output, but setting LANGUAGE to 'en_US.UTF-8' does.
When I change this line in common.py:
environment_variables['LANG'] = 'en_US.UTF-8'
to
environment_variables['LANGUAGE'] = 'en_US.UTF-8'
then it works (no more exceptions).
I don't know why my svn uses LANGUAGE while others apparently use LANG. Maybe one solution for python-svn would be to set both LANG and LANGUAGE?
Thank you!
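The behaviour described above is consistent with GNU gettext, where LANGUAGE takes precedence over LANG when choosing message translations. A minimal sketch of the "set both" suggestion follows. The helper name build_svn_environment and its parameters are hypothetical and not part of the svn package; it only shows how an environment dict with both variables forced could be built and then handed to a subprocess call.

```python
import os


def build_svn_environment(base_env=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment with LANG and LANGUAGE both forced.

    Hypothetical helper for illustration; not the library's actual code.
    """
    # Copy so the caller's dict (or os.environ) is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["LANG"] = locale_name
    # Also set LANGUAGE, which the svn client in this report honoured
    # instead of LANG.
    env["LANGUAGE"] = locale_name
    return env


# Example: a German desktop session where LANGUAGE is already set.
german_session = {"LANGUAGE": "de_DE:de", "PATH": "/usr/bin"}
svn_env = build_svn_environment(german_session)
```

The resulting dict could then be passed as the env argument of subprocess.Popen when launching svn, so the client output no longer depends on the user's session language.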
I got the same problem, and I just wanted to give another workaround that solved it for me.
I just modified the following in common.py, function external_command:
return stdout.decode('utf-8').strip('\n').split('\n')
(Notice the 'utf-8' parameter in the decode function)
I don't really know if this approach could cause other problems, but if not it would be easy to create a PR with this little change and solve the bug.
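To illustrate why the explicit 'utf-8' argument helps: on Python 2 a bare decode() uses the ASCII codec, and the German svn message contains "»", which UTF-8 encodes as the bytes 0xC2 0xBB. The sketch below runs on Python 3 and uses decode('ascii') to reproduce the Python 2 default. The function name decode_svn_output is hypothetical, and the sample text is the German message from the console output earlier in this thread.

```python
def decode_svn_output(stdout, encoding="utf-8"):
    """Decode raw svn output with an explicit codec and split it into lines."""
    return stdout.decode(encoding).strip("\n").split("\n")


# Bytes as a UTF-8 terminal would deliver them; "»" becomes 0xC2 0xBB.
raw = "Geben Sie »svn help« für weitere Hilfe ein.\n".encode("utf-8")

# The ASCII codec (the Python 2 default for decode()) rejects byte 0xC2.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# An explicit UTF-8 decode handles the same bytes.
lines = decode_svn_output(raw)
```

This only helps when svn really emits UTF-8. Output in another encoding would still fail or be decoded incorrectly.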
Traceback (most recent call last):
File "test.py", line 53, in <module>
for i in remote.list(extended=False, rel_path='/'):
File "C:\Code\TD\.venv\lib\site-packages\svn\common.py", line 325, in list
[full_url_or_path]):
File "C:\Code\TD\.venv\lib\site-packages\svn\common.py", line 54, in run_command
return self.external_command(cmd, environment=self.__env, **kwargs)
File "C:\Code\TD\.venv\lib\site-packages\svn\common_base.py", line 39, in external_command
return stdout.decode().strip('\n').split('\n')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 6: invalid start byte
Using Python 3.6.3 on Windows 7 with a Chinese locale, svn 1.9.7, PySvn 0.3.45
I got the same problem on Windows too, but setting neither LANGUAGE nor LANG works for me.
return stdout.decode('cp936').strip('\n').split('\n') # it works
I think the problem is that stdout.decode uses UTF-8 as its default encoding. That does not work for everybody (my Windows uses cp936).
Maybe the encoding should be a parameter that can be passed in to decode the stdout.
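A rough sketch of what such a parameter could look like. The function decode_output and its encoding argument are hypothetical, not the library's API. Falling back to locale.getpreferredencoding(False) is an assumption: the encoding svn actually writes on a given Windows console may differ from it, which is why the caller can still override it.

```python
import locale


def decode_output(stdout, encoding=None):
    """Decode raw command output using a caller-supplied encoding.

    If no encoding is given, fall back to the locale's preferred encoding
    (an assumption; svn's real output encoding may not match it).
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return stdout.decode(encoding).strip("\n").split("\n")


# Simulated svn output containing a Chinese file name encoded as cp936.
raw = "中文.txt\n".encode("cp936")

# Decoding with the matching code page recovers the name.
names = decode_output(raw, encoding="cp936")
```

Decoding the same bytes as UTF-8 raises UnicodeDecodeError, which matches the failure in the traceback above.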
I encountered this issue today. We have a file name in our SVN repo that has non-ASCII characters in it. Fetching the SVN log triggers this error when the log message is printed.
Another option would be to specify an error directive in the decode call.
The current line is this:
return stdout.decode().strip('\n').split('\n')
Optionally it could be this:
return stdout.decode(errors='replace').strip('\n').split('\n')
Or a new argument could be passed into the method that defaults to 'strict' (which is the current default of the decode method) or 'replace' if that is deemed a better choice. My vote would be something like this:
def external_command(self, cmd, success_code=0, do_combine=False,
                     return_binary=False, environment={}, wd=None,
                     decode_encoding="utf-8", decode_errors="replace"):
    # Code removed for brevity
    return stdout.decode(encoding=decode_encoding,
                         errors=decode_errors).strip('\n').split('\n')
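To make the trade-off between 'strict' and 'replace' concrete, here is a small standalone sketch for Python 3. The function split_output_lines is a hypothetical stand-in for the last line of external_command, and the sample bytes are invented for illustration, with 0xBF as the invalid UTF-8 start byte from the Windows traceback.

```python
def split_output_lines(stdout, encoding="utf-8", errors="replace"):
    """Decode raw output and split it into lines using the given error handler."""
    return stdout.decode(encoding=encoding, errors=errors).strip("\n").split("\n")


# Invented sample: the second line starts with 0xBF, which is not valid UTF-8.
raw = b"r1 | ok\n\xbfbroken name\n"

# 'replace' substitutes U+FFFD for the bad byte instead of raising.
lenient = split_output_lines(raw)

# 'strict' keeps the current behaviour and raises on the same input.
try:
    split_output_lines(raw, errors="strict")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True
```

With 'replace' the call no longer fails, but the affected characters are lost, so a path containing them would no longer match the real file name.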