checkout fails: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 28: ordinal not in range(128)

Open verybadsoldier opened this issue 9 years ago • 3 comments

Hello gentlemen,

when I use this code:

import svn.remote

r = svn.remote.RemoteClient(<myUrl>)
r.checkout('/tmp/working')

Then it fails with this exception:

pydev debugger: starting (pid: 7264)
pydev debugger: warning: trying to add breakpoint to file that does not exist: /usr/local/lib/python2.7/dist-packages/svn-0.3.44-py2.7.egg/svn/remote.py (will have no effect)
Traceback (most recent call last):
  File "/home/robin/.eclipse/org.eclipse.platform_4.6.2_1747617930_linux_gtk_x86/plugins/org.python.pydev_5.1.2.201606231256/pysrc/pydevd.py", line 1530, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/robin/.eclipse/org.eclipse.platform_4.6.2_1747617930_linux_gtk_x86/plugins/org.python.pydev_5.1.2.201606231256/pysrc/pydevd.py", line 937, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/robin/subversion/appl/pyRfJobQueue/trunk/svntest.py", line 4, in <module>
    r.checkout('/tmp/working')
  File "/usr/local/lib/python2.7/dist-packages/svn/remote.py", line 20, in checkout
    self.run_command('checkout', cmd)
  File "/usr/local/lib/python2.7/dist-packages/svn/common.py", line 78, in run_command
    return stdout.decode().strip('\n').split('\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 28: ordinal not in range(128)


Using Python 2.7.3 on Ubuntu 12.04 LTS, svn 1.8.8.

I noticed that common.py sets the LANG variable to 'en_US.UTF-8'. But my svn command-line client seems to ignore it and use LANGUAGE instead.

Please see this console output, which demonstrates it (my system language is normally German):

robin@P-CZC3084WSD:/tmp$ LANG=en_US.UTF-8 svn
Geben Sie »svn help« für weitere Hilfe ein.
robin@P-CZC3084WSD:/tmp$ LANGUAGE=en_US.UTF-8 svn
Type 'svn help' for usage.

So setting LANG to 'en_US.UTF-8' does not give me English svn output, but setting LANGUAGE to 'en_US.UTF-8' does.

When I change this line in common.py: environment_variables['LANG'] = 'en_US.UTF-8' to environment_variables['LANGUAGE'] = 'en_US.UTF-8', it works (no more exceptions).

I don't know why my svn uses LANGUAGE while others apparently use LANG. Maybe one solution for python-svn would be to set both LANG and LANGUAGE?
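
A minimal sketch of the "set both" idea, assuming a hypothetical helper named build_svn_environment (it is not part of PySvn). On systems using GNU gettext, LANGUAGE generally takes precedence over LANG when choosing message translations, which would be consistent with the console output above, so forcing both variables covers either case:

```python
import os


def build_svn_environment(base_env=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment with LANG and LANGUAGE both forced.

    Hypothetical helper for illustration; the caller's mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LANG"] = locale_name
    env["LANGUAGE"] = locale_name
    return env


# Example: a German desktop environment is overridden for the child process.
env = build_svn_environment({"LANG": "de_DE.UTF-8", "LANGUAGE": "de_DE:de"})
print(env["LANG"], env["LANGUAGE"])
```

The resulting mapping could then be passed as the env argument of subprocess.Popen when launching svn.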

Thank you!

verybadsoldier avatar Mar 09 '17 12:03 verybadsoldier

I got the same problem, and I just wanted to give another workaround that solved it for me.

I just modified the following in common.py, function external_command:

return stdout.decode('utf-8').strip('\n').split('\n')

(Notice the 'utf-8' parameter in the decode function)

I don't really know whether this approach could cause other problems, but if not, it would be easy to open a PR with this small change and fix the bug.
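
To illustrate why the explicit 'utf-8' argument helps, here is a small sketch that does not call svn at all. It assumes the output bytes are the UTF-8 encoding of the German message quoted earlier in this thread, where the quotation marks start with byte 0xc2:

```python
# UTF-8 bytes for the German svn message; the guillemets encode as 0xc2 0xbb / 0xc2 0xab.
raw = b"Geben Sie \xc2\xbbsvn help\xc2\xab f\xc3\xbcr weitere Hilfe ein.\n"

# Decoding as ASCII (the Python 2 default for str.decode()) rejects byte 0xc2.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as UTF-8 succeeds and yields one line of text.
lines = raw.decode("utf-8").strip("\n").split("\n")
print(ascii_ok)
print(len(lines))
```

This only helps when svn really emits UTF-8; output in another encoding would still fail.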

atorralba avatar Jul 21 '17 15:07 atorralba

Traceback (most recent call last):
  File "test.py", line 53, in <module>
    for i in remote.list(extended=False, rel_path='/'):
  File "C:\Code\TD\.venv\lib\site-packages\svn\common.py", line 325, in list
    [full_url_or_path]):
  File "C:\Code\TD\.venv\lib\site-packages\svn\common.py", line 54, in run_command
    return self.external_command(cmd, environment=self.__env, **kwargs)
  File "C:\Code\TD\.venv\lib\site-packages\svn\common_base.py", line 39, in external_command
    return stdout.decode().strip('\n').split('\n')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 6: invalid start byte

Using Python 3.6.3 on Windows 7 (Chinese locale), svn 1.9.7, PySvn 0.3.45.

I got the same problem on Windows too, but setting neither LANGUAGE nor LANG works for me.

return stdout.decode('cp936').strip('\n').split('\n')  # it works

I think the problem is that stdout.decode uses UTF-8 as its default encoding. That does not work for everybody (my Windows uses cp936). Maybe the encoding should be a parameter that can be passed in to decode stdout.
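
One way the "encoding as a parameter" idea could look, as a sketch only. The function name decode_output and the fallback to locale.getpreferredencoding are assumptions for illustration, not PySvn's actual API:

```python
import locale


def decode_output(stdout, encoding=None):
    """Split raw command output into lines using a caller-chosen encoding.

    Hypothetical helper; when no encoding is given, fall back to the
    encoding the OS reports for the current locale.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return stdout.decode(encoding).strip("\n").split("\n")


# Two Chinese characters encoded as cp936, followed by an ASCII path.
raw = u"\u4e2d\u6587/trunk\n".encode("cp936")
decoded = decode_output(raw, encoding="cp936")
print(len(decoded))
```

Whether the locale's preferred encoding matches what svn writes on a given Windows setup is not guaranteed, so an explicit argument remains the safer choice.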

s2marine avatar Nov 04 '17 07:11 s2marine

I encountered this issue today. We have a file name in our SVN repo that contains non-ASCII characters. When fetching the SVN log, this error is triggered while printing the log message.

Another option would be to specify an error handler in the decode call.

The current line is this:

return stdout.decode().strip('\n').split('\n')

Optionally it could be this:

return stdout.decode(errors='replace').strip('\n').split('\n')

Or a new argument could be passed into the method that defaults to 'strict' (which is the current default of the decode method) or 'replace' if that is deemed a better choice. My vote would be something like this:

def external_command(self, cmd, success_code=0, do_combine=False,
                     return_binary=False, environment={}, wd=None,
                     decode_encoding="utf-8", decode_errors="replace"):
    # Code removed for brevity
    return stdout.decode(encoding=decode_encoding,
                         errors=decode_errors).strip('\n').split('\n')
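
For a quick look at what the two error modes do, here is a standalone sketch. The function split_output is hypothetical and only mirrors the final line of the proposal above; the sample bytes are made up, with 0xe9 standing in for a Latin-1 character that is not valid UTF-8:

```python
def split_output(stdout, decode_encoding="utf-8", decode_errors="replace"):
    """Decode raw bytes and split them into lines (illustrative only)."""
    return stdout.decode(encoding=decode_encoding,
                         errors=decode_errors).strip("\n").split("\n")


# Made-up log line whose file name contains an invalid UTF-8 byte.
raw = b"A /trunk/caf\xe9.txt\n"

# 'replace' substitutes U+FFFD for the undecodable byte instead of raising.
replaced = split_output(raw)

# 'strict' keeps the current behaviour and raises UnicodeDecodeError.
try:
    split_output(raw, decode_errors="strict")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

print(strict_raised)
```

The trade-off is that 'replace' silently loses the original character, so paths returned this way may not match the real file names.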

skelker avatar Dec 06 '19 15:12 skelker