docopt
unicode in argv has wrong output for ellipses
It looks like docopt is concatenating separate arguments when sys.argv is a list of unicode values. I expect '<args>': [u'arg1', u'arg2'], but instead get '<args>': u'arg1arg2'.
$ cat docoptutf.py
'''UTF argv test
Usage:
a <command> [<args> ...]'''
import sys
from pprint import pprint
from docopt import docopt
sys.argv = [u'a', u'mycommand', u'arg1', u'arg2']
pprint(docopt(__doc__))
sys.argv = ['a', 'mycommand', 'arg1', 'arg2']
pprint(docopt(__doc__))
(exo)danw@localhost:~/prj/exosite/exoline/test [master]
$ python docoptutf.py
{'<args>': u'arg1arg2',
'<command>': u'mycommand'}
{'<args>': ['arg1', 'arg2'],
'<command>': 'mycommand'}
$ pip freeze | grep docopt
docopt==0.6.1
$ python --version
Python 2.7.5
This is resolved by https://github.com/docopt/docopt/pull/220
In what circumstances can sys.argv contain unicode in Python 2? I'm using docopt with both Python 2 and Python 3 and never experienced that problem.
I encountered it while writing tests for a command line app where I was constructing args rather than using sys.argv.
@keleshev it seems like several others have seen this issue when testing CLIs using docopt. Would you consider accepting #220 to resolve it? I created a fork (https://pypi.python.org/pypi/docopt-unicode/0.6.1) but I'd like to start using docopt directly again.
I will merge it if you can explain the issue.
In what circumstances can sys.argv contain unicode in Python 2?
I haven't seen sys.argv contain unicode from a shell, but I do get unicode in two use cases:
- My integration tests construct sys.argv directly, and the values may be unicode
- I'm using docopt in the web service that backs a browser-based CLI, and sys.argv is created directly there too
I think phrasing the problem in terms of sys.argv is misleading; like you said, it will always (I think?) contain bytestrings. The problem is with the second argument to docopt.docopt, called argv:
argv is an optional argument vector; by default docopt uses the argument vector passed to your program (sys.argv[1:]). Alternatively you can supply a list of strings like ['--verbose', '-o', 'hai.txt'].
Based on these docs it seems reasonable to expect argv to accept both encoded and decoded strings.
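Until argv accepts both kinds of string, one possible workaround is to convert every argument to the interpreter's native str type before handing the list to docopt. The following is only a sketch: to_native_str is a hypothetical helper (not part of docopt), and the UTF-8 default is an assumption that should be replaced with whatever encoding your arguments actually use.

```python
def to_native_str(arg, encoding="utf-8"):
    """Return arg as the interpreter's native str type.

    Hypothetical helper, not part of docopt. The utf-8 default is an assumption.
    """
    if isinstance(arg, str):
        return arg
    if str is bytes:
        # Python 2: native str is a bytestring, so encode unicode values.
        return arg.encode(encoding)
    # Python 3: native str is text, so decode bytes values.
    return arg.decode(encoding)


# Mixed input like the kind a test harness or web service might build.
raw_argv = [b"mycommand", "arg1", b"arg2"]
argv = [to_native_str(a) for a in raw_argv]
# argv could then be passed as the second argument, e.g. docopt(__doc__, argv=argv)
```

The trade-off is that on Python 2 the parsed values come back as bytestrings, so any decoding still has to happen after parsing.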
My use case (https://github.com/venmo/slouch) fits in dweaver's second category.
I have a simple command-line tool for managing my movie collection. One of the commands removes a movie, and for that you give the name of the movie, which of course can contain unicode:
python nephele.py clear "Kizzu ritân"
@keleshev — I'm hitting this issue too.
On top of the unit test argument given above (which is a valid one), there's also the real use case where the user wants unicode internally, decoding once from the shell encoding at the boundary so that decoding issues don't surface later. That is a best practice, after all.
In that last use case, this boils down to:
import locale
import sys
import docopt
encoding_name = locale.getpreferredencoding()
argv = [arg.decode(encoding_name) for arg in sys.argv] # I want unicode
opts = docopt.docopt(doc=__doc__, argv=argv[1:]) # FAIL
Don't get me wrong, I'm using docopt for all my CLI dev and I'm a huge fan of it, but this is a real issue. I would really appreciate proper unicode support for it.
Diff fixing this particular issue:
130c130
< increment = ([match.value] if type(match.value) is str
---
> increment = ([match.value] if isinstance(match.value, basestring)
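To illustrate why that one-line change matters, here is a simplified model of the accumulation step. It is a sketch, not docopt's actual code. It is written for Python 3, where unicode and basestring no longer exist, so a hypothetical str subclass called Text stands in for "a string whose type is not exactly str", and isinstance(value, str) plays the role of the basestring check.

```python
class Text(str):
    """Hypothetical stand-in for Python 2's unicode: a string that is not exactly str."""


def wrap_exact(value):
    # Mirrors the old check: only an exact str gets wrapped in a list.
    return [value] if type(value) is str else value


def wrap_isinstance(value):
    # Mirrors the patched check: any string type gets wrapped in a list.
    return [value] if isinstance(value, str) else value


def collect(values, wrap):
    """Accumulate repeated argument values with +, as a simplified model."""
    total = None
    for value in values:
        increment = wrap(value)
        total = increment if total is None else total + increment
    return total


args = [Text("arg1"), Text("arg2")]
broken = collect(args, wrap_exact)      # strings are concatenated: 'arg1arg2'
fixed = collect(args, wrap_isinstance)  # lists are concatenated: ['arg1', 'arg2']
print(repr(broken), fixed)
```

With the exact type check the unwrapped strings are joined by string concatenation, which matches the u'arg1arg2' output in the original report. With the isinstance check each value is wrapped first, so the same + operation builds a list.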