docopt: unicode in argv has wrong output for ellipses

It looks like docopt is concatenating separate arguments when sys.argv is a list of unicode values. I expect '<args>': [u'arg1', u'arg2'], but instead get '<args>': u'arg1arg2'.

$ cat docoptutf.py
'''UTF argv test

Usage:
  a <command> [<args> ...]'''

import sys
from pprint import pprint
from docopt import docopt

sys.argv = [u'a', u'mycommand', u'arg1', u'arg2']
pprint(docopt(__doc__))
sys.argv = ['a', 'mycommand', 'arg1', 'arg2']
pprint(docopt(__doc__))

(exo)danw@localhost:~/prj/exosite/exoline/test [master]
$ python docoptutf.py
{'<args>': u'arg1arg2',
 '<command>': u'mycommand'}
{'<args>': ['arg1', 'arg2'],
 '<command>': 'mycommand'}
$ pip freeze | grep docopt
docopt==0.6.1
$ python --version
Python 2.7.5

Sep 18 '14 17:09 dweaver

This is resolved by https://github.com/docopt/docopt/pull/220

Sep 19 '14 12:09 dweaver

In what circumstances can sys.argv contain unicode in Python 2? I'm using docopt with both Python 2 and Python 3 and never experienced that problem.

Jan 16 '15 22:01 keleshev

I encountered it while writing tests for a command line app where I was constructing args rather than using sys.argv.

Jan 16 '15 23:01 dweaver

@keleshev it seems like several others have seen this issue when testing CLIs using docopt. Would you consider accepting #220 to resolve it? I created a fork (https://pypi.python.org/pypi/docopt-unicode/0.6.1) but I'd like to start using docopt directly again.

Aug 30 '15 15:08 dweaver

I will merge it if you can explain the issue.

In what circumstances can sys.argv contain unicode in Python 2?

Aug 31 '15 12:08 keleshev

I haven't seen sys.argv contain unicode from a shell, but I do get unicode in two use cases:

1. My integration tests construct sys.argv directly, and the values may be unicode.
2. I'm using docopt in the web service that backs a browser-based CLI, and sys.argv is constructed directly there too.
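
For hand-built argument lists like these, one possible workaround is to coerce every element to the interpreter's native `str` before parsing. This is only a sketch, assuming (as the `type(match.value) is str` check quoted later in this thread suggests) that docopt 0.6.1 treats only exact `str` values as single arguments. The helper names `to_native_str` and `to_native_argv` and the UTF-8 default are hypothetical and not part of docopt.

```python
import sys

PY2 = sys.version_info[0] == 2


def to_native_str(arg, encoding="utf-8"):
    """Return `arg` as the interpreter's native `str` type (hypothetical helper)."""
    if isinstance(arg, str):
        return arg  # already the native string type
    if PY2:
        return arg.encode(encoding)  # Python 2: unicode -> byte str
    return arg.decode(encoding)  # Python 3: bytes -> text str


def to_native_argv(argv, encoding="utf-8"):
    """Coerce every element of an argument list to native `str`."""
    return [to_native_str(arg, encoding) for arg in argv]
```

The result could then be passed as the argv argument, for example `docopt(__doc__, argv=to_native_argv(args))`. On Python 2 this yields byte strings, so it does not help when you want unicode values back.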

Sep 05 '15 02:09 dweaver

I think phrasing the problem in terms of sys.argv is misleading; like you said, it will always (I think?) contain bytestrings. The problem is with the second argument to docopt.docopt, called argv:

argv is an optional argument vector; by default docopt uses the argument vector passed to your program (sys.argv[1:]). Alternatively you can supply a list of strings like ['--verbose', '-o', 'hai.txt'].

Based on these docs it seems reasonable to expect argv to accept both encoded and decoded strings.

My use case (https://github.com/venmo/slouch) fits in dweaver's second category.

Sep 05 '15 02:09 simon-weber

I have a simple command line for dealing with my movie collection. One of the commands is removing a movie, and for that you give the name of the movie, which of course can contain unicode:

python nephele.py clear "Kizzu ritân"

Sep 24 '15 21:09 EmilStenstrom

@keleshev, I'm hitting this issue too.

On top of the unit test argument given above (which is valid), there's also the real use case where the user wants unicode internally so as not to deal with decoding issues later (here, the shell encoding), which is a best practice after all.

In that last use case, this boils down to:

import locale
import sys
import docopt
encoding_name = locale.getpreferredencoding()
argv = [arg.decode(encoding_name) for arg in sys.argv]  # I want unicode
opts = docopt.docopt(doc=__doc__, argv=argv[1:])  # FAIL

Don't get me wrong, I'm using docopt for all my CLI dev and I'm a huge fan of it, but this is a real issue. I would really appreciate proper unicode support for it.

Diff fixing this particular issue:

130c130
<                 increment = ([match.value] if type(match.value) is str
---
>                 increment = ([match.value] if isinstance(match.value, basestring)

Mar 03 '18 20:03 fclaerho
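
The effect of the one-line diff above can be reproduced without docopt. The following is a minimal sketch, not docopt's actual code: the function names and the `Text` class are hypothetical, and `Text` (a `str` subclass) merely stands in for any string whose type is not exactly `str`, as Python 2 `unicode` is, so that the example runs on Python 3 as well.

```python
# Simplified stand-in for how repeated arguments get accumulated.
# This is NOT docopt's actual code; every name below is hypothetical.

try:
    string_types = basestring  # Python 2: matches both str and unicode
except NameError:
    string_types = str  # Python 3: there is only one text type


class Text(str):
    """Plays the role of a string whose type is not exactly `str`."""


def accumulate_exact(values):
    """Mimics the `type(value) is str` check on the '<' side of the diff."""
    collected = None
    for value in values:
        # A non-`str` string is not wrapped in a list here ...
        increment = [value] if type(value) is str else value
        if collected is None:
            collected = increment
        else:
            collected += increment  # ... so this concatenates strings
    return collected


def accumulate_isinstance(values):
    """Mimics the `isinstance(value, basestring)` check on the '>' side."""
    collected = None
    for value in values:
        # Any string type is wrapped, so `+=` always extends a list
        increment = [value] if isinstance(value, string_types) else value
        if collected is None:
            collected = increment
        else:
            collected += increment
    return collected
```

With two `Text` values, `accumulate_exact` returns the single string 'arg1arg2', matching the wrong output in the original report, while `accumulate_isinstance` returns a two-element list.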
