python-fire icon indicating copy to clipboard operation
python-fire copied to clipboard

string literals in lists can't start with numbers without extra quotes

Open thomasgilgenast opened this issue 5 years ago • 0 comments

Following the example from the fire guide section on argument parsing, it seems like when passing a list of strings to fire, none of the elements of the string can start with a number, unless two extra pairs of quotes are used (no single pair of quotes works, at least in bash).

$ echo "import fire; fire.Fire(lambda obj: type(obj).__name__)" > test.py
$ python test.py [hi]
list
$ python test.py [4hi]
str
$ python test.py '["4hi"]'
list

This is because ast.parse() (called in _LiteralEval()) chokes on arguments like '[4hi]':

Python 3.7.4 (default, Sep 12 2019, 15:40:15)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from fire.parser import _LiteralEval
>>> _LiteralEval('[hi]')
['hi']
>>> _LiteralEval('[4hi]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/fire/parser.py", line 96, in _LiteralEval
    root = ast.parse(value, mode='eval')
  File "/usr/local/lib/python3.7/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    [4hi]
       ^
SyntaxError: invalid syntax
>>> _LiteralEval('["4hi"]')
['4hi']

My understanding here is limited, but I think ast.parse() succeeds when the string does not start with a letter because it parses [hi] to a list containing only one element: the variable (not a string) hi. This variable hi in the AST is later replaced with the string literal 'hi' by _Replacement(). Once the variable has been replaced by the string literal in the AST, the AST can be evaluated by ast.literal_eval().

When an element in the list begins with a number (e.g., when the argument to ast.parse() is '[4hi]'), ast.parse() again interprets this as a list containing only one element: the variable 4hi. This time, 4hi is not a valid Python identifier (since variable names cannot start with numbers), so ast.parse() crashes.

This doesn't happen when 4hi is quoted inside the list (e.g., when the argument to ast.parse() is '["4hi"]'), because in that case ast.parse() does not try to interpret the element in the list ("4hi") as a variable - instead it interprets it as a string literal, no intermediate variable 4hi is implied, and this AST can be evaluated by ast.literal_eval() without any replacements:

>>> ast.literal_eval(ast.parse('["4hi"]', mode='eval'))
['4hi']

In other words, the simple "safe eval" workflow of ast.literal_eval(ast.parse(value, mode='eval')) does not normally allow string elements in the list to be passed without quotes. However, _LiteralEval() walks the parsed AST and replaces variables with literals by calling _Replacement(). This allows string elements in the list to be passed without quotes, but it only works as long as those string elements are valid Python identifiers (so that they survive the initial ast.parse()).

As a user reading the guide, I guessed from the examples that elements in lists that were not literals would be automatically quoted for me by fire (the guide says "bare-words are automatically replaced with strings", so maybe "4hi" doesn't count as a bare-word but "hi4" does). Thus I was surprised that ["hi"] and ["4hi"] (represented without quotes as [hi] and [4hi], respectively) could give different results.

In other words, I was expecting something more like

import ast

def unquoted_literal_eval(value):
    # detect list
    if value.startswith('[') and value.endswith(']'):
        # remove brackets
        value = value[1:-1]
        # split on comma, then strip whitespace from each item
        items = map(str.strip, value.split(','))
        # recursion: eval each item in the list
        return [unquoted_literal_eval(item) for item in items]
    # blocks for tuple and dict not implemented but could go here
    # base case (intended for scalar value)
    try:
        # classic "safe eval"
        return ast.literal_eval(ast.parse(value, mode='eval'))
    except (ValueError, SyntaxError):
        # if eval fails, we assume the user forgot to quote a string, so we
        # return the value as it was passed to us
        return value

This probably isn't very robust and it doesn't handle dicts and tuples yet but it should be enough to illustrate the point:

>>> unquoted_literal_eval('[hi,"bye", \'ok\', 4, True, None, 4hi, "4", [-10.0]]')
['hi', 'bye', 'ok', 4, True, None, '4hi', '4', [-10.0]]

Note that this probably can't be achieved by modifying the _Replacement() strategy, since the first step of that strategy is ast.parse() which will not work on a list that contains unquoted string elements.

thomasgilgenast avatar Oct 22 '20 17:10 thomasgilgenast