python-fire All the list is convert into string if '-' is present

Hello,

Capture d’écran 2020-03-02 à 09 56 52

I reuse the introduction example code to illustrate. When you give a list as arg, and into this list there is a string including this specific charactere '-', Fire turns all the list into one and unique string 😢. As you can see with the capture.

I'm looking into the fire code to see where the modification could be.

Thanks !

Mar 02 '20 09:03 outscale-frl

Thanks for reporting the issue.

Here's the reason for it:

Your shell is stripping the quotes from the input to fire. So all Fire sees is [hel-lo,hello] in the first example and [hello,hello] in the second example (no quotes). Fire is smart enough to put the quotes back in around bare words, so it gives a list in the second example, but in the first example this yields ["hel"-"lo","hello"] which isn't a valid input, so Fire falls back to treating the whole thing as a string.

As a workaround you can put quotes around the whole input: python main.py --name="['hel-lo','hello']" but I know that's not a great experience.

The place in the code where an improvement would be made is here https://github.com/google/python-fire/blob/b77f5aefbf5986391ab6cb9f51376c6d93d757cf/fire/parser.py#L57

Also #226 (strict mode) will help in this situation once it's available (but its not implemented yet) because you'll be able to specify whether an input expects a string or a list and Fire wont surprise you with a different type.

Mar 02 '20 15:03 dbieber

Thanks for your answer @dbieber !

Mar 02 '20 16:03 outscale-frl

I tried a modification but there are so many usecases...

if --name=['hello-eei38223','342efw3'] root = ast.parse(value, mode='eval') return a SyntaxError because the second element starts with a number.

if --name=['hello-eei38223','hello'] root = ast.parse(value, mode='eval') return a root.body.elts with BinOp and Name object

if --name=['hello-38223'] root = ast.parse(value, mode='eval') return root.body.elts[0].left as Name object and root.body.elts[0].right as Num object

if --name=['hello-eei38223'] root = ast.parse(value, mode='eval') return a root.body.elts[0].left and right as Name object

So maybe wait for the strict mode is a better solution than modification into _LiteralEval ?

Mar 02 '20 18:03 outscale-frl

I think we'll want both. We definitely want to improve this. And while strict mode will remove surprises, we also should separately come up with an intuitive and unambiguous way to pass in lists (balanced with providing backwards compatibility where reasonable to do so.)

Mar 02 '20 18:03 dbieber

Hello! We are a group of undergrads from LNMIIT, Jaipur from India and looking for a contribution to open source as a course project under Prof. Philip Miller . We are highly interested to work on this issue. Can we expect your guidance and support and proceed further?

Aug 28 '20 10:08 dhawalmalot08

@dbieber I believe that we have a working solution for this. As it stands, the current implementation relies heavily on the AST library to identify individual elements in containers and put quotes around them when necessary:

https://github.com/google/python-fire/blob/b77f5aefbf5986391ab6cb9f51376c6d93d757cf/fire/parser.py#L100-L109

What we try to do instead, is perform this identification of individual elements manually, by considering some general delimiters (take for example, when in a list, a comma (,) when not in a string, generally denotes a demarcation between two elements). When the identified element is further identified as a String (we still rely on AST for this purpose), we put back the stripped quotes around it. This takes care of all special characters, including hyphens, that occur in arguments.

Our current implementation passes parser_test.py easily, and with the exception of 1 type of case (which needs some discussion), parser_fuzz_test.py as well. I can send over a flowchart and pseudocode so that you can get a clearer picture of what I’m talking about (we made these as part of our uni project).

Do let us know what you think of this solution!

Dec 22 '20 10:12 stephenvincent27

Thanks @stephenvincent27!

I'd love to hear more. This is definitely an area we want to improve in Fire.

Dec 22 '20 15:12 dbieber

@dbieber Thanks for your prompt reply!

Here’s a rough flowchart for our implementation: Drive Link

To summarize: We parse the argument by dividing it into one of the three major cases:

The argument is a container. In that case, ignore the braces, and perform the check again. In the case of dictionaries, we start considering the colon (:) as a delimiter as well. To know whether we are currently inside a dictionary, we use a stack.
The argument is a string with quotes intact (different shells strip quotes differently, we found, hence this case.) In this case, we directly continue parsing from the closing quotes. Our implementation covers both single and triple quoted strings.
The argument is a literal (or a string with quotes stripped). In that case, we move to the first delimiter that we find after it (this denotes the end of the element). Now, we pass this ‘literal’ element to AST, and only when AST identifies it as a string, we add quotes around it (numerals, booleans are thus safe).

Our general delimiter set consists of a comma (,), a space (‘ ’) and all possible closing braces (‘)’, ‘}’, ‘]’). Adding space to the delimiter set ensures that variable spaces added by the user are taken care of.

Now, as I had mentioned earlier, we fail test cases of a specific type in fuzz tests, one example of which is:

INPUT: K,\x1d OUTPUT: ('K', '\x1d') EXPECTED(?): 'K,\x1d'

Here, because of the comma in between, our implementation treats it as two different elements, and when passed to AST, it converts it into a tuple. So many additions leads to a high edit distance between the input and output strings, hence the failure. Again, this can be easily avoided by checking whether we are currently in a container or not. However, we were unsure about which behavior is more appropriate. Do let us know.

There are two other test cases where failure is declared (again, due to high edit distances), but I think they can be ignored:

‘...’ gets converted to Ellipsis when passed to AST.
‘1e5’ gets converted to 100000.0.

Dec 24 '20 17:12 stephenvincent27

This is looking good! I agree we can ignore most of those failures of the fuzz test (not sure about what we should do for '...'; see end). We should update the fuzz test so those don't count as failures any more.

I think our next step should be to list out new test cases so that we can some concrete examples on which to evaluate your approach. This will make discussing potential changes easier, give us test cases to prevent regressions, and will also help with describing parsing in the documentation once the change is released.

We should be sure our examples cover nested lists, tuples, sets, and dicts, strings, bools, ints, floats, complex numbers, quoted and unquoted strings with and without non-alphanumeric characters. Here are two proposed tests; feel free to suggest more:

[1234,test-item,[nested-list!, with spaces, 1e5]] -> [1234, "test-item", ["nested-list!", "with spaces", 1e5]] {12.34,test-item,(2+3j, with spaces , True)} -> {12.34, "test-item", (2+3j, "with spaces", True)}

What do you think we should do with ...? Do you think we should treat it as Ellipsis or a string? I think a string might be better as it would be surprising to many users for it to be treated as Ellipsis.

Dec 24 '20 19:12 dbieber

I do have some cases myself, I think we can add them to parser_test.py. I’ll suggest more when it comes to me:

[{abc:3+4j,'colon:here':('spaces, and commas', 123.45),1e5:{'''abc''':True}}] -> [{'abc': (3+4j), 'colon:here': ('spaces, and commas', 123.45), 100000.0: {'abc': True}}] {'''[all (delimiters)], are present {in} this:string!'''} -> {'[all (delimiters)], are present {in} this:string!'}

Other than that, yes! We can update fuzz tests to accommodate our implementation. I had no idea how that formula for maximum permissible edit distance came to be in the first place, so kindly guide me on how to change it accordingly.

Our implementation does cover every data type that you mentioned, along with other remaining data types in the Python documentation as well (as long as AST can parse them correctly).

Regarding ‘...’: I honestly didn’t know that such an object existed until I came across one in tests; however I do agree that converting ‘...’ to ‘Ellipsis’ would be unexpected by users. We’ll treat it as a string, all it would take is an extra condition in the code.

Dec 26 '20 07:12 stephenvincent27

If all seems good, should we go for the PR? We can further discuss things there.

Jan 08 '21 07:01 stephenvincent27

Yes, that sounds good.

Jan 08 '21 17:01 dbieber

python-fire python-fire copied to clipboard

All the list is convert into string if '-' is present

python-fire
python-fire copied to clipboard