cylc-flow icon indicating copy to clipboard operation
cylc-flow copied to clipboard

Fix parsec.validate list parser

Open wxtim opened this issue 3 years ago • 1 comments

Investigation

Whilst working on https://github.com/cylc/cylc-flow/issues/5120 @MetRonnie and I noticed a number of untoward things about the functions strip_and_unquote_list and _unquoted_list_parse.

  • Whilst test coverage is 100% I don't think that the tests represent the full range of sane input to these functions.
  • The doctest for strip_and_unquote_list implies that (None, ' 1 , "2", 3')['1', '"2"', '3'] which seems wrong given the name of the function.
  • A number of real world examples such as <foo=1, bar=2> are not tested at all.
  • Some of the existing unit tests assert strange behavior 'a, b, c\n"d"'['a', 'b', 'd']

The functions may well be wrong, but users may depend on them being as they are so it's important to be careful with any changes.

wxtim avatar Sep 27 '22 15:09 wxtim

To add to this list...

Nonsensical difference between these two depending on whitespace:

>>> list(ParsecValidator._unquoted_list_parse(None, '1,"2",3'))
['1', '2', '3']

>>> list(ParsecValidator._unquoted_list_parse(None, '1, "2", 3'))
['1', '"2"', '3']

Unexpected splitting on </> chars if the first item in the list is quoted.

>>> ParsecValidator.strip_and_unquote_list(None, '"1", A<x>,43')
['1', 'A', '<', 'x', '>', '43']

And from #5120

>>> ParsecValidator.strip_and_unquote_list(None, '1, "2, 3", 4')
['1', '"2', '3"', '4']  # should be ['1', '2, 3', '4']

(Use case:)

[runtime]              
   [[I, J]]           
   [[A<x,y>]]         
   [[a1<x,y>, a2<x,y>, a3<x,y>]]
       inherit = I, "A<x,y>", J 

I suggest dropping the shlex branch of ParsecValidator.strip_and_unquote_list() https://github.com/cylc/cylc-flow/blob/f88f32e2bf14568664d27f9452c655baa45eb9b7/cylc/flow/parsec/validate.py#L554-L560

and then go about fixing these issues

MetRonnie avatar Sep 27 '22 15:09 MetRonnie