
Incorrect parsing of multiline strings within lists

Open rdaysky opened this issue 4 years ago • 2 comments

Source:

list = [ 
    "first",
    """multi\
    line""",
    "last",
]

Expected:

{"list": ["first", "multiline", "last"]}

Actual:

{"list": ["first", "mu"]}

rdaysky avatar Sep 15 '21 15:09 rdaysky

I see the same behavior.

Here are more examples, even for single-line input. The problem seems to be a comma inside a triple-quoted string that also contains double quotes.

These are OK:

toml.loads('''a=[' "a","b" ']''')
{'a': [' "a","b" ']}
toml.loads('''a=[""" "a" "b" """]''')
{'a': [' "a" "b" ']}
toml.loads('''a=[""" 'a' 'b' , """]''')
{'a': [" 'a' 'b' , "]}
toml.loads('''a=[""" 'a','b' """]''')
{'a': [" 'a','b' "]}

These are not OK:

toml.loads('''a=[""" "a" "b" , """]''')
{'a': [' "a" ', '']}
toml.loads('''a=[""" "a","b" """]''')
*** toml.decoder.TomlDecodeError: Found tokens after a closed string. Invalid TOML. (line 1 column 1 char 0)
Traceback (most recent call last):
  File "/path/.venv/lib/python3.7/site-packages/toml/decoder.py", line 512, in loads
    multibackslash)
  File "/path/.venv/lib/python3.7/site-packages/toml/decoder.py", line 778, in load_line
    value, vtype = self.load_value(pair[1], strictly_valid)
  File "/path/.venv/lib/python3.7/site-packages/toml/decoder.py", line 880, in load_value
    return (self.load_array(v), "array")
  File "/path/.venv/lib/python3.7/site-packages/toml/decoder.py", line 1026, in load_array
    nval, ntype = self.load_value(a[i])
  File "/path/.venv/lib/python3.7/site-packages/toml/decoder.py", line 849, in load_value
    raise ValueError("Found tokens after a closed " +
ValueError: Found tokens after a closed string. Invalid TOML.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/.venv/lib/python3.7/site-packages/toml/decoder.py", line 514, in loads
    raise TomlDecodeError(str(err), original, pos)
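
The failing outputs are consistent with array items being cut at a comma while the parser has lost track of which quote opened the string. That is a guess from the outputs, not a reading of `decoder.py`. A minimal sketch of quote-aware splitting follows; `split_array_items` is a hypothetical helper, not part of `toml`, and it ignores nested arrays and inline tables.

```python
def split_array_items(body: str) -> list[str]:
    # Hypothetical helper: split the text between '[' and ']' on commas
    # that are outside any TOML string. Nested arrays are not handled.
    items, start, i, n = [], 0, 0, len(body)
    delim = None  # quote sequence that opened the current string, if any
    while i < n:
        ch = body[i]
        if delim is None:
            if ch in "\"'":
                # Three identical quotes in a row open a multi-line string.
                delim = ch * 3 if body.startswith(ch * 3, i) else ch
                i += len(delim)
                continue
            if ch == ",":
                items.append(body[start:i].strip())
                start = i + 1
        else:
            if ch == "\\" and delim[0] == '"':
                i += 2  # backslash escapes apply only in basic (double-quoted) strings
                continue
            if body.startswith(delim, i):
                i += len(delim)
                delim = None
                continue
        i += 1
    tail = body[start:].strip()
    if tail:  # a trailing comma leaves an empty tail, which is dropped
        items.append(tail)
    return items


body = ' """ "a","b" """ '
print(body.split(","))          # naive split cuts the string in two
print(split_array_items(body))  # quote-aware split keeps it as one item
```

The point of the sketch is only that the closing delimiter has to match the opening one exactly, so a lone `"` or a `,` inside `"""..."""` never ends an item.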

rysson avatar Sep 24 '21 14:09 rysson

Hi, just to add another example:

>>> t1 = "foo = [\n\t'a',\n\t'''\\\n\tb\\\n\t''',\n]"
>>> t2 = "foo = [\n\t'''\\\n\ta\\\n\t''',\n\t'''\\\n\tb\\\n\t''',\n]"
>>> t3 = "foo = [\n\t'''\\\n\ta\\\n\t''',\n\t'b',\n\t'''\\\n\tc\\\n\t'''\n]"
>>> print(t1)  # One-line + multi-line
foo = [
        'a',
        '''\
        b\
        ''',
]
>>> print(t2)  # Multi-line only
foo = [
        '''\
        a\
        ''',
        '''\
        b\
        ''',
]
>>> print(t3)  # Multi-line + mixed
foo = [
        '''\
        a\
        ''',
        'b',
        '''\
        c\
        '''
]
>>> toml.loads(t1)  # Wrong
{'foo': ['a', '']}
>>> toml.loads(t2)  # OK
{'foo': ['a', 'b']}
>>> toml.loads(t3)  # OK
{'foo': ['a', 'b', 'c']}

It seems that string arrays that contain multi-line strings but start with a single-line string trip the parser.

TTsangSC avatar Apr 29 '24 18:04 TTsangSC