[BUG] string containing a single quote is parsed differently in an array of inline tables

laazy opened this issue · 1 comment

Code:

```python
import toml
s = """
[foo]
bar1 = [
    {msg = ["1'2"] },
],
[[foo.bar2]]
msg = "1'2"
"""

print(toml.loads(s))
```

Output:

```
{'foo': {'bar1': [], 'bar2': [{'msg': "1'2"}]}}
```
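For comparison, here is a sketch (not part of the original report) that parses the same document with `tomllib`, which has been in the Python standard library since 3.11. The `tomli` fallback is an assumption for older interpreters; it is the third-party backport with the same `loads()` function. The trailing comma after `bar1`'s closing bracket has been removed, because TOML does not allow a comma after a key/value pair.

```python
# tomllib ships with Python 3.11+; tomli is the third-party backport for older versions.
try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib

# Same document as the report, minus the trailing comma after bar1's "]",
# which is not valid TOML after a key/value pair.
s = """
[foo]
bar1 = [
    {msg = ["1'2"] },
]
[[foo.bar2]]
msg = "1'2"
"""

data = tomllib.loads(s)
print(data)
```

Here `bar1` keeps its inline table, `[{'msg': ["1'2"]}]`, instead of coming back as an empty list.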

Apr 10 '24 03:04 laazy

Firstly, to reproduce this, the value in the inline table doesn't need to be in an array.

Secondly, the bug is in decoder.TomlDecoder.load_array.

Thirdly, it occurs with all four kinds of TOML string: basic, literal, multi-line basic and multi-line literal.

Running:

```
>python toml_bug.py
```

with toml_bug.py as:

```python
import toml

dec = toml.decoder.TomlDecoder()
print(dec.load_array("""[{msg = "'"}]"""))
print(dec.load_array("""[{msg = '"'}]"""))
print(dec.load_array("""[{msg = '''"'''}]"""))
print(dec.load_array('''[{msg = """'"""}]'''))
print(dec.load_array("""[{msg = "a"}]"""))
```

Gives:

```
[]
[]
[]
[]
[{'msg': 'a'}]
```

Judging only by the fact that I can't see one, I think the issue is that there is no check that a quotation mark matches the one that opened the string before the decoder leaves "string" mode by flipping in_str. As far as I understand the code below, the boolean in_str is toggled every time a quote character is hit, even when that quote sits inside a pair of the other type of quotes.

```python
while end_group_index < len(a[1:]):
    if a[end_group_index] == '"' or a[end_group_index] == "'":
        if in_str:
            backslash_index = end_group_index - 1
            while (backslash_index > -1 and
                   a[backslash_index] == '\\'):
                in_str = not in_str
                backslash_index -= 1
        in_str = not in_str
```
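To illustrate the mechanism described above, here is a simplified model of that toggling. `naive_in_str_flags` is a hypothetical helper, not toml's actual code, and it leaves out the backslash handling. With `"a"` the two quotes cancel out; with `"'"` the apostrophe causes a third flip, so the inline table's closing `}` is treated as string content and the table is never closed.

```python
# Simplified model (hypothetical helper, not toml's code) of the reported logic:
# in_str flips on *every* quote character, whichever quote opened the string.
# Backslash-escape handling from the real excerpt is deliberately omitted.

def naive_in_str_flags(s: str) -> list[bool]:
    """For each character of s, whether the naive scan thinks it is inside a string."""
    in_str = False
    flags = []
    for ch in s:
        if ch == '"' or ch == "'":
            in_str = not in_str
        flags.append(in_str)
    return flags


for text in ('[{msg = "a"}]', '[{msg = "\'"}]'):
    flags = naive_in_str_flags(text)
    brace = text.index("}")
    # "a": the closing brace is correctly seen as outside a string.
    # "'": the apostrophe flips in_str a third time, so the brace looks like string content.
    print(f"{text!r}: closing brace inside string? {flags[brace]}")
```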

https://github.com/uiri/toml/blob/65bab7582ce14c55cdeec2244c65ea23039c9e6f/toml/decoder.py#L960

Parsing TOML is now possible with tomllib, which has been in the Python standard library since 3.11, and there are plenty of alternatives without this bug (that also support TOML >= 1.0.0, not just 0.5.0). Tinkering with that code and making sure every possible edge case is covered would take me more time than it's worth, so I'm not going to fix this. But it's probably straightforward for anyone who wants to give it a shot.
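For anyone who does want to try, here is a minimal sketch of one possible approach. It is a standalone helper, not a patch to load_array, and `matching_close` and `_string_end` are hypothetical names. The idea is to remember which delimiter (`"`, `'`, `"""` or `'''`) opened a string and only leave string mode on that same delimiter. Backslash escapes are honoured only in basic strings, since literal strings have none. As assumptions, it ignores comments and the TOML 1.0 rule that allows one or two extra quotes right before a multi-line closing delimiter.

```python
QUOTES = ('"', "'")


def _string_end(s: str, i: int) -> int:
    """Return the index just past the TOML string literal that starts at s[i]."""
    quote = s[i]
    # Multi-line strings close only on the same triple delimiter that opened them.
    delim = quote * 3 if s.startswith(quote * 3, i) else quote
    escapes = quote == '"'  # only basic strings ("..." / """...""") have escapes
    j = i + len(delim)
    while j < len(s):
        if escapes and s[j] == "\\":
            j += 2  # skip the backslash and the character it escapes
        elif s.startswith(delim, j):
            return j + len(delim)
        else:
            j += 1
    raise ValueError(f"unterminated string starting at index {i}")


def matching_close(s: str, start: int) -> int:
    """Return the index of the ] or } closing the bracket/brace at s[start]."""
    closers = {"[": "]", "{": "}"}
    expected = []  # stack of closing characters still owed
    i = start
    while i < len(s):
        ch = s[i]
        if ch in QUOTES:
            i = _string_end(s, i)  # jump over the whole string, whatever it contains
            continue
        if ch in closers:
            expected.append(closers[ch])
        elif ch in "]}":
            if not expected or expected.pop() != ch:
                raise ValueError(f"unbalanced {ch!r} at index {i}")
            if not expected:
                return i
        i += 1
    raise ValueError(f"no closing bracket for index {start}")


cases = ['[{msg = "\'"}]', "[{msg = '\"'}]", "[{msg = '''\"'''}]",
         '[{msg = """\'"""}]', '[{msg = "a"}]']
for case in cases:
    # Each case is one complete array, so its closing ] should be the last character.
    print(case, matching_close(case, 0) == len(case) - 1)
```

These are the five inputs from the load_array reproduction above. Tracking the opening delimiter, rather than flipping a single boolean, is what keeps a `'` inside `"..."` (or a `"` inside `'...'`) from ending the string early.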

Apr 10 '24 16:04 JamesParrott