[BUG] Strings containing a single quote parse differently in an array of inline tables
code:
import toml
s = """
[foo]
bar1 = [
{msg = ["1'2"] },
],
[[foo.bar2]]
msg = "1'2"
"""
print(toml.loads(s))
output:
{'foo': {'bar1': [], 'bar2': [{'msg': "1'2"}]}}
Firstly, to reproduce this, the value in the inline table doesn't need to be in an array.
Secondly, the bug is in decoder.TomlDecoder.load_array.
Thirdly, it occurs with all 4 types of TOML string (basic, literal, and their multi-line variants).
Running:
>python toml_bug.py
with toml_bug.py as:
import toml
dec = toml.decoder.TomlDecoder()
print(dec.load_array("""[{msg = "'"}]"""))
print(dec.load_array("""[{msg = '"'}]"""))
print(dec.load_array("""[{msg = '''"'''}]"""))
print(dec.load_array('''[{msg = """'"""}]'''))
print(dec.load_array("""[{msg = "a"}]"""))
Gives:
[]
[]
[]
[]
[{'msg': 'a'}]
I can't see any check that a closing quotation mark matches the one that opened the string, and based solely on that, I think this is the issue: nothing ensures that the decoder only leaves "string" mode (by flipping in_str) on a matching quote. As far as I understand the code below, the boolean in_str is toggled whenever the loop hits a quote character, even when that quote sits inside a pair of the other type of quotes.
while end_group_index < len(a[1:]):
    if a[end_group_index] == '"' or a[end_group_index] == "'":
        if in_str:
            backslash_index = end_group_index - 1
            while (backslash_index > -1 and
                   a[backslash_index] == '\\'):
                in_str = not in_str
                backslash_index -= 1
        in_str = not in_str
https://github.com/uiri/toml/blob/65bab7582ce14c55cdeec2244c65ea23039c9e6f/toml/decoder.py#L960
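To illustrate the suspected cause, here is a minimal standalone sketch, not the library's code. Both function names are hypothetical. The first is a simplified imitation of the toggle (it leaves out the backslash handling), and the second remembers which quote character opened the string so that only the same character can close it. It does not model triple-quoted delimiters properly, so treat it as a starting point rather than a fix.

```python
def ends_inside_string_naive(text):
    """Simplified imitation of the toggle: any quote character flips the flag."""
    in_str = False
    for ch in text:
        if ch in ('"', "'"):
            in_str = not in_str
    return in_str


def ends_inside_string_quote_aware(text):
    """Remember the opening quote; only that same character closes the string."""
    open_quote = None  # None means "not inside a string"
    i = 0
    while i < len(text):
        ch = text[i]
        if open_quote is None:
            if ch in ('"', "'"):
                open_quote = ch
        elif ch == "\\" and open_quote == '"':
            i += 1  # skip the escaped character (basic strings only)
        elif ch == open_quote:
            open_quote = None
        i += 1
    return open_quote is not None


samples = ['''{msg = "'"}''', """{msg = '"'}""", '''{msg = "a"}''']
for sample in samples:
    print(sample,
          ends_inside_string_naive(sample),
          ends_inside_string_quote_aware(sample))
```

For the first two samples the naive version still reports being inside a string at the end of the input, which would be consistent with the group never being closed and the empty lists above. The quote-aware version reports a closed string for all three.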
Parsing TOML is now possible with tomllib in the Python standard library, and there are plenty of alternatives without this bug (that also support TOML >= 1.0.0, not just 0.5.0). It would take me more time than it's worth to tinker with that code and make sure all the possible edge cases are handled, so I'm not going to fix this. But it's probably straightforward for anyone who wants to give it a shot.