Mystery solved! A fix for the infamous 5 year old \x bug, that is driving users away.

Open JamesParrott opened this issue 2 years ago • 1 comments

The test function test_bug_148 in /tests/test_api.py was incorrectly coded five years ago [^1]. This has prevented /uiri/toml's automated testing and CI from showing that the original fix for bug 148 was also incorrect (a notoriously difficult to understand block of code in encoder._dump_str) .

Guys, it's about time we put that right!

Over the last 5 years, numerous frustrated users have raised multiple separate issues to no avail [^3], which unbeknownst to them, related to this unfixed bug, which this flawed test enabled to persist undetected. Furthermore, a couple of projects [^4] have migrated to tomli-w. And the core Python developers decided against /uiri/toml, to read TOML natively in Python 3.11.

[^1] This gist of mine contains two test files that prove this, test_bug_148_original.py and test_bug_148_fixed.py

Firstly, in test_bug_148_original.py, the original test function has been abstracted into a factory returning a function that simply applies that same test to any module containing a dumps function. It also contains a second factory, that uses exactly the same string literals (and dictionaries) as in the original test_bug_148, but applies toml.loads to the LHSs instead of toml.dumps to the RHSs of the three equations. [^5]

test_bug_148_original.py enables a fair comparison between toml, tomli-w, tomlkit and toml_tools using the original test function test_bug_148. Running test_bug_148_original.py with pytest in a venv (with all the deps installed), one test passes, but all 7 other tests fail (8 in >= Python 3.11). toml passes its own test as would be expected. But not only do Python 3.11's native tomllib, its precursor tomli, tomli-w, tomlkit, and toml_tools (and my own fork of toml) all fail it, toml.loads fails the inverse version of this same test, that only toml.dumps passes.

I assert test_bug_148 was incorrectly coded and hope you agree. But how should it be fixed, or what should replace it?

Inside the second file in the gist, test_bug_148_correctly.py the code is much the same, but contains what I believe to be the correct string literals, to achieve what /uiri originally intended to test (that a Python string literal r'\x' is encoded in a TOML quoted string literal r'\x's, even when (and only when) an even number of backslashes precedes it in its repr string). This not only allows a fair comparison between toml, tomli/tomllib, tomlkit and toml_tools as before, but between toml.dumps and toml.loads.

Running this we find that 7 tests pass (8 in >= Python 3.11), and only toml.dumps fails its test function from the new factory (for encoders).

Who doesn't love the colourised output from pytest?:

Same results in Ubuntu 22.04 WSL, Python 3.10:

I believe this was an honest mistake on the part of /uiri . Considering how they chose to write the original bug fix by analysing strings almost manually, instead of using a regular expression, and how they used normal string literals instead of r-string literals containing half the number of \s in test_bug_148, perhaps their test function was a victim of the backslash plague.

I will submit PRs shortly that correct these issues, one containing test functions from the factories in test_bug_148_correctly.py and one containing the fix I wrote for my fork (toml_tools -I have my own specific requirements (Iron Python) that mean I do need my own fork of toml).

[^3] These issues include: https://github.com/uiri/toml/issues/411 https://github.com/uiri/toml/issues/404 https://github.com/uiri/toml/issues/261 https://github.com/uiri/toml/issues/201 and more

[^4] Projects that have migrated from toml to tomli-w et al: https://github.com/davidfokkema/tailor/commit/ece266a1b4823b5fc73598e76f12e52b239b041a https://github.com/gramineproject/gramine/commit/56194869d07e1594a46811e8c7cd953615114300 https://github.com/remarshal-project/remarshal/issues/19#issuecomment-692597407 https://github.com/dictation-toolbox/Caster/commit/0ba75e188cbf4e6782a761e2528618c53189b4a8

[^5] Coding note/ apology. I could've defined the string literals as constants, to guaranteed the second factory uses the same values as the first. However I wanted to demonstrate to anyone reading that it contains exactly the same code as in /toml/tests/tests/test_api.py:test_bug_148

FYI, in your own code, using pytest.mark.parametrize would achieve the same goal more concisely. But similarly, I wanted to be transparent, and to produce understandable test results in the command line output via the test functions' names.

Apr 26 '23 20:04 JamesParrott

Code from the gists for inspection, so you don't have to download random untrusted .py files: test_bug_148_original.py

r"""  Commands to reproduce:
python -m venv test_bug_148
Posix: source test_bug_148/bin.activate.sh
Windows: .\test_bug_148\Scripts\activate.bat
pip install pytest toml tomli tomli-w tomlkit toml_tools
pytest --tb=no -rA -q test_bug_148_original.py

Expected on Windows 10 Python 3.12:
F.FFFFFFF                                                                                                                                                          [100%]
======================================================================== short test summary info ======================================================================== 
PASSED test_bug_148_original.py::test_bug_148_toml
FAILED test_bug_148_original.py::test_inverse_bug_148_toml - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_inverse_bug_148_tomllib - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_inverse_bug_148_tomli - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_inverse_bug_148_tomlkit - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_inverse_bug_148_toml_tools - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_bug_148_tomli_w - assert 'a = "\\u0064"\n' == 'a = "\\\\x64"\n'
FAILED test_bug_148_original.py::test_bug_148_tomlkit - assert 'a = "\\u0064"\n' == 'a = "\\\\x64"\n'
FAILED test_bug_148_original.py::test_bug_148_toml_tools - assert 'a = "\\u0064"\n' == 'a = "\\\\x64"\n'
8 failed, 1 passed in 0.11s


"""

import sys
if sys.version_info >= (3,11):
    import tomllib


import tomli
import tomli_w
import tomlkit
import toml
import toml_tools


def make_bug_148_test_function(toml_module):
    def test_bug_148():
        assert 'a = "\\u0064"\n' == toml_module.dumps({'a': '\\x64'})
        assert 'a = "\\\\x64"\n' == toml_module.dumps({'a': '\\\\x64'})
        assert 'a = "\\\\\\u0064"\n' == toml_module.dumps({'a': '\\\\\\x64'})
    return test_bug_148



def make_inverse_bug_148_test_function(toml_module):
    def test_bug_148_inverse():
        assert toml_module.loads('a = "\\u0064"\n') == {'a': '\\x64'}
        assert toml_module.loads('a = "\\\\x64"\n') == {'a': '\\\\x64'}
        assert toml_module.loads('a = "\\\\\\u0064"\n') == {'a': '\\\\\\x64'}
    return test_bug_148_inverse



test_inverse_bug_148_toml = make_inverse_bug_148_test_function(toml)
test_bug_148_toml = make_bug_148_test_function(toml)

# native in Python 3.11
try:
    test_inverse_bug_148_tomllib = make_inverse_bug_148_test_function(tomllib)
except NameError:
    pass

test_inverse_bug_148_tomli = make_inverse_bug_148_test_function(tomli)
test_inverse_bug_148_tomlkit = make_inverse_bug_148_test_function(tomlkit)
test_inverse_bug_148_toml_tools = make_inverse_bug_148_test_function(toml_tools)

test_bug_148_tomli_w = make_bug_148_test_function(tomli_w)
test_bug_148_tomlkit = make_bug_148_test_function(tomlkit)
test_bug_148_toml_tools = make_bug_148_test_function(toml_tools)

test_bug_148_fixed.py

r"""  Commands to reproduce:
python -m venv test_bug_148
Posix: source test_bug_148/bin.activate.sh
Windows: .\test_bug_148\Scripts\activate.bat
pip install pytest toml tomli tomli-w tomlkit toml_tools
pytest --tb=no -rA -q test_bug_148_fixed.py

Expected on Windows 10 Python 3.12:
.F.......                                                                                                                                                          [100%]
======================================================================== short test summary info ========================================================================
PASSED test_bug_148_correctly.py::test_inverse_bug_148_toml
PASSED test_bug_148_correctly.py::test_inverse_bug_148_tomllib
PASSED test_bug_148_correctly.py::test_inverse_bug_148_tomli
PASSED test_bug_148_correctly.py::test_inverse_bug_148_tomlkit
PASSED test_bug_148_correctly.py::test_inverse_bug_148_toml_tools
PASSED test_bug_148_correctly.py::test_bug_148_tomli_w
PASSED test_bug_148_correctly.py::test_bug_148_tomlkit
PASSED test_bug_148_correctly.py::test_bug_148_toml_tools
FAILED test_bug_148_correctly.py::test_bug_148_toml - assert 'a = "\\\\x64"\n' == 'a = "\\u0064"\n'
1 failed, 8 passed in 0.13s
"""


import sys
if sys.version_info >= (3,11):
    import tomllib


import tomli
import tomli_w
import tomlkit
import toml
import toml_tools


def make_correct_bug_148_test_function(toml_module):
    def test_bug_148():
        assert r'a = "\\x64"' + '\n' == toml_module.dumps({'a': r'\x64'})
        assert r'a = "\\\\x64"' + '\n' == toml_module.dumps({'a': r'\\x64'})
        assert r'a = "\\\\\\x64"' + '\n' == toml_module.dumps({'a': r'\\\x64'})

    # original from  
    #     assert 'a = "\\u0064"\n' == toml_module.dumps({'a': '\\x64'})
    #     assert 'a = "\\\\x64"\n' == toml_module.dumps({'a': '\\\\x64'})
    #     assert 'a = "\\\\\\u0064"\n' == toml_module.dumps({'a': '\\\\\\x64'})
    return test_bug_148



def make_correct_inverse_bug_148_test_function(toml_module):
    def test_bug_148_inverse():
        assert toml_module.loads(r'a = "\\x64"' + '\n') == {'a': r'\x64'}
        assert toml_module.loads(r'a = "\\\\x64"' + '\n') == {'a': r'\\x64'}
        assert toml_module.loads(r'a = "\\\\\\x64"' + '\n') == {'a': r'\\\x64'}
    return test_bug_148_inverse



test_inverse_bug_148_toml = make_correct_inverse_bug_148_test_function(toml)
test_bug_148_toml = make_correct_bug_148_test_function(toml)

# native in Python 3.11
try:
    test_inverse_bug_148_tomllib = make_correct_inverse_bug_148_test_function(tomllib)
except NameError:
    pass

test_inverse_bug_148_tomli = make_correct_inverse_bug_148_test_function(tomli)
test_inverse_bug_148_tomlkit = make_correct_inverse_bug_148_test_function(tomlkit)
test_inverse_bug_148_toml_tools = make_correct_inverse_bug_148_test_function(toml_tools)

test_bug_148_tomli_w = make_correct_bug_148_test_function(tomli_w)
test_bug_148_tomlkit = make_correct_bug_148_test_function(tomlkit)
test_bug_148_toml_tools = make_correct_bug_148_test_function(toml_tools)

Apr 26 '23 20:04 JamesParrott