Backslash roundtrip problem

Open jeremysanders opened this issue 2 years ago • 5 comments

I'm having problems with Windows paths stored in toml files:

In [1]: import toml
In [2]: foo = {'a': 'C:\\hostedtoolcache\\windows\\Python\\3.9.13\\x64\\Lib\\site-packages\\PyQt5\\bindings'}
In [3]: d = toml.dumps(foo)
In [4]: d
Out[4]: 'a = "C:\\hostedtoolcache\\windows\\Python\\3.9.13\\x64\\\\Lib\\\\site-packages\\\\PyQt5\\\\bindings"\n'
In [5]: toml.loads(d)
/usr/lib/python3/dist-packages/toml/decoder.py in loads(s, _dict, decoder)
    512                                         multibackslash)
    513             except ValueError as err:
--> 514                 raise TomlDecodeError(str(err), original, pos)
    515             if ret is not None:
    516                 multikey, multilinestr, multibackslash = ret

TomlDecodeError: Reserved escape sequence used (line 1 column 1 char 0)

It looks like dumps escapes only some of the backslashes, so its own output cannot be loaded back. I tested this with toml from GitHub.

jeremysanders, Sep 05 '22 19:09

Ok, I think I've narrowed this down to the presence of \x in the string:

In [24]: toml.dumps({'a': r'\x43'})
Out[24]: 'a = "\\u0043"\n'

The code at https://github.com/uiri/toml/blob/59d83d0d51a976f11a74991fa7d220fc630d8bae/toml/encoder.py#L98 is wrong: it splits on \x but does not skip \\x, where the backslash is itself escaped.
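For comparison, here is a minimal sketch of escaping a TOML basic string one character at a time, which avoids splitting on \x altogether. It is not the code from the toml package or from any pull request; the name escape_basic_string and the user name in the usage example are made up for illustration, and the escape rules are my reading of the TOML spec for basic strings.

```python
# Hypothetical helper, not part of the toml package.
# Short escapes defined by TOML for basic strings.
_SHORT_ESCAPES = {
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
    '"': '\\"',
    "\\": "\\\\",
}


def escape_basic_string(value: str) -> str:
    """Return value as a double-quoted TOML basic string."""
    out = []
    for ch in value:
        if ch in _SHORT_ESCAPES:
            # Backslash, quote, and the common control characters.
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Remaining control characters become \uXXXX.
            out.append("\\u{:04x}".format(ord(ch)))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


# A literal backslash followed by "x43" stays a backslash plus text.
print(escape_basic_string(r"\x43"))
# A real control character (U+0002) is written as a unicode escape.
print(escape_basic_string("\x02"))
# A Windows path with a made-up user name starting with "x".
print(escape_basic_string(r"C:\Users\xavier"))
```

Because every character is handled on its own, a backslash in the input is always doubled, whatever follows it.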

jeremysanders, Sep 05 '22 19:09

I've created a pull request. However, I notice that strings like '\x02' don't seem to work either, and my pull request doesn't address that.

jeremysanders, Sep 05 '22 20:09

Got bitten by this just now. I have a user whose name starts with an 'x' and saving their home directory path into a config file breaks my app. Not fun.

davidfokkema, Sep 20 '22 10:09

I'm switching to tomli (available in the standard library as tomllib since Python 3.11) in combination with tomli_w.

davidfokkema, Sep 20 '22 10:09

We were also bitten by this:

>>> toml.dumps({'A': '\\x2d'})
'A = "\\u002d"\n'

As was already pointed out, this code is at fault: https://github.com/uiri/toml/blob/59d83d0d51a976f11a74991fa7d220fc630d8bae/toml/encoder.py#L99-L113

The code is extremely complicated and must be untangled in order to fix this bug. We didn't attempt it; instead we're planning on switching to tomli.

dimakuv, Oct 11 '22 13:10