rtoml
rtoml copied to clipboard
Conversion of None to "null" string inconsistency
Currently, rtoml converts fields with value None to a value of string "null" in the dumped string.
Loading the string back in results in a "null" string in python instead of None.
I can only imagine two possible correct behaviors for loading "null" strings instead:
- "null" strings should be converted to
Nonein python, or - fields with value "null" should not be imported at all.
Conversely, for dumping None values:
- convert to null (without the quotation marks) -- representing a json null value
- do not dump fields with value
Noneat all
Here are two tests demonstrating the failing scenario with rtoml and the passing example using toml:
import rtoml
def test_none_to_null_string():
""" fails with
```AssertionError: assert {'optional_field': None} == {'optional_field': 'null'}```
because None is dumped to "null" string value which is also loaded back in as the string "null".
"""
d = {"optional_field": None}
d_goal1 = d
d_goal2 = {}
dumped = rtoml.dumps(d)
loaded = rtoml.loads(dumped)
assert loaded == d_goal1 or loaded == d_goal2
def test_none_to_null_string_passing_toml_package():
""" passes with toml packages,
because fields with value None are not dumped.
"""
import toml
d = {"optional_field": None}
d_goal1 = d
d_goal2 = {}
dumped = toml.dumps(d)
loaded = toml.loads(dumped)
assert loaded == d_goal1 or loaded == d_goal2
I can only imagine two possible correct behaviors for loading "null" strings instead:
* "null" strings should be converted to `None` in python, or * fields with value "null" should not be imported at all.Conversely, for dumping
Nonevalues:* convert to null (without the quotation marks) -- representing a json null value * do not dump fields with value `None` at all
In general, the TOML v1.0.0 specification does not allow serialization/deserialization of NoneType. Therefore, NoneType is converted to a string type with a value of 'null'.
Until the author decides to add such functionality, it is possible to simulate the expected behavior at the Python level, for example like this:
import traceback
from pathlib import Path
from typing import Dict, Any
import rtoml
def load_config(toml_path: str, encoding='utf-8-sig', errors='ignore') -> Dict[str, Any]:
"""
Custom loader for deserializing a TOML file with recursively
replaced 'null' strings in Python NoneType
:param toml_path: string path;
:param encoding: source TOML encoding (Default opens UTF-8 with/without BOM);
:param errors: 'strict' or 'ignore' encoding errors;
:return: Deserialized TOML as Python dict
"""
def dict_replace_null(_dict: dict) -> dict:
x = {}
for k, v in _dict.items():
if isinstance(v, dict):
v = dict_replace_null(v)
elif isinstance(v, list):
v = list_replace_null(v)
elif isinstance(v, str) and v == 'null':
v = None
x[k] = v
return x
def list_replace_null(_list: list) -> list:
x = []
for e in _list:
if isinstance(e, list):
e = list_replace_null(e)
elif isinstance(e, dict):
e = dict_replace_null(e)
elif isinstance(e, str) and e == 'null':
e = None
x.append(e)
return x
try:
data = rtoml.load(Path(toml_path).read_text(
encoding=encoding,
errors=errors
))
return dict_replace_null(data)
except FileNotFoundError:
print(traceback.format_exc()) # Del 'print()' and wrap in a logging
return {} # Return empty dict without stopping the program
https://github.com/toml-lang/toml/issues/30 is the discussion about this for the TOML spec.
I happen to disagree with them, but that's doesn't really matter much.
If we follow their suggestions we should remove any Nones, but that still won't provide correct round tripping: [1, 2, 3, None, 5] would become [1, 2, 3, 5] after a round trip which is arguable more confusing than [1, 2, 3, 'null', 5], and the same with dicts.
I therefore think we should either:
- leave it as is - people can add their own logic as @ma1ex suggests
- add a
none_stringargument toload()anddump()which when set means that string is interpreted asNone(and visa versa) - this would allow loss-less round trips
I guess we could also have another argument "omit_none" to dump() to just omit None as suggested in the discussion above, but that's kind of a separate feature.
I'm a bit busy, but happy to review a PR to add none_string and/or omit_none.
I happen to disagree with them, but that's doesn't really matter much.
I am too!
If we follow their suggestions we should remove any
Nones, but that still won't provide correcting round tripping:[1, 2, 3, None, 5]would become[1, 2, 3, 5]after a round trip which is arguable more confusing than[1, 2, 3, 'null', 5], and the same with dicts.
But then such behavior would break the logic, IMHO. Perhaps not everyone needs this functionality, but having it allows to migrate from JSON painlessly.
I therefore think we should either:
2. add a `none_string` argument to `load()` and `dump()` which when set means that string is interpreted as `None` (and visa versa) - this would allow loss-less round trips
Personally, I prefer this option, but at the level of Rust to preserve the integrity of the code and performance.
I guess we could also have another argument "
omit_none" todump()to just omitNoneas suggested in the discussion above, but that's kind of a separate feature.
I still think that "omit_none" in the case of Python is not a very useful functionality and will only confuse people and allow mistakes to be made.
I'm a bit busy, but happy to review a PR to add
none_stringand/oromit_none.
If you don't mind, I can suggest PR but at Python code level - implement my example in rtoml\__init__.py, as my knowledge of Rust, unfortunately, does not yet allow me to fully code in it ))
I happen to disagree with them, but that's doesn't really matter much.
@samuelcolvin I also disagree, not having proper None really hurts usability of toml format.
As we probably can't change official specification, the only solution is to have option to enable/disable various modes of rtoml operation (so people who want to have official behavior can have it).
I think the best solution is to have at least those two modes:
- Mode "no-null" strictly following the specification
- Mode "json-null" using null value similar to one in json, so without quotation marks as @ma1ex suggested:
Conversely, for dumping
Nonevalues:* convert to null (without the quotation marks) -- representing a json null value
- Mode "string-null" is also possible, but I think it isn't good replacement for "json-null" mode.
This "json-null" would be easier to use than "string-null" for uses cases similar to json / yaml use cases.
It can always happen that somebody wants to have None and "None" as separate values.
Having all 3 of those modes available of course won't hurt.
@ma1ex I would like to give a little feedback to your solution you provided in https://github.com/samuelcolvin/rtoml/issues/23#issuecomment-917636181
In this case, as you need to modify only the current item, it is safe to modify directly dict/list which you are iterating over without creating second dict/list.
Not sure if it would save any memory though, because I suspect that Python isn't doubling anything in memory anyway. (?)