rtoml icon indicating copy to clipboard operation
rtoml copied to clipboard

Conversion of None to "null" string inconsistency

Open himbeles opened this issue 4 years ago • 5 comments

Currently, rtoml converts fields with value None to a value of string "null" in the dumped string. Loading the string back in results in a "null" string in python instead of None.

I can only imagine two possible correct behaviors for loading "null" strings instead:

  • "null" strings should be converted to None in python, or
  • fields with value "null" should not be imported at all.

Conversely, for dumping None values:

  • convert to null (without the quotation marks) -- representing a json null value
  • do not dump fields with value None at all

Here are two tests demonstrating the failing scenario with rtoml and the passing example using toml:

import rtoml

def test_none_to_null_string():
    """ fails with 

    ```AssertionError: assert {'optional_field': None} == {'optional_field': 'null'}```

    because None is dumped to "null" string value which is also loaded back in as the string "null".
    """
    d = {"optional_field": None}
    d_goal1 = d
    d_goal2 = {}
    dumped = rtoml.dumps(d)
    loaded = rtoml.loads(dumped)
    assert loaded == d_goal1 or loaded == d_goal2

def test_none_to_null_string_passing_toml_package():
    """ passes with toml packages, 
    because fields with value None are not dumped.
    """
    import toml
    d = {"optional_field": None}
    d_goal1 = d
    d_goal2 = {}

    dumped = toml.dumps(d)
    loaded = toml.loads(dumped)
    assert loaded == d_goal1 or loaded == d_goal2

himbeles avatar May 25 '21 11:05 himbeles

I can only imagine two possible correct behaviors for loading "null" strings instead:

* "null" strings should be converted to `None` in python, or

* fields with value "null" should not be imported at all.

Conversely, for dumping None values:

* convert to null (without the quotation marks) -- representing a json null value

* do not dump fields with value `None` at all

In general, the TOML v1.0.0 specification does not allow serialization/deserialization of NoneType. Therefore, NoneType is converted to a string type with a value of 'null'.

Until the author decides to add such functionality, it is possible to simulate the expected behavior at the Python level, for example like this:

import traceback
from pathlib import Path
from typing import Dict, Any

import rtoml


def load_config(toml_path: str, encoding='utf-8-sig', errors='ignore') -> Dict[str, Any]:
    """
    Custom loader for deserializing a TOML file with recursively
    replaced 'null' strings in Python NoneType

    :param toml_path: string path;
    :param encoding: source TOML encoding (Default opens UTF-8 with/without BOM);
    :param errors: 'strict' or 'ignore' encoding errors;

    :return: Deserialized TOML as Python dict
    """

    def dict_replace_null(_dict: dict) -> dict:
        x = {}
        for k, v in _dict.items():
            if isinstance(v, dict):
                v = dict_replace_null(v)
            elif isinstance(v, list):
                v = list_replace_null(v)
            elif isinstance(v, str) and v == 'null':
                v = None
            x[k] = v
        return x

    def list_replace_null(_list: list) -> list:
        x = []
        for e in _list:
            if isinstance(e, list):
                e = list_replace_null(e)
            elif isinstance(e, dict):
                e = dict_replace_null(e)
            elif isinstance(e, str) and e == 'null':
                e = None
            x.append(e)
        return x

    try:
        data = rtoml.load(Path(toml_path).read_text(
            encoding=encoding,
            errors=errors
        ))

        return dict_replace_null(data)

    except FileNotFoundError:
        print(traceback.format_exc())  # Del 'print()' and wrap in a logging

        return {}  # Return empty dict without stopping the program

ma1ex avatar Sep 12 '21 13:09 ma1ex

https://github.com/toml-lang/toml/issues/30 is the discussion about this for the TOML spec.

I happen to disagree with them, but that's doesn't really matter much.

If we follow their suggestions we should remove any Nones, but that still won't provide correct round tripping: [1, 2, 3, None, 5] would become [1, 2, 3, 5] after a round trip which is arguable more confusing than [1, 2, 3, 'null', 5], and the same with dicts.

I therefore think we should either:

  1. leave it as is - people can add their own logic as @ma1ex suggests
  2. add a none_string argument to load() and dump() which when set means that string is interpreted as None (and visa versa) - this would allow loss-less round trips

I guess we could also have another argument "omit_none" to dump() to just omit None as suggested in the discussion above, but that's kind of a separate feature.

I'm a bit busy, but happy to review a PR to add none_string and/or omit_none.

samuelcolvin avatar Sep 12 '21 13:09 samuelcolvin

I happen to disagree with them, but that's doesn't really matter much.

I am too!

If we follow their suggestions we should remove any Nones, but that still won't provide correcting round tripping: [1, 2, 3, None, 5] would become [1, 2, 3, 5] after a round trip which is arguable more confusing than [1, 2, 3, 'null', 5], and the same with dicts.

But then such behavior would break the logic, IMHO. Perhaps not everyone needs this functionality, but having it allows to migrate from JSON painlessly.

I therefore think we should either:

2. add a `none_string` argument to `load()` and `dump()` which when set means that string is interpreted as `None` (and visa versa) - this would allow loss-less round trips

Personally, I prefer this option, but at the level of Rust to preserve the integrity of the code and performance.

I guess we could also have another argument "omit_none" to dump() to just omit None as suggested in the discussion above, but that's kind of a separate feature.

I still think that "omit_none" in the case of Python is not a very useful functionality and will only confuse people and allow mistakes to be made.

I'm a bit busy, but happy to review a PR to add none_string and/or omit_none.

If you don't mind, I can suggest PR but at Python code level - implement my example in rtoml\__init__.py, as my knowledge of Rust, unfortunately, does not yet allow me to fully code in it ))

ma1ex avatar Sep 12 '21 15:09 ma1ex

I happen to disagree with them, but that's doesn't really matter much.

@samuelcolvin I also disagree, not having proper None really hurts usability of toml format. As we probably can't change official specification, the only solution is to have option to enable/disable various modes of rtoml operation (so people who want to have official behavior can have it).


I think the best solution is to have at least those two modes:

  • Mode "no-null" strictly following the specification
  • Mode "json-null" using null value similar to one in json, so without quotation marks as @ma1ex suggested:

Conversely, for dumping None values:

* convert to null (without the quotation marks) -- representing a json null value
  • Mode "string-null" is also possible, but I think it isn't good replacement for "json-null" mode.

This "json-null" would be easier to use than "string-null" for uses cases similar to json / yaml use cases. It can always happen that somebody wants to have None and "None" as separate values.


Having all 3 of those modes available of course won't hurt.

karolzlot avatar Apr 18 '22 23:04 karolzlot

@ma1ex I would like to give a little feedback to your solution you provided in https://github.com/samuelcolvin/rtoml/issues/23#issuecomment-917636181

In this case, as you need to modify only the current item, it is safe to modify directly dict/list which you are iterating over without creating second dict/list.

Not sure if it would save any memory though, because I suspect that Python isn't doubling anything in memory anyway. (?)

karolzlot avatar Apr 19 '22 04:04 karolzlot