cattrs icon indicating copy to clipboard operation
cattrs copied to clipboard

cattrs structure string to List, instead of failing conversion

Open yunstanford opened this issue 6 years ago • 13 comments

@attr.s
class Model:
    field = attr.ib(type=List[str])

src_dict = {"field": "hello"}

cattrs.structure(src_dict, Model)

=> ["h", "e", "l", "l", "o"]

I'd expect it to fail converting.

yunstanford avatar Oct 09 '18 23:10 yunstanford

same issue reversely,

@attr.s
class Model:
    field = attr.ib(type=str)

src_dict = {"field": ["hello"]}

cattrs.structure(src_dict, Model)

=>'["hello"]'

yunstanford avatar Oct 10 '18 00:10 yunstanford

This hard conversion is quite confusing and unexpected.

yunstanford avatar Oct 10 '18 00:10 yunstanford

Hm yeah, the problem is a string IS a sequence of something.

The reverse is a consequence of type coercion we do. A converter type with coercion disabled for primitives would probably be a good idea.

Tinche avatar Oct 10 '18 15:10 Tinche

@Tinche yeah, i don't think it should do hard coercion, it'll make type validation useless.

yunstanford avatar Oct 12 '18 04:10 yunstanford

As said, str is a Sequence, but not a List.

I'm not an expert in typing, but here's an example:

from typing import Sequence, List, Iterable

sequence: Sequence[str] = "abc"
lst: List[str] = "abc"

Running mypy gives:

__main__:4: error: Incompatible types in assignment (expression has type "str", variable has type "List[str]")

If cattrs started supporting Sequence, you could easily choose the desired behaviour (assuming we make a converter with type coercion disabled).

madsmtm avatar Jan 24 '19 20:01 madsmtm

I got hit by this today:

import attr, cattr
from typing import List

@attr.dataclass
class Some(object):
    a: List[str]


data = {'a': 'bad string'}
print(cattr.GenConverter().structure(data, Some))
# Some(a=['b', 'a', 'd', ' ', 's', 't', 'r', 'i', 'n', 'g'])

I expect the structure to fail. Are there any work-arounds?

sobolevn avatar Mar 25 '21 15:03 sobolevn

I guess the source of a problem is this line: https://github.com/Tinche/cattrs/blob/5bfa312885a712ee62991d58a3eb32d8ca82865e/src/cattr/converters.py#L118

It treats str as a sequence, but converts it as a string. Which is not right.

sobolevn avatar Mar 25 '21 15:03 sobolevn

So here's the deal. Structuring is inherently a transformation from one data structure to another. For example, you generally want to transform a dictionary into a class, not to validate the dictionary (that's not the point). So that logic translates into how we handle primitives too - so if your class expects a string in a specific field, and there's a float instead, it'll coerce the float into the string (another reason is that this is actually faster too).

Now, for the problem at hand. We can simplify to this:

>>> from cattr import GenConverter
>>> c = GenConverter()
>>> c.structure("123", list[str])
['1', '2', '3']

which is the behavior you're not expecting.

What's happening here is the logic for structuring a list is being called, since that's the target type. @sobolevn the line of code you're linking, it's matching on the target type (list[str]), not the source type (which here is str).

The structuring handler takes an Iterable (the only thing it does is perform a for loop on the input argument), and outputs a list. A string is an iterable, so it works. Checking the type of the input would be complicated (what to check, even?) and slow.

That said, I think this is mostly a problem with strings (i.e. this case couldn't happen with list[int]). Here's a very simple fix:

from cattr import GenConverter
from typing import List


def structure_list_strict(v, _):
    if not isinstance(v, list):
        raise Exception()
    return [str(e) for e in v]


c = GenConverter()
c.register_structure_hook(List[str], structure_list_strict)
print(c.structure("123", List[str]))  # Throws an exception

Of course, your input can't contain tuples or alternative iterables.

Tinche avatar Mar 26 '21 00:03 Tinche

Hm yeah, the problem is a string IS a sequence of something.

The reverse is a consequence of type coercion we do. A converter type with coercion disabled for primitives would probably be a good idea.

How do you create a converter with coercion disabled?

joe-millington avatar Sep 14 '22 14:09 joe-millington

IMO it may be a mistake in python, that str is designed as a subtype of Sequence[str] (also Collection[str] and Iterable[str]). It's everywhere in python, even mypy accepts something like field: Iterable[str] = "123" because it's valid. It's not possible to fix it in python now, so we have to adapt to it.

Here's a very simple fix:

from cattr import GenConverter
from typing import List


def structure_list_strict(v, _):
    if not isinstance(v, list):
        raise Exception()
    return [str(e) for e in v]


c = GenConverter()
c.register_structure_hook(List[str], structure_list_strict)
print(c.structure("123", List[str]))  # Throws an exception

I'm not sure if I missed something, but it doesn't work for me. register_structure_hook(List[str], ...) raises a TypeError: Invalid first argument to `register()`. list[str] is not a class or union type., because functools.singledispatch.register doesn't support GenericAlias.

The structuring handler takes an Iterable (the only thing it does is perform a for loop on the input argument), and outputs a list. A string is an iterable, so it works. Checking the type of the input would be complicated (what to check, even?) and slow.

That said, I think this is mostly a problem with strings (i.e. this case couldn't happen with list[int]).

I think we can have a special check for str, if we expect a Iterable[str] (or its subtypes), there should be a check to ensure the object is not a str.

The reverse is a consequence of type coercion we do. A converter type with coercion disabled for primitives would probably be a good idea.

I agree, there should be a way to disable coercion for primitives.

GalaxySnail avatar Oct 20 '22 17:10 GalaxySnail

Great, singledispatch got changed in the meantime. Oh well ;)

Since the next release is supposed to have a focus on validation, it may be a good time to revisit this.

Tinche avatar Oct 21 '22 10:10 Tinche

In the meantime, here's the updated version of the snippet:

from cattrs import Converter


def structure_list_strict(v: list, _) -> list[str]:
    if not isinstance(v, list):
        raise Exception()
    return [str(e) for e in v]


c = Converter()
c.register_structure_hook_func(lambda t: t == list[str], structure_list_strict)
print(c.structure("123", list[str]))

This example uses a predicate function instead of the singledispatch. The type hints are completely optional, I just enjoy them ;)

Tinche avatar Oct 21 '22 10:10 Tinche

Hi @Tinche, Do you have any plans to add option for disabling coercion for primitives in the future?

coreegor avatar Sep 20 '23 18:09 coreegor