cattrs
cattrs copied to clipboard
cattrs structure string to List, instead of failing conversion
@attr.s
class Model:
field = attr.ib(type=List[str])
src_dict = {"field": "hello"}
cattrs.structure(src_dict, Model)
=> ["h", "e", "l", "l", "o"]
I'd expect it to fail converting.
same issue reversely,
@attr.s
class Model:
field = attr.ib(type=str)
src_dict = {"field": ["hello"]}
cattrs.structure(src_dict, Model)
=>'["hello"]'
This hard conversion is quite confusing and unexpected.
Hm yeah, the problem is a string IS a sequence of something.
The reverse is a consequence of type coercion we do. A converter type with coercion disabled for primitives would probably be a good idea.
@Tinche yeah, i don't think it should do hard coercion, it'll make type validation useless.
As said, str
is a Sequence
, but not a List
.
I'm not an expert in typing
, but here's an example:
from typing import Sequence, List, Iterable
sequence: Sequence[str] = "abc"
lst: List[str] = "abc"
Running mypy
gives:
__main__:4: error: Incompatible types in assignment (expression has type "str", variable has type "List[str]")
If cattrs
started supporting Sequence
, you could easily choose the desired behaviour (assuming we make a converter with type coercion disabled).
I got hit by this today:
import attr, cattr
from typing import List
@attr.dataclass
class Some(object):
a: List[str]
data = {'a': 'bad string'}
print(cattr.GenConverter().structure(data, Some))
# Some(a=['b', 'a', 'd', ' ', 's', 't', 'r', 'i', 'n', 'g'])
I expect the structure to fail. Are there any work-arounds?
I guess the source of a problem is this line: https://github.com/Tinche/cattrs/blob/5bfa312885a712ee62991d58a3eb32d8ca82865e/src/cattr/converters.py#L118
It treats str
as a sequence, but converts it as a string. Which is not right.
So here's the deal. Structuring is inherently a transformation from one data structure to another. For example, you generally want to transform a dictionary into a class, not to validate the dictionary (that's not the point). So that logic translates into how we handle primitives too - so if your class expects a string in a specific field, and there's a float instead, it'll coerce the float into the string (another reason is that this is actually faster too).
Now, for the problem at hand. We can simplify to this:
>>> from cattr import GenConverter
>>> c = GenConverter()
>>> c.structure("123", list[str])
['1', '2', '3']
which is the behavior you're not expecting.
What's happening here is the logic for structuring a list is being called, since that's the target type. @sobolevn the line of code you're linking, it's matching on the target type (list[str]
), not the source type (which here is str
).
The structuring handler takes an Iterable (the only thing it does is perform a for loop on the input argument), and outputs a list. A string is an iterable, so it works. Checking the type of the input would be complicated (what to check, even?) and slow.
That said, I think this is mostly a problem with strings (i.e. this case couldn't happen with list[int]
). Here's a very simple fix:
from cattr import GenConverter
from typing import List
def structure_list_strict(v, _):
if not isinstance(v, list):
raise Exception()
return [str(e) for e in v]
c = GenConverter()
c.register_structure_hook(List[str], structure_list_strict)
print(c.structure("123", List[str])) # Throws an exception
Of course, your input can't contain tuples or alternative iterables.
Hm yeah, the problem is a string IS a sequence of something.
The reverse is a consequence of type coercion we do. A converter type with coercion disabled for primitives would probably be a good idea.
How do you create a converter with coercion disabled?
IMO it may be a mistake in python, that str
is designed as a subtype of Sequence[str]
(also Collection[str]
and Iterable[str]
). It's everywhere in python, even mypy accepts something like field: Iterable[str] = "123"
because it's valid. It's not possible to fix it in python now, so we have to adapt to it.
Here's a very simple fix:
from cattr import GenConverter from typing import List def structure_list_strict(v, _): if not isinstance(v, list): raise Exception() return [str(e) for e in v] c = GenConverter() c.register_structure_hook(List[str], structure_list_strict) print(c.structure("123", List[str])) # Throws an exception
I'm not sure if I missed something, but it doesn't work for me. register_structure_hook(List[str], ...)
raises a TypeError: Invalid first argument to `register()`. list[str] is not a class or union type.
, because functools.singledispatch.register
doesn't support GenericAlias
.
The structuring handler takes an Iterable (the only thing it does is perform a for loop on the input argument), and outputs a list. A string is an iterable, so it works. Checking the type of the input would be complicated (what to check, even?) and slow.
That said, I think this is mostly a problem with strings (i.e. this case couldn't happen with
list[int]
).
I think we can have a special check for str
, if we expect a Iterable[str]
(or its subtypes), there should be a check to ensure the object is not a str
.
The reverse is a consequence of type coercion we do. A converter type with coercion disabled for primitives would probably be a good idea.
I agree, there should be a way to disable coercion for primitives.
Great, singledispatch got changed in the meantime. Oh well ;)
Since the next release is supposed to have a focus on validation, it may be a good time to revisit this.
In the meantime, here's the updated version of the snippet:
from cattrs import Converter
def structure_list_strict(v: list, _) -> list[str]:
if not isinstance(v, list):
raise Exception()
return [str(e) for e in v]
c = Converter()
c.register_structure_hook_func(lambda t: t == list[str], structure_list_strict)
print(c.structure("123", list[str]))
This example uses a predicate function instead of the singledispatch. The type hints are completely optional, I just enjoy them ;)
Hi @Tinche, Do you have any plans to add option for disabling coercion for primitives in the future?