desert icon indicating copy to clipboard operation
desert copied to clipboard

Support dataclass default None as allowed type

Open remidebette opened this issue 4 years ago • 15 comments

Hi, thanks for the nice library.

The following use case triggers an error in desert:

from dataclasses import dataclass

import desert

@dataclass
class WebhooksInfoItem:
    url: str = None

WebhooksInfoItemSchema = desert.schema(WebhooksInfoItem)

WebhooksInfoItemSchema.loads('{}')
# Ok: WebhooksInfoItem(url=None)

WebhooksInfoItemSchema.loads('{"url": null}')
# marshmallow.exceptions.ValidationError: {'url': ['Field may not be null.']}

marshmallow_dataclass seems to support optional declaration through default None in both cases. Is it forbidden in desert by design? It does not look like it according to the first usecase.

I feel that = None to be an implicit equivalent of Optional[] to be a good shortcut. Maybe I do not see a potential drawback?

remidebette avatar May 01 '20 11:05 remidebette

Do dataclasses share the opinion that the None default changes the typehint to be optional? Or does mypy? attrs has an issue discussing the other way with an optional type hint implying a None default. https://github.com/python-attrs/attrs/issues/460 This makes a bit more sense to me (after a few seconds of thought anyways) since it means that the developer is providing the complete correct type hint and code is deriving something from the hint. Using the None as the indicator means that some code should rewrite the type hint to be optional which seems less clean.

altendky avatar May 01 '20 12:05 altendky

So my understanding of the question is: are those two equivalent in deserialisation?

  • the url field is not present
  • the url field is present but its value is None

I guess it is a decision to make. In the case of type hints I see here:

You can use Optional[X] as a shorthand for Union[X, None].

remidebette avatar May 01 '20 12:05 remidebette

I see two questions. In no particular order...

One question is 'should a default that doesn't fit the type hint be allowed'. I tend towards 'no' or maybe 'desert should not take a stand on this and leave it to underlying libraries'. I think that's the difference between not present (uses the default that disagrees with the type hint) and present but None (fails the validation because None isn't a str which was what was specified). I think that's the question you just stated. I agree that the present behavior is at the very least odd.

Another question is 'should None being an acceptable value be described by the type hint or by the default value'. I tend towards writing the correct type hint rather than writing the wrong one then also a default that disagrees with the written type hint.

I would think ending up with what the attrs issue discusses seems the cleanest. An optional attribute will get assigned a default of None when no default is specified.

@dataclass
class WebhooksInfoItem:
    url: typing.Optional[str]

Do you see an issue or downside to writing your original attribute this way? (without making a claim this presently works, I haven't gotten into code at this point)

altendky avatar May 01 '20 13:05 altendky

I am still discovering the way to use the lib.

I do think Optional is actually more suited to my purpose. As it works as expected I already reworked my code to use Optional instead.

(It also has the advantage that I do not need to bubble down all optional parameters to the bottom of my dataclass.)

But I still think the error does not protect against much.

Very interestingly, when using the python dataclass constructor itself the behavior is exactly reversed since it is = None that makes it a python keyword (hence optional):

from dataclasses import dataclass
from typing import Optional

import desert

@dataclass
class WebhooksInfoItem:
    url: str = None

WebhooksInfoItemSchema = desert.schema(WebhooksInfoItem)

WebhooksInfoItemSchema.load({})
# Ok: WebhooksInfoItem(url=None)   as there is a default value
WebhooksInfoItem()
# Ok: WebhooksInfoItem(url=None)  as there is a default value

@dataclass
class WebhooksInfoItem:
    url: Optional[str]

WebhooksInfoItemSchema = desert.schema(WebhooksInfoItem)

WebhooksInfoItemSchema.load({})
# Ok: WebhooksInfoItem(url=None)   since it is Optional
# also WebhooksInfoItemSchema.dump(WebhooksInfoItemSchema.load({}))
# {"url":None}

WebhooksInfoItem()
# Error: TypeError: __init__() missing 1 required positional argument: 'url'

This all might confuse a python beginner starting to use desert.

remidebette avatar May 01 '20 14:05 remidebette

Optional in a callable's signature means 'there is a default and you aren't required to pass this' while optional in typing means 'can also be None'. So they are two distinct things. url: str = 37 would also make url be an optional parameter regarding class construction but it doesn't seem like it should imply None is an acceptable value.

So it sounds like you are satisfied with the functional path of url: typing.Optional[str] that addresses allowing null to be loaded. Bonuses include mypy and IDEs working sensibly. :]

The second example there relates to dataclasses implementing what was being discussed for attrs in https://github.com/python-attrs/attrs/issues/460 I think? So that would not be a thing for desert to address. I would be interested to hear if you find any discussion of this.

The other question raised here is what should be done with a default that disagrees with the type hint. So, should someone complain early when you write url: str = None since None is not a str. Could as well be url: str = 29 or any other non-str default. I'm not sure if this is an attrs/dataclass responsibility or something to be done by desert when you create the schema.

altendky avatar May 01 '20 14:05 altendky

I agree with your points. As per the new question raised, I am a fan of failing early instead of letting a user write an entire codebase on some wrong asumption (like I did) only to find out later and have to modify it.

remidebette avatar May 01 '20 16:05 remidebette

To the strict interpretation of type specifications this class

@dataclass
class MyData:
    url: str = None

is in conflict with its specification. The None default is not a str. Hence my opinion is that it would be expected that you get ValidationError when trying to load data that is null/None. I believe the most common interpretation is that present fields in json carries values, even if they are None. They need to be removed from the file to be omitted.

@dataclass
class MyData:
    url: Optional[str]

MyData()
# TypeError: __init__() missing 1 required positional argument: 'url'

is also expected. Giving it a type hint of Optional[str] doesn't change it's behavior, and the field is missing an default for url. I.e. the Optional isn't a method for giving the field a default value. The fix is simple enough:

    url: Optional[str] = None

sveinse avatar May 02 '20 22:05 sveinse

Indeed in the end I would say I was confused about the fact url: str = None did not raise an error and hence seemed to be the de facto declaration of optional. To make matters consistent I would recommend raising an error when the typing is not respected and hinting the use of the Optional type in the error message.

remidebette avatar May 03 '20 08:05 remidebette

The last confusing point comes from the fact that, as I stated, the dataclass way of handling things persists in an instance a field passed with None value in the same way that no field passed at all.

This makes the schema dumps populating every field with Optional parameters (even if no None initialization defined) as such:

@dataclass
class WebhooksInfoItem:
    url: Optional[str]

WebhooksInfoItemSchema = desert.schema(WebhooksInfoItem)

WebhooksInfoItemSchema.dump(WebhooksInfoItemSchema.load({}))
# {"url":None}

But I guess this is a python language decision (None cannot be distinguished from not passed) that this library can do little about.

Still, is there a way in this library to inject marshmallow post_dump method to the schema instead as explained in https://github.com/marshmallow-code/marshmallow/issues/229#issuecomment-134387999?

remidebette avatar May 03 '20 08:05 remidebette

So, in my opinion, to go to the full extent of the interpretation that @altendky and @sveinse both stated, the dump should behave like this:

@dataclass
class WebhooksInfoItem:
    url: Optional[str]
    id: Optional[str] = None

WebhooksInfoItemSchema = desert.schema(WebhooksInfoItem)

WebhooksInfoItemSchema.dump(WebhooksInfoItemSchema.load({}))
# {"id":None, "url": None}  current behavior 
# {"id":None}               proposed behavior

Following the schema definition rather than the dataclass instance value for the case of None at least. This would allow for both a fine grained customization (instead of a global post_dump) and reduction of serialized data passed on the network for APIs

remidebette avatar May 03 '20 09:05 remidebette

If you want to distinguish between None and... nothing... then yes, you'll need to create a thing to represent nothing. This is not uncommon for situations such as attrs and having a parameter to accept a default value. Then you'll have something like url: typing.Union[str, None, MyNothingType] = MyNothingType() (pending singleton etc considerations).

I'm not clear what part of that example is what you feel we proposed vs. something you are proposing. Sparse serialization seems like a new topic to me.

altendky avatar May 03 '20 13:05 altendky

All I've got at the moment for a workaround to add post-dump is to generate the schema, add a post-dump to it by either inheritance or assignment, then manually specify that schema for any field where you want to use it. Sure, that's ugly.

altendky avatar May 03 '20 13:05 altendky

So, in my opinion, to go to the full extent of the interpretation that @altendky and @sveinse both stated, the dump should behave like this:

@dataclass
class WebhooksInfoItem:
    url: Optional[str]
    id: Optional[str] = None

WebhooksInfoItemSchema = desert.schema(WebhooksInfoItem)

WebhooksInfoItemSchema.dump(WebhooksInfoItemSchema.load({}))
# {"id":None, "url": None}  current behavior 
# {"id":None}               proposed behavior

Following the schema definition rather than the dataclass instance value for the case of None at least. This would allow for both a fine grained customization (instead of a global post_dump) and reduction of serialized data passed on the network for APIs

With this proposal, the behavior would change depending if the field happen to have a default value on an Optional hinted type. I think it might be inelegant to use the default value as an specification for serialization. Explicit is better than implicit. E.g. the creation of the class instance object with its defaults might differ from the need in serialization.

With the statement WebhooksInfoItemSchema.load({}), the content of url will be None unless as @altendky suggests a NOTHING object, Thus the type hint is inaccurate as it is. (I believe attrs has NOTHING if I'm not mistaking). Next, given a py object with url = None, how can the serializer know if url is present and should be None vs the value is None and should not be present in the serialization? Based on the scheme load based on the class default value?

sveinse avatar May 03 '20 15:05 sveinse

For what it's worth, my random two cents is that when we instantiate the desert.Schema in the first place, just as we can add meta={"unknown": marshmallow.EXCLUDE} we should also be able to add a "partial": True", which should implicitly make all fields in the Dataclass schema have from typing import Optional wrapped around each member field of the Dataclass if it doesn't already have it when it gets processed internally.

My personal inuitition when I encountered this problem was that the partial=True kwarg in the schema's instantiation or load methods should have done that, but it didn't, which led me to this thread. Thanks for reading! 🚀

wanderrful avatar Jul 28 '20 17:07 wanderrful

I randomly landed on this thread not looking at the desert library but at others' usage of typing.Optional. I wanted to point out a few things about how @dataclass seems to be getting used here and why the below proposed __repr__() for when url is not provided a value during class instantiation wouldn't work easily for any implementation just based on how @dataclass works:

So, in my opinion, to go to the full extent of the interpretation that @altendky and @sveinse both stated, the dump should behave like this:

@dataclass
class WebhooksInfoItem:
    url: Optional[str]
    id: Optional[str] = None

WebhooksInfoItemSchema = desert.schema(WebhooksInfoItem)

WebhooksInfoItemSchema.dump(WebhooksInfoItemSchema.load({}))
# {"id":None, "url": None}  current behavior 
# {"id":None}               proposed behavior

Following the schema definition rather than the dataclass instance value for the case of None at least. This would allow for both a fine grained customization (instead of a global post_dump) and reduction of serialized data passed on the network for APIs

The above proposed behavior keeps url as part of the __init__() method call unless you use field(init=False) to remove it. So this part of that code...

@dataclass
class WebhooksInfoItem:
    url: Optional[str]
    id: Optional[str] = None

...is equivalent to the below at runtime when you call and instantiate this class:

class WebhooksInfoItem:
    def __init__(self, url: Union[str,None], id: Union[str,None] = None):
        self.url = url
        self.id = id

As you can see above, the url attribute is still required to be set as part of the __init__() method.

Python at runtime sees url: Optional[str] and url: Optional[str] = None as equivalent for how the class treats the field when used within the __init__() method. Using url: Optional[str] just requires you provide at least one argument for the url attribute because you didn't assign a default value, but the important part to remember is an argument is required for url when calling the __init__() method at all times in the case of the code above. Below is an example of how the interpreter would see various calls to instantiate/initialize this class:

@dataclass
class WebhooksInfoItem:
    url: Optional[str]
    id: Optional[str] = None

WebhooksInfoItem() - > WebhooksInfoItem(url=, id=None) #error
WebhooksInfoItem(None) -> WebhooksInfoItem(url=None, id=None) #OK
WebhooksInfoItem({}) -> WebhooksInfoItem(url={}, id=None) #OK
WebhooksInfoItem(id='123') -> WebhooksInfoItem(url=, id='123') #error
WebhooksInfoItem(id={}) -> WebhooksInfoItem(url=, id={}) #error

A default value of any kind just gives you a nice shortcut so you don't have to pass a value yourself every time. It also allows you to define an appropriate default value (and I stress this next part) if one can be assumed. I stress this because usually any assumption for default value, even None vs not setting the attribute, is typically application specific, or potentially a business logic rather than a technical decision depending on the application purpose.

You may want to look at the post_init method call for dataclasses instead as a possible alternative for what you're trying to accomplish and you can do type checking at runtime by utilizing isinstance(). I personally recommend also looking at the @property decorator: it allows you to define getter() and setter() methods and it works with the @dataclass decorator.

I think these options would be a more Pythonic way for you to determine if the url attribute was passed a "valid" value as you can then define what that means and explicitly handle other values such as None or Null as you see fit per class attribute.

stacynoland avatar Jun 03 '21 02:06 stacynoland