py-ipv8
py-ipv8 copied to clipboard
Underspecified result type of `EZPackOverlay._ez_unpack_auth` and `EZPackOverlay._ez_unpack_noauth`
Currently, the type of _ez_unpack_auth
unpacked payload is not understood correctly by PyCharm:
You can see that PyCharm can only see the base Payload methods and not the methods of IntroductionRequestPayload.
In the following PR, I suggest a fix that provides the correct result type:
Also, while fixing that issue, I noticed that currently, DataclassPayload defined as
DataclassPayload = typing.TypeVar('DataclassPayload')
...
AnyPayload = typing.Union[Payload, DataclassPayload]
AnyPayloadType = typing.Union[typing.Type[Payload], typing.Type[DataclassPayload]]
In my opinion, it is not correct to use TypeVar
here. Type variables are for generic functions when the function result type should be derived from the function arguments'. Here, DataclassPayload
is not used for generics and does not specify that the class is related to data classes. So, as a part of a fix, I'm replacing it with a protocol that checks that the class is actually a data class.
In my opinion, it is not correct to use
TypeVar
here.
Strictly speaking, it is not correct to even assign a type---at all---to a dataclass. I made up this fictional type of a DataclassPayload
to bridge the gap between real Payload
classes and dataclass
runtime instances, that become Payload
instances at runtime. I understand that this is not the prettiest solution, but it was a conscious decision to do it like this rather than trying to bend over backwards to try and type dataclass
-ed classes, which aren't a class/type to begin with. Please don't change this (though I would be open to removing this DataclassPayload
definition entirely).
Your solution for defining DataclassPayload
was pretty practical, and I like it. My only objection was that the TypeVar
functionality is unnecessary here. TypeVar
is useful when you need to specify the same type in two different parts of function signature or class definition. AnyPayload
uses this way, not DataclassPayload
, so the type variable is necessary for AnyPayload
, while DataclassPayload
can be a non-generic class.
Unfortunately, there is no common base class for data classes, so in Python 3.7, the correct non-generic class for a data class is Type[object]
, which can be replaced with just type
. But in Python 3.8, it is possible to specify a correct DataclassPayload
protocol that prevents non-data classes from being used here.
But it is not critical for this PR, so if you think the TypeVar
has some benefits here compared to the protocol, I can revert it.
I do agree with you that your typing proposal is stricter (and therefore better). However, my main objection is that your proposal is further reinforcing my somewhat ugly (though indeed practical) patch job of the type hierarchy. If we do change this, I'd like to see it move (more) to the ideal situation, visualized:
Edit, erratum: I mislabeled the DataclassPayload
as a fake class in this image. This is false: it is a real class.
If you have time for this, I'd love to an implementation of the "ideal" situation. Otherwise, I'd prefer a reversion. Alternatively, if you have a suggestion for an even more ideal ideal, I'd love to hear other suggestions for the type hierarchy.
I think the "Current" scheme on the left does not reflect how the current code looks. I'd say it's something like that:
The left "Current" diagram assumes that Payload
and DataclassPayload
are inherited from AnyPaload
or somehow depend on it. In the actual code, it is the reverse - AnyPayload
is defined as a union of Payload
and arbitrary dataclass, represented in the code using the DataclassPayload
.
The right "Ideal" diagram assumes that DataclassPayload
is not an arbitrary dataclass type, but a special subclass of it also inherited from Payload
using multiple inheritance. Is it beneficial? I think it may be more convenient for a user to use an arbitrary dataclass just as in the current implementation of IPv8, as it imposes fewer restrictions on the user. Also, the current IPv8 code allows arbitrary dataclass as a payload, and imposing restrictions on it in future versions of IPv8 will break the backward compatibility.
The right "Ideal" diagram assumes that DataclassPayload is not an arbitrary dataclass type, but a special subclass of it also inherited from Payload using multiple inheritance.
It doesn't just assume that, quite literally it is exactly that:
https://github.com/Tribler/py-ipv8/blob/1a25b0cdad299dce6f1989adc828e7904122ace0/ipv8/messaging/payload_dataclass.py#L51-L59
This can be used as both an arbitrary dataclass (which is not actually a class) and as a VariablePayload
. Therefore, I disagree with your conclusion that typing it as such would somehow change backward compatibility. Furthermore, I do not understand your argument on having more options restricts the user.
By the way, I mislabeled the DataClassPayload
as a fictional class before. This was my mistake, I'll add an erratum to my previous post.
On a side note, how you draw the arrows depends on what convention you follow. In the existing IPv8 documentation we draw arrows from a stricter subtype to a more general supertype. For instance:
- A subclass points to its superclass.
- An interface implementation points to its interface.
- A type points to a union it is a part of.
Please follow this convention so we don't mix styles.
After almost a year of inactivity, I still see no clear actionable conclusion in this issue and I'll close this.