py-ipv8 icon indicating copy to clipboard operation
py-ipv8 copied to clipboard

Underspecified result type of `EZPackOverlay._ez_unpack_auth` and `EZPackOverlay._ez_unpack_noauth`

Open kozlovsky opened this issue 2 years ago • 6 comments

Currently, the type of _ez_unpack_auth unpacked payload is not understood correctly by PyCharm:

2022-05-06_15-25-39

You can see that PyCharm can only see the base Payload methods and not the methods of IntroductionRequestPayload.

In the following PR, I suggest a fix that provides the correct result type:

2022-05-06_15-38-02

Also, while fixing that issue, I noticed that currently, DataclassPayload defined as

DataclassPayload = typing.TypeVar('DataclassPayload')
...
AnyPayload = typing.Union[Payload, DataclassPayload]
AnyPayloadType = typing.Union[typing.Type[Payload], typing.Type[DataclassPayload]]

In my opinion, it is not correct to use TypeVar here. Type variables are for generic functions when the function result type should be derived from the function arguments'. Here, DataclassPayload is not used for generics and does not specify that the class is related to data classes. So, as a part of a fix, I'm replacing it with a protocol that checks that the class is actually a data class.

kozlovsky avatar May 06 '22 14:05 kozlovsky

In my opinion, it is not correct to use TypeVar here.

Strictly speaking, it is not correct to even assign a type---at all---to a dataclass. I made up this fictional type of a DataclassPayload to bridge the gap between real Payload classes and dataclass runtime instances, that become Payload instances at runtime. I understand that this is not the prettiest solution, but it was a conscious decision to do it like this rather than trying to bend over backwards to try and type dataclass-ed classes, which aren't a class/type to begin with. Please don't change this (though I would be open to removing this DataclassPayload definition entirely).

qstokkink avatar May 06 '22 17:05 qstokkink

Your solution for defining DataclassPayload was pretty practical, and I like it. My only objection was that the TypeVar functionality is unnecessary here. TypeVar is useful when you need to specify the same type in two different parts of function signature or class definition. AnyPayload uses this way, not DataclassPayload, so the type variable is necessary for AnyPayload, while DataclassPayload can be a non-generic class.

Unfortunately, there is no common base class for data classes, so in Python 3.7, the correct non-generic class for a data class is Type[object], which can be replaced with just type. But in Python 3.8, it is possible to specify a correct DataclassPayload protocol that prevents non-data classes from being used here.

But it is not critical for this PR, so if you think the TypeVar has some benefits here compared to the protocol, I can revert it.

kozlovsky avatar May 06 '22 18:05 kozlovsky

I do agree with you that your typing proposal is stricter (and therefore better). However, my main objection is that your proposal is further reinforcing my somewhat ugly (though indeed practical) patch job of the type hierarchy. If we do change this, I'd like to see it move (more) to the ideal situation, visualized:

typeorg

Edit, erratum: I mislabeled the DataclassPayload as a fake class in this image. This is false: it is a real class.

If you have time for this, I'd love to an implementation of the "ideal" situation. Otherwise, I'd prefer a reversion. Alternatively, if you have a suggestion for an even more ideal ideal, I'd love to hear other suggestions for the type hierarchy.

qstokkink avatar May 09 '22 07:05 qstokkink

I think the "Current" scheme on the left does not reflect how the current code looks. I'd say it's something like that:

image

The left "Current" diagram assumes that Payload and DataclassPayload are inherited from AnyPaload or somehow depend on it. In the actual code, it is the reverse - AnyPayload is defined as a union of Payload and arbitrary dataclass, represented in the code using the DataclassPayload.

The right "Ideal" diagram assumes that DataclassPayload is not an arbitrary dataclass type, but a special subclass of it also inherited from Payload using multiple inheritance. Is it beneficial? I think it may be more convenient for a user to use an arbitrary dataclass just as in the current implementation of IPv8, as it imposes fewer restrictions on the user. Also, the current IPv8 code allows arbitrary dataclass as a payload, and imposing restrictions on it in future versions of IPv8 will break the backward compatibility.

kozlovsky avatar May 09 '22 08:05 kozlovsky

The right "Ideal" diagram assumes that DataclassPayload is not an arbitrary dataclass type, but a special subclass of it also inherited from Payload using multiple inheritance.

It doesn't just assume that, quite literally it is exactly that:

https://github.com/Tribler/py-ipv8/blob/1a25b0cdad299dce6f1989adc828e7904122ace0/ipv8/messaging/payload_dataclass.py#L51-L59

This can be used as both an arbitrary dataclass (which is not actually a class) and as a VariablePayload. Therefore, I disagree with your conclusion that typing it as such would somehow change backward compatibility. Furthermore, I do not understand your argument on having more options restricts the user.

By the way, I mislabeled the DataClassPayload as a fictional class before. This was my mistake, I'll add an erratum to my previous post.

qstokkink avatar May 09 '22 10:05 qstokkink

On a side note, how you draw the arrows depends on what convention you follow. In the existing IPv8 documentation we draw arrows from a stricter subtype to a more general supertype. For instance:

  • A subclass points to its superclass.
  • An interface implementation points to its interface.
  • A type points to a union it is a part of.

Please follow this convention so we don't mix styles.

qstokkink avatar May 10 '22 07:05 qstokkink

After almost a year of inactivity, I still see no clear actionable conclusion in this issue and I'll close this.

qstokkink avatar Feb 03 '23 09:02 qstokkink