typeshed icon indicating copy to clipboard operation
typeshed copied to clipboard

EmailMessage missing from email.parser

Open jrideout opened this issue 7 years ago • 8 comments

If policy.default is set, then methods like email.message_from_file will return EmailMessage rather than just Message

jrideout avatar Aug 25 '18 04:08 jrideout

We have basically three options here:

  • Return Union[Message, EmailMessage]. This does not really improve the situation, since an isinstance() is still necessary when EmailMessage is returned.
  • Return just EmailMessage. This can give false positives, but this is probably preferable to false negatives.
  • Return Any. In this case we lose all advantages of type checking.

Personally, I prefer the second option.

srittau avatar Nov 19 '18 11:11 srittau

What about overloads?

I think this handles the more common cases:

T = TypeVar('T', Message, EmailMessage)

@overload
def message_from_string(s: str, _factory: Callable[..., T]) -> T: ...

@overload
def message_from_string(s: str) -> Message: ...

@overload
def message_from_string(s: str, policy: EmailPolicy) -> EmailMessage: ...

jrideout avatar Nov 21 '18 07:11 jrideout

FWIW, I did go with your second option in my most recent project (which forked email for our own reasons). Our email/__init__.pyi looks like:

def message_from_string(s: str, policy: EmailPolicy) -> EmailMessage: ...
def message_from_bytes(s: bytes, policy: EmailPolicy) -> EmailMessage: ...
def message_from_file(fp: IO[str], policy: EmailPolicy) -> EmailMessage: ...
def message_from_binary_file(fp: IO[bytes], policy: EmailPolicy) -> EmailMessage: ...

jrideout avatar Nov 21 '18 07:11 jrideout

Overloads work fine for when a _class is provided, but will not work for policy, until literal types (python/mypy#3062) are implemented. (Your third overload needs an asterisk before the last option, though.)

srittau avatar Nov 21 '18 08:11 srittau

The overloads proposed don't actually capture the fact that policies themselves can have arbitrary factories which would require literal types. But we are choosing between imperfect options here. Just returning EmailMessage across the board will also be wrong for many use cases. How can we minimize that for likely real usage, while literal types are still in progress?

I took a quick pass at seeing if the override approach was feasible at all with a semi-contrived simplification. My intuition is that, in practice, users of these APIs will likely specify the _class or policy but not both.

from typing import Callable, TypeVar, Union, Type, overload, Any, Optional

class Message:
    def __init__(self, s: str) -> None :
        self.s = 'm' + s

class EmailMessage(Message):
    def __init__(self, s: str) -> None :
        self.s = 'em' + s

class Policy:
    factory: Type = Message

class EmailPolicy(Policy):
    factory: Type = EmailMessage

compat = Policy()
T = TypeVar('T')

@overload
def message_from_string(string: str, factory: Optional[Callable[..., T]], *args: Any) -> T: ...

@overload
def message_from_string(string: str, *args: Any) -> Message: ...

@overload
def message_from_string(string: str, *args: Any, policy: EmailPolicy) -> EmailMessage: ...

def message_from_string(string: Any, factory: Optional[Callable[..., T]] = None, *args: Any, policy: Policy = compat) -> Union[EmailMessage, Message, T]:
    f = factory or policy.factory
    return f(string)


def build_msg(string: str) -> Message:
    return Message(string)

def build_msg2(string: str) -> EmailMessage:
    return EmailMessage(string)

def build_msg3(string: str) -> bytes:
    return string.encode()


eml = message_from_string('abc')                         # Revealed type is 'test_types.Message'
eml2 = message_from_string('abc', build_msg)             # Revealed type is 'test_types.Message*'
eml3 = message_from_string('abc', build_msg2)            # Revealed type is 'test_types.EmailMessage*'
eml4 = message_from_string('abc', Message)               # Revealed type is 'test_types.Message*'
eml5 = message_from_string('abc', policy=EmailPolicy())  # Revealed type is 'test_types.EmailMessage'
eml6 = message_from_string('abc', build_msg3)            # Revealed type is 'builtins.bytes*'
eml7 = message_from_string('abc', build_msg3, 'other')   # Revealed type is 'builtins.bytes*'

eml8 = message_from_string('abc', policy=Policy())
# error: No overload variant of "message_from_string" matches argument types "str", "Policy"
# Revealed type is 'Any'

These results seem sane to me, though I admit I don't understand what the * suffix means. If we hold to the assumption that an explicitly set policy will be an EmailPolicy that produces EmailMessage, then all the other cases work. This fails when that assumptions fails, but seems to be to have small false-positive surface area. And to that question: what is the general policy around this? Err on the side of underspecification over incorrectness? Have a good balance of actual use? I'm happy to help drive this toward whatever standard is in use.

jrideout avatar Nov 21 '18 23:11 jrideout

The * suffix in mypy means it's an inferred type, which doesn't really matter unless you're debugging mypy internals.

JelleZijlstra avatar Nov 23 '18 06:11 JelleZijlstra

This problem could be solved by making Policy a generic type.

MessageT = TypeVar("MessageT")
class Policy(Generic[MessageT]): ...

class Message: ...
class Compat32(Policy[Message]): ...
compat32 = Compat32()

class EmailMessage: ...
class Default(Policy[EmailMessage]): ...
default = Default()

def message_from_string(
    string: str,
    factory: Optional[Callable[[], MessageT]] = None,
    *,
    policy: Policy[MessageT] = compat32,
) -> MessageT:
    ...
...

Unfortunately, this is another case where mypy’s insufficient support for generic defaults (python/mypy#3737) gets in the way. But we can work around that bug using overloads.

@overload
def message_from_string(
    string: str,
    factory: Optional[Callable[[], Message]] = None,
) -> Message:
    ...
@overload
def message_from_string(
    string: str,
    factory: Optional[Callable[[], MessageT]] = None,
    *,
    policy: Policy[MessageT],
) -> MessageT:
    ...
def message_from_string(
    string: str,
    factory: Optional[Callable[[], Any]] = None,
    *,
    policy: Any = compat32,
) -> Any:
    ...

andersk avatar Jun 05 '20 20:06 andersk

Is there anything blocking this issue? Could I try to fix this? If yes, what approach should I take? Is making an overload that returns EmailMessage when policy: email.policy.EmailPolicy (the class of default) enough?

GideonBear avatar Mar 03 '23 15:03 GideonBear