python-betterproto icon indicating copy to clipboard operation
python-betterproto copied to clipboard

Proposal to make oneof-deconstruction typecheckable

Open 124C41p opened this issue 3 years ago • 1 comments

According to the docs, deconstruction of a oneof-type is meant to be done in the following way:

syntax = "proto3";

message Test {
  oneof foo {
    bool on = 1;
    int32 count = 2;
    string name = 3;
  }
}
foo_type, foo_value = betterproto.which_one_of(test, 'foo')

if foo_type == 'on':
  process_on(foo_value)
elif foo_type == 'count':
  process_count(foo_value)
elif foo_type == 'name':
  process_name(foo_value)
else:
  raise ValueError('No value for foo specified!')

Unfortunately, in that way development tooling (pylance, mypy, etc) can neither check for typos nor for completeness. (Imagine to introduce a new field string address = 4; or to remove on of the other fields later. A type checker could not see any problem with our python code after executing protoc.)

There are different possible ways to solve that issue.

Variant 1 - Make all sub-fields of foo optional

You could change the implementation of the Test dataclass such that all sub fields of foo are optional, and the unset fields are set to None. Then our deconstruction could look like this:

if test.on is not None:
  process_on(test.on)
elif test.count is not None:
  process_count(test.count)
elif test.name is not None:
  process_name(test.name)
else:
  raise ValueError('No value for foo specified!')

Variant 2 - Introduce an enum specifying the field set

You could add an enum field to the Test dataclass which encodes the actual field set. Then our deconstruction could look like this:

if test.foo_type == FooType.ON:
  process_on(test.on)
elif test.foo_type == FooType.COUNT:
  process_count(test.count)
elif test.foo_type == FooType.NAME:
  process_name(test.name)
else:
  raise ValueError('No value for foo specified!')

There are certainly many other ways to make typechecking possible. I really hope you agree with me that such a feature would increase development quality (especially since the rest of your auto generated code is beautifully typecheckable - I like that a lot!).

Thank you in advance!

124C41p avatar Mar 23 '22 12:03 124C41p

I really like the idea and I was actually thinking about this a while back. I'm not a huge fan of making all the fields optional as it could lead to confusion as to why this is different from optional fields and it makes unwrapping fields a pain. Adding a new field and generating members for the enums would be a big compiler change so I'd like to avoid that if possible.

My ideal solution would be to implement oneofs as a context manager, maybe we could try something like.

@dataclass(eq=False,repr=False)
class Test(betterproto.Message):
    with betterproto.oneof() as foo:
        on: bool = betterproto.bool_field(1)
        count: int = betterproto.int32_field(2)
        name: str = betterproto.string_field(3)


match test.foo:
    case OneOf(Test.on, value):
        process_on(value)
    case OneOf(Test.count, value):
        process_count(value)
    case OneOf(Test.name, value):
        process_name(value)
    case _:
        raise ValueError

(This is very Prost inspired) having this type check I think should be possible although I haven't looked into how match works that well as of yet.

Gobot1234 avatar Mar 23 '22 19:03 Gobot1234

Would an union encoding (a.k.a Algebraic Data Type, a.k.a Choice type) be suitable for betterproto?

Foo = bool | int32 | string

match test.foo:
    case bool():
        process_on(v)
    case int32():
        process_count(v)
    case str():
        process_name(v)
    case _:
        assert_never(v)

They have a consistent mathematical background and in Python 3.11 (https://docs.python.org/3.11/library/typing.html#typing.assert_never) are getting full exhaustiveness-check support, and allow a backwards compatible workaround for older Python versions (https://hakibenita.com/python-mypy-exhaustive-checking)

heimalne avatar Jan 10 '23 15:01 heimalne

I think it would be great if we could use (nested) pattern matching instead of which_one_of.

For example, if we have the following messages:

message X {
  int64 n = 1;
}

message Y {
  string s = 1;
}

message A {
  oneof data {
    X x = 1;
    Y y = 2;
  }
}

It would be nice if we could just write something like the following in Python:

match a:
    case A(data=X(n=n)):
        ...
    case A(data=Y(s=s)):
        ...
    ...

a-khabarov avatar May 02 '23 10:05 a-khabarov

What about using a sentinel, like betterproto.MISSING?

syntax = "proto3";

message MyMessage {
  oneof foo {
    bool first = 1;
    int32 second = 2;
    string third = 3;
  }
}
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# sources: mymessage.proto
# plugin: python-betterproto
from dataclasses import dataclass
from typing import Union

import betterproto


@dataclass(eq=False, repr=False)
class MyMessage(betterproto.Message):
    first: Union[bool, betterproto.MISSING] = betterproto.bool_field(1, group="foo")
    second: Union[int, betterproto.MISSING] = betterproto.int32_field(2, group="foo")
    third: Union[str, betterproto.MISSING] = betterproto.string_field(3, group="foo")

In this example, only one of first, second or third will not be betterproto.MISSING, which would be the only one provided. The rest would be betterproto.MISSING.

To use it, one would simply do something like this:

from betterproto import MISSING

from lib import MyMessage


def process(msg: MyMessage) -> None:
    reveal_type(msg.first)  # bool | MISSING
    reveal_type(msg.second)  # int | MISSING
    reveal_type(msg.third)  # str | MISSING
    if msg.first is not MISSING:
        # Process `first`
        reveal_type(msg.first)  # bool
    elif msg.second is not MISSING:
        # Process `second`
        reveal_type(msg.second)  # int
    elif msg.third is not MISSING:
        # Process `third`
        reveal_type(msg.third)  # str

The reveal_type calls are just to show that type-checkers will understand it correctly.

It's possible to define MISSING, as well as to simplify the notation Union[..., betterproto.MISSING] like so:

# Sentinels are a bit of a mess
# https://www.python.org/dev/peps/pep-0484/#support-for-singleton-types-in-unions
# https://www.python.org/dev/peps/pep-0661

from enum import Enum
from typing import Final, TypeVar, Union


class Missing(Enum):
    # https://www.python.org/dev/peps/pep-0484/#support-for-singleton-types-in-unions
    MISSING = None

    def __bool__(self) -> bool:
        # https://docs.python.org/3/library/enum.html#boolean-value-of-enum-classes-and-members
        return bool(self.value)

    def __repr__(self) -> str:
        # https://docs.python.org/3/library/enum.html#omitting-values
        return f"{self.__class__.__name__}.{self.name}"


MISSING: Final = Missing.MISSING
T = TypeVar("T")
Maybe = Union[T, Missing]

Maybe[foo] can then be used instead of Union[foo, MISSING], like so:

# Generated by the protocol buffer compiler.  DO NOT EDIT!
# sources: mymessage.proto
# plugin: python-betterproto
from dataclasses import dataclass

import betterproto


@dataclass(eq=False, repr=False)
class MyMessage(betterproto.Message):
    first: betterproto.Maybe[bool] = betterproto.bool_field(1, group="foo")
    second: betterproto.Maybe[int] = betterproto.int32_field(2, group="foo")
    third: betterproto.Maybe[str] = betterproto.string_field(3, group="foo")

MicaelJarniac avatar May 11 '23 11:05 MicaelJarniac

I think I found a very simple way to implement this that also provides some additional benefits and does not rely on sentinels. Here is a PR: #510

a-khabarov avatar Jul 11 '23 13:07 a-khabarov