Proposal to make oneof-deconstruction typecheckable
According to the docs, deconstruction of a oneof-type is meant to be done in the following way:
syntax = "proto3";
message Test {
oneof foo {
bool on = 1;
int32 count = 2;
string name = 3;
}
}
foo_type, foo_value = betterproto.which_one_of(test, 'foo')
if foo_type == 'on':
process_on(foo_value)
elif foo_type == 'count':
process_count(foo_value)
elif foo_type == 'name':
process_name(foo_value)
else:
raise ValueError('No value for foo specified!')
Unfortunately, in that way development tooling (pylance, mypy, etc) can neither check for typos nor for completeness. (Imagine to introduce a new field string address = 4; or to remove on of the other fields later. A type checker could not see any problem with our python code after executing protoc.)
There are different possible ways to solve that issue.
Variant 1 - Make all sub-fields of foo optional
You could change the implementation of the Test dataclass such that all sub fields of foo are optional, and the unset fields are set to None. Then our deconstruction could look like this:
if test.on is not None:
process_on(test.on)
elif test.count is not None:
process_count(test.count)
elif test.name is not None:
process_name(test.name)
else:
raise ValueError('No value for foo specified!')
Variant 2 - Introduce an enum specifying the field set
You could add an enum field to the Test dataclass which encodes the actual field set. Then our deconstruction could look like this:
if test.foo_type == FooType.ON:
process_on(test.on)
elif test.foo_type == FooType.COUNT:
process_count(test.count)
elif test.foo_type == FooType.NAME:
process_name(test.name)
else:
raise ValueError('No value for foo specified!')
There are certainly many other ways to make typechecking possible. I really hope you agree with me that such a feature would increase development quality (especially since the rest of your auto generated code is beautifully typecheckable - I like that a lot!).
Thank you in advance!
I really like the idea and I was actually thinking about this a while back. I'm not a huge fan of making all the fields optional as it could lead to confusion as to why this is different from optional fields and it makes unwrapping fields a pain. Adding a new field and generating members for the enums would be a big compiler change so I'd like to avoid that if possible.
My ideal solution would be to implement oneofs as a context manager, maybe we could try something like.
@dataclass(eq=False,repr=False)
class Test(betterproto.Message):
with betterproto.oneof() as foo:
on: bool = betterproto.bool_field(1)
count: int = betterproto.int32_field(2)
name: str = betterproto.string_field(3)
match test.foo:
case OneOf(Test.on, value):
process_on(value)
case OneOf(Test.count, value):
process_count(value)
case OneOf(Test.name, value):
process_name(value)
case _:
raise ValueError
(This is very Prost inspired) having this type check I think should be possible although I haven't looked into how match works that well as of yet.
Would an union encoding (a.k.a Algebraic Data Type, a.k.a Choice type) be suitable for betterproto?
Foo = bool | int32 | string
match test.foo:
case bool():
process_on(v)
case int32():
process_count(v)
case str():
process_name(v)
case _:
assert_never(v)
They have a consistent mathematical background and in Python 3.11 (https://docs.python.org/3.11/library/typing.html#typing.assert_never) are getting full exhaustiveness-check support, and allow a backwards compatible workaround for older Python versions (https://hakibenita.com/python-mypy-exhaustive-checking)
I think it would be great if we could use (nested) pattern matching instead of which_one_of.
For example, if we have the following messages:
message X {
int64 n = 1;
}
message Y {
string s = 1;
}
message A {
oneof data {
X x = 1;
Y y = 2;
}
}
It would be nice if we could just write something like the following in Python:
match a:
case A(data=X(n=n)):
...
case A(data=Y(s=s)):
...
...
What about using a sentinel, like betterproto.MISSING?
syntax = "proto3";
message MyMessage {
oneof foo {
bool first = 1;
int32 second = 2;
string third = 3;
}
}
# Generated by the protocol buffer compiler. DO NOT EDIT!
# sources: mymessage.proto
# plugin: python-betterproto
from dataclasses import dataclass
from typing import Union
import betterproto
@dataclass(eq=False, repr=False)
class MyMessage(betterproto.Message):
first: Union[bool, betterproto.MISSING] = betterproto.bool_field(1, group="foo")
second: Union[int, betterproto.MISSING] = betterproto.int32_field(2, group="foo")
third: Union[str, betterproto.MISSING] = betterproto.string_field(3, group="foo")
In this example, only one of first, second or third will not be betterproto.MISSING, which would be the only one provided. The rest would be betterproto.MISSING.
To use it, one would simply do something like this:
from betterproto import MISSING
from lib import MyMessage
def process(msg: MyMessage) -> None:
reveal_type(msg.first) # bool | MISSING
reveal_type(msg.second) # int | MISSING
reveal_type(msg.third) # str | MISSING
if msg.first is not MISSING:
# Process `first`
reveal_type(msg.first) # bool
elif msg.second is not MISSING:
# Process `second`
reveal_type(msg.second) # int
elif msg.third is not MISSING:
# Process `third`
reveal_type(msg.third) # str
The reveal_type calls are just to show that type-checkers will understand it correctly.
It's possible to define MISSING, as well as to simplify the notation Union[..., betterproto.MISSING] like so:
# Sentinels are a bit of a mess
# https://www.python.org/dev/peps/pep-0484/#support-for-singleton-types-in-unions
# https://www.python.org/dev/peps/pep-0661
from enum import Enum
from typing import Final, TypeVar, Union
class Missing(Enum):
# https://www.python.org/dev/peps/pep-0484/#support-for-singleton-types-in-unions
MISSING = None
def __bool__(self) -> bool:
# https://docs.python.org/3/library/enum.html#boolean-value-of-enum-classes-and-members
return bool(self.value)
def __repr__(self) -> str:
# https://docs.python.org/3/library/enum.html#omitting-values
return f"{self.__class__.__name__}.{self.name}"
MISSING: Final = Missing.MISSING
T = TypeVar("T")
Maybe = Union[T, Missing]
Maybe[foo] can then be used instead of Union[foo, MISSING], like so:
# Generated by the protocol buffer compiler. DO NOT EDIT!
# sources: mymessage.proto
# plugin: python-betterproto
from dataclasses import dataclass
import betterproto
@dataclass(eq=False, repr=False)
class MyMessage(betterproto.Message):
first: betterproto.Maybe[bool] = betterproto.bool_field(1, group="foo")
second: betterproto.Maybe[int] = betterproto.int32_field(2, group="foo")
third: betterproto.Maybe[str] = betterproto.string_field(3, group="foo")
I think I found a very simple way to implement this that also provides some additional benefits and does not rely on sentinels. Here is a PR: #510