protobuf
protobuf copied to clipboard
OneOf support
Any plans for this? Syntactically, I think something like this makes sense:
from dataclasses import dataclass
from typing import Union
from pure_protobuf.dataclasses_ import field, message, oneof
from pure_protobuf.types import double, uint32, uint64
@message
@dataclass
class MessageWithOneofs:
message_data: Union[uint64, double] = oneof()
message_id: uint32 = field(1)
message_desc: str = field(2)
counter: uint64 = field(3, oneof="message_data")
value: uint64 = field(4, oneof="message_data")
...However, I'm assuming that breaks something in how you parse fields.
Could you elaborate on your use case? Because oneof
doesn't exist on the wire, it's more like a sugar which ensures the mutual exclusiveness. So I want to know more how it's gonna be used to design a syntax
I'm autogenerating dataclasses from a big spreadsheet. We have fifty or so message types with about a hundred fields each that change regularly while we're developing the product so it's not feasible to write them all manually. Some of our types consist of general metadata preceding a oneof embedded message type, like so:
message Command {
uint32 parameters = 1;
...
}
message Config {
uint64 timeout = 1;
....
}
message DeviceWrapper {
uint64 message_id = 1;
bytes public_key = 2;
oneof Message {
Command cmd = 3;
Config cfg = 4;
...
}
}
Then, these messages are encoded and wrapped in another wrapper that goes to our router which checks the recipient and sends the message on without decoding it:
message RouterWrapper {
int64 timestamp = 1;
uint64 device_id = 2;
google.protobuf.Any device_wrapper = 3;
}
Marking fields as oneof and maybe erroring if more than one oneof is set would help greatly in dynamically building dataclasses, and being able to do which_one_of
when parsing an incoming message would allow for getting the correct parser for the embedded message.
Okay, thanks for the clarification!
I have something like this in my mind at the moment:
counter: Optional[uint64] = optional_field(3)
value: Optional[double] = optional_field(4)
message_data: Union[None, uint64, double] = one_of(counter, value)
With a proper implementation of one_of
descriptor it shouldn't be that hard to implement it. Think, I'll take it in work
Thanks for taking it on!
Okay, thanks for the clarification!
I have something like this in my mind at the moment:
counter: Optional[uint64] = optional_field(3) value: Optional[double] = optional_field(4) message_data: Union[None, uint64, double] = one_of(counter, value)
With a proper implementation of
one_of
descriptor it shouldn't be that hard to implement it. Think, I'll take it in work
Any news about the implementation? It is an extremely useful feature :)
Sorry guys, I had rough time and didn't really pay an attention to the project, so there's still no any implementation for oneof
.
I've become somewhat more active on it recently, but can't commit myself to any ETA at the moment. Pinned it just that I won't forget it.
Hi @eigenein!
As it was previously mentioned, support of oneof could be highly useful. Don't you mind if I implement it?
Have something like that in my mind:
@message
@dataclass
class Test:
msg: OneOf[int32, str, bool] = one_of(
a = optional_field(1),
b = optional_field(2),
c = optional_field(3)
)
// corresponds to
// message Test {
// oneof msg {
// int32 a = 1;
// string b = 2;
// bool c = 3;
// }
//}
t = Test()
t.msg.a = 10
t.msg.b = "20" # overrides field a
# code below will print "20"
if t.msg.which_one_of() == 'a':
print(t.msg.a)
elif t.msg.which_one_of() == 'b':
print(t.msg.b)
elif t.msg.which_one_of() == 'c':
print(t.msg.c)
Seems that it could be a bit cleaner than the implementation proposed earlier.
And a bit easier to implement checks that only one field is assigned at a time, this could even allow to write logic which forbids field overwriting if one is set already.
The only drawback here is that types are declared not close to field declaration and described in OneOf
type, but probably it's ok because we can check that they match each other in @message
decorator.
What do you think?
Hi @aleksZubakov,
Sure, any help is welcome! I've got a couple of concerns here:
- The drawback you already mentioned. Could you elaborate on how we could check that? I'm curious how'd we prevent an incorrect ordering inside the
OneOf[...]
. Say, I mistakenly swapint32
andstr
in theOneOf
[...]` above. -
which_one_of()
returns a string, which would make it quite difficult to rename fields via refactoring tools or to catch mistakes via linters. I'm not againstwhich_one_of(self) → str
per sé, but I'd really like to provide at least one «fail-safe» way and make using of less reliable ways a user's responsibility.
By the way, I see that 2 years ago I had this idea (adjusted to your example) which wouldn't raise these two, but was there maybe a reason/concern/worry not to go with it?
@message
@dataclass
class Test:
a: int32 = optional_field(1)
b: str = optional_field(2)
c: bool = optional_field(3)
msg: Union[int32, str, bool] = one_of(a, b, c)
// corresponds to
// message Test {
// oneof msg {
// int32 a = 1;
// string b = 2;
// bool c = 3;
// }
// }
t = Test()
t.a = 10
t.b = "20" # overrides field a
# code below will print "20"
if t.a == 'a':
print(t.a)
elif t.b == 'b':
print(t.b)
elif t.c == 'c':
print(t.c)
Thanks for such a quick response!
In example you likely meant:
# code below will print "20"
if t.a is not None:
print(t.a)
elif t.b is not None:
print(t.b)
elif t.c is not None:
print(t.c)
Did I get right?
By the way, I see that 2 years ago I had this idea (adjusted to your example) which wouldn't raise these two, but was there maybe a reason/concern/worry not to go with it?
The main concern here is that we cannot control which field set and cannot guarantee that only one field is set at a time. (Well, actually we are talking about python and as we know everything is possible in python, but I honestly tried to prototype such guarantee and it's not that simple without replacing dataclass.field
on some tricky descriptor. dataclass.field
is actively used and returned by optional_field
).
Example to clarify what I mean:
t = Test()
t.a = 5
t.b = "20"
# both are set
assert (t.a is not None) and (t.b is not None)
We could check it at the latest moment right before serialization, but I think it's way more useful when user understands the place where they are mistaken or even is protected from such mistakes.
The drawback you already mentioned. Could you elaborate on how we could check that? I'm curious how'd we prevent an incorrect ordering inside the OneOf[...]. Say, I mistakenly swap int32 and str in the OneOf[...]` above.
Yeah I do agree on this point, we cannot prevent it.
But probably we should move type into field declaration.
Not sure if it's good, just throwing some ideas
@message
@dataclass
class Test:
msg: OneOf[int32, str, bool] = one_of(
a = one_of_field(int32, 1),
b = one_of_field(str, 2),
c = one_of_field(bool, 3)
)
which_one_of() returns a string, which would make it quite difficult to rename fields via refactoring tools or to catch mistakes via linters. I'm not against which_one_of(self) → str per sé, but I'd really like to provide at least one «fail-safe» way and make using of less reliable ways a user's responsibility.
Yeah, I just tried to think of ways to check which particular field is set/deserialized.
As already mentioned the other way could be:
if t.msg.a is not None:
print(t.a)
...
Does it seem safer?
In example you likely meant: … Did I get right?
Oh yeah, overlooked while copy-pasting 😄
The main concern here is that we cannot control which field set and cannot guarantee that only one field is set at a time. (Well, actually we are talking about python and as we know everything is possible in python, but I honestly tried to prototype such guarantee and it's not that simple without replacing
dataclass.field
on some tricky descriptor.dataclass.field
is actively used and returned byoptional_field
).We could check it at the latest moment right before serialization, but I think it's way more useful when user understands the place where they are mistaken or even is protected from such mistakes.
These are both valid points. 🤔
if t.msg.a is not None: print(t.a) # guess you meant `t.msg.a`?
Does it seem safer?
Yep, that looks safer for sure. It's only an implementation of the «failing early» that makes it way trickier. At the serialization level we'd just throw an exception, and at the deserialization – clear the other fields and possibly assign a which_one_of
.
Do you already have an idea how the one_of(a, b, c)
from your example would be implemented?
Yea, the example should've been:
if t.msg.a is not None:
print(t.msg.a)
Do you already have an idea how the one_of(a, b, c) from your example would be implemented?
After some thought I'm sure it should be an object with custom __getattr__
and __setattr__
-- this would allow to implement "one fieid set at a time" semantics.
I actually even wrote prototype of it, and it seems to work fine:
class OneOf_:
"""
Defines an oneof field.
See also: https://developers.google.com/protocol-buffers/docs/proto3#oneof
"""
def __init__(self, **fields: Field):
# ugly sets to get round custom setattr
super().__setattr__('fields', fields)
super().__setattr__('set_value', None)
def __getattr__(self, name):
if name not in self.fields:
raise AttributeError(f"Field {name} is not found")
if self.set_value is not None:
field_name, value = self.set_value
if field_name == name:
return value
return None
def __setattr__(self, name, value):
if name not in self.fields:
raise AttributeError(f"Field {name} is not found")
super().__setattr__('set_value', (name, value))
And it seems (at least for me and for now) that it's not difficult to implement special OneOfField
which inherits Field
, because we already have all needed information in fields passed through constructor and know exactly which field is assigned. As a consequence we have all needed Serializers.
UPD: I was wrong when said that we have Serializers
if we stick to design suggested by me, but anyway can create them in the same way as all others.
It's only an implementation of the «failing early» that makes it way trickier.
Yes, I do fully agree. But this particular feature looks too good to just pass by and not try to implement it :)
Hi @aleksZubakov,
Sorry for the delay 😮💨
Yeah, I don't have a better idea of implementing this. Anything I tried to sketch – is more or less of the same difficulty. So If you're feeling like it – please go ahead and I'll join the pull request to complete it
Thanks!
Here are some implementation and some tests, I'm open for discussion and would be happy if you take a look: https://github.com/eigenein/protobuf/pull/94
JFYI 2.1.1
includes the initial OneOf
implementation from #94.
Merged to master
, to be released in 3.0.0
. Although, 2.1.1
already features the implementation from #94.