protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

OneOf support

Open acetylen opened this issue 5 years ago • 13 comments

Any plans for this? Syntactically, I think something like this makes sense:

from dataclasses import dataclass
from typing import Union

from pure_protobuf.dataclasses_ import field, message, oneof
from pure_protobuf.types import double, uint32, uint64


@message
@dataclass
class MessageWithOneofs:
    message_data: Union[uint64, double] = oneof()
    message_id: uint32 = field(1)
    message_desc: str = field(2)
    counter: uint64 = field(3, oneof="message_data")
    value: uint64 = field(4, oneof="message_data")

...However, I'm assuming that breaks something in how you parse fields.

acetylen avatar Feb 06 '20 15:02 acetylen

Could you elaborate on your use case? Because oneof doesn't exist on the wire, it's more like a sugar which ensures the mutual exclusiveness. So I want to know more how it's gonna be used to design a syntax

eigenein avatar Feb 06 '20 22:02 eigenein

I'm autogenerating dataclasses from a big spreadsheet. We have fifty or so message types with about a hundred fields each that change regularly while we're developing the product so it's not feasible to write them all manually. Some of our types consist of general metadata preceding a oneof embedded message type, like so:

message Command {
    uint32 parameters = 1;
    ...
}

message Config {
    uint64 timeout = 1;
    ....
}

message DeviceWrapper {
    uint64 message_id = 1;
    bytes public_key = 2;
    oneof Message {
        Command cmd = 3;
        Config cfg = 4;
        ...
   }
}

Then, these messages are encoded and wrapped in another wrapper that goes to our router which checks the recipient and sends the message on without decoding it:

message RouterWrapper {
    int64 timestamp = 1;
    uint64 device_id = 2;
    google.protobuf.Any device_wrapper = 3;
}

Marking fields as oneof and maybe erroring if more than one oneof is set would help greatly in dynamically building dataclasses, and being able to do which_one_of when parsing an incoming message would allow for getting the correct parser for the embedded message.

acetylen avatar Feb 12 '20 09:02 acetylen

Okay, thanks for the clarification!

I have something like this in my mind at the moment:

counter: Optional[uint64] = optional_field(3)
value: Optional[double] = optional_field(4)
message_data: Union[None, uint64, double] = one_of(counter, value)

With a proper implementation of one_of descriptor it shouldn't be that hard to implement it. Think, I'll take it in work

eigenein avatar Feb 13 '20 19:02 eigenein

Thanks for taking it on!

acetylen avatar Feb 14 '20 07:02 acetylen

Okay, thanks for the clarification!

I have something like this in my mind at the moment:

counter: Optional[uint64] = optional_field(3)
value: Optional[double] = optional_field(4)
message_data: Union[None, uint64, double] = one_of(counter, value)

With a proper implementation of one_of descriptor it shouldn't be that hard to implement it. Think, I'll take it in work

Any news about the implementation? It is an extremely useful feature :)

mrMakaronka avatar Jan 22 '22 11:01 mrMakaronka

Sorry guys, I had rough time and didn't really pay an attention to the project, so there's still no any implementation for oneof.

I've become somewhat more active on it recently, but can't commit myself to any ETA at the moment. Pinned it just that I won't forget it.

eigenein avatar Jan 22 '22 23:01 eigenein

Hi @eigenein!

As it was previously mentioned, support of oneof could be highly useful. Don't you mind if I implement it?

Have something like that in my mind:

@message
@dataclass
class Test:
    msg: OneOf[int32, str, bool] = one_of(
        a = optional_field(1),
        b = optional_field(2),
        c = optional_field(3)
    )

// corresponds to 
// message Test {
//    oneof msg {
//        int32 a = 1;
//        string b = 2;
//        bool c = 3;
//    }
//}



t = Test()
t.msg.a = 10
t.msg.b = "20" # overrides field a

# code below will print "20"
if t.msg.which_one_of() == 'a':
    print(t.msg.a)
elif t.msg.which_one_of() == 'b':
    print(t.msg.b)
elif t.msg.which_one_of() == 'c':
    print(t.msg.c)

Seems that it could be a bit cleaner than the implementation proposed earlier.

And a bit easier to implement checks that only one field is assigned at a time, this could even allow to write logic which forbids field overwriting if one is set already.

The only drawback here is that types are declared not close to field declaration and described in OneOf type, but probably it's ok because we can check that they match each other in @message decorator.

What do you think?

ottergottaott avatar Feb 19 '22 19:02 ottergottaott

Hi @aleksZubakov,

Sure, any help is welcome! I've got a couple of concerns here:

  1. The drawback you already mentioned. Could you elaborate on how we could check that? I'm curious how'd we prevent an incorrect ordering inside the OneOf[...]. Say, I mistakenly swap int32 and str in the OneOf[...]` above.
  2. which_one_of() returns a string, which would make it quite difficult to rename fields via refactoring tools or to catch mistakes via linters. I'm not against which_one_of(self) → str per sé, but I'd really like to provide at least one «fail-safe» way and make using of less reliable ways a user's responsibility.

By the way, I see that 2 years ago I had this idea (adjusted to your example) which wouldn't raise these two, but was there maybe a reason/concern/worry not to go with it?

@message
@dataclass
class Test:
    a: int32 = optional_field(1)
    b: str = optional_field(2)
    c: bool = optional_field(3)
    msg: Union[int32, str, bool] = one_of(a, b, c)

// corresponds to 
// message Test {
//    oneof msg {
//        int32 a = 1;
//        string b = 2;
//        bool c = 3;
//    }
// }

t = Test()
t.a = 10
t.b = "20"  # overrides field a

# code below will print "20"
if t.a == 'a':
    print(t.a)
elif t.b == 'b':
    print(t.b)
elif t.c == 'c':
    print(t.c)

eigenein avatar Feb 19 '22 20:02 eigenein

Thanks for such a quick response!

In example you likely meant:

# code below will print "20"
if t.a is not None:
    print(t.a)
elif t.b is not None:
    print(t.b)
elif t.c is not None:
    print(t.c)

Did I get right?

By the way, I see that 2 years ago I had this idea (adjusted to your example) which wouldn't raise these two, but was there maybe a reason/concern/worry not to go with it?

The main concern here is that we cannot control which field set and cannot guarantee that only one field is set at a time. (Well, actually we are talking about python and as we know everything is possible in python, but I honestly tried to prototype such guarantee and it's not that simple without replacing dataclass.field on some tricky descriptor. dataclass.field is actively used and returned by optional_field ).

Example to clarify what I mean:

t = Test()
t.a = 5
t.b = "20"

# both are set
assert (t.a is not None) and (t.b is not None)

We could check it at the latest moment right before serialization, but I think it's way more useful when user understands the place where they are mistaken or even is protected from such mistakes.

The drawback you already mentioned. Could you elaborate on how we could check that? I'm curious how'd we prevent an incorrect ordering inside the OneOf[...]. Say, I mistakenly swap int32 and str in the OneOf[...]` above.

Yeah I do agree on this point, we cannot prevent it.

But probably we should move type into field declaration.

Not sure if it's good, just throwing some ideas

@message
@dataclass
class Test:
    msg: OneOf[int32, str, bool] = one_of(
        a = one_of_field(int32, 1),
        b = one_of_field(str, 2),
        c = one_of_field(bool, 3)
    )

which_one_of() returns a string, which would make it quite difficult to rename fields via refactoring tools or to catch mistakes via linters. I'm not against which_one_of(self) → str per sé, but I'd really like to provide at least one «fail-safe» way and make using of less reliable ways a user's responsibility.

Yeah, I just tried to think of ways to check which particular field is set/deserialized.

As already mentioned the other way could be:

if t.msg.a is not None:
    print(t.a)
...

Does it seem safer?

ottergottaott avatar Feb 20 '22 19:02 ottergottaott

In example you likely meant: … Did I get right?

Oh yeah, overlooked while copy-pasting 😄

The main concern here is that we cannot control which field set and cannot guarantee that only one field is set at a time. (Well, actually we are talking about python and as we know everything is possible in python, but I honestly tried to prototype such guarantee and it's not that simple without replacing dataclass.field on some tricky descriptor. dataclass.field is actively used and returned by optional_field ).

We could check it at the latest moment right before serialization, but I think it's way more useful when user understands the place where they are mistaken or even is protected from such mistakes.

These are both valid points. 🤔

if t.msg.a is not None:
    print(t.a)  # guess you meant `t.msg.a`?

Does it seem safer?

Yep, that looks safer for sure. It's only an implementation of the «failing early» that makes it way trickier. At the serialization level we'd just throw an exception, and at the deserialization – clear the other fields and possibly assign a which_one_of.

Do you already have an idea how the one_of(a, b, c) from your example would be implemented?

eigenein avatar Feb 23 '22 16:02 eigenein

Yea, the example should've been:

if t.msg.a is not None:
    print(t.msg.a) 

Do you already have an idea how the one_of(a, b, c) from your example would be implemented?

After some thought I'm sure it should be an object with custom __getattr__ and __setattr__ -- this would allow to implement "one fieid set at a time" semantics.

I actually even wrote prototype of it, and it seems to work fine:

class OneOf_:
    """
    Defines an oneof field.
    See also: https://developers.google.com/protocol-buffers/docs/proto3#oneof
    """
    def __init__(self, **fields: Field):
        # ugly sets to get round custom setattr
        super().__setattr__('fields', fields)
        super().__setattr__('set_value', None)


    def __getattr__(self, name):
        if name not in self.fields:
            raise AttributeError(f"Field {name} is not found")


        if self.set_value is not None:
            field_name, value = self.set_value
            if field_name == name:
                return value

        return None

    def __setattr__(self, name, value):
        if name not in self.fields:
            raise AttributeError(f"Field {name} is not found")

        super().__setattr__('set_value', (name, value))

And it seems (at least for me and for now) that it's not difficult to implement special OneOfField which inherits Field, because we already have all needed information in fields passed through constructor and know exactly which field is assigned. As a consequence we have all needed Serializers.

UPD: I was wrong when said that we have Serializers if we stick to design suggested by me, but anyway can create them in the same way as all others.

It's only an implementation of the «failing early» that makes it way trickier.

Yes, I do fully agree. But this particular feature looks too good to just pass by and not try to implement it :)

ottergottaott avatar Feb 24 '22 22:02 ottergottaott

Hi @aleksZubakov,

Sorry for the delay 😮‍💨

Yeah, I don't have a better idea of implementing this. Anything I tried to sketch – is more or less of the same difficulty. So If you're feeling like it – please go ahead and I'll join the pull request to complete it

eigenein avatar Mar 01 '22 14:03 eigenein

Thanks!

Here are some implementation and some tests, I'm open for discussion and would be happy if you take a look: https://github.com/eigenein/protobuf/pull/94

ottergottaott avatar Mar 03 '22 22:03 ottergottaott

JFYI 2.1.1 includes the initial OneOf implementation from #94.

eigenein avatar Oct 06 '22 14:10 eigenein

Merged to master, to be released in 3.0.0. Although, 2.1.1 already features the implementation from #94.

eigenein avatar Feb 16 '23 15:02 eigenein