protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

Expose container types in public API

Open stinovlas opened this issue 2 years ago • 1 comments

What language does this apply to? Python

Describe the problem you are trying to solve. We do have a library that automatically decodes some common protobuf types to native Python structures. E.g. ScalarMap is decoded to dict. The problem is that with arrival of protobuf==4.*, protobuf no longer creates instance of google.protobuf.internal.containers.ScalarMap, but instance of google._upb._message.ScalarMapContainer instead. I don't particularly like importing this class from something that looks like a private module that may break at any time. Also, I don't really understand why protobuf has two classes that are very similar, but one is not inherited from the other. It's confusing.

Describe the solution you'd like I'd like you to publish MessageMap, RepeatedCompositeFieldContainer, RepeatedScalarFieldContainer, ScalarMap as a public API that will only break in major versions. Of course, protobuf should create instances of these public classes so we can test them with isinstance call.

Describe alternatives you've considered I can conditionally import from private module. This may be quite fragile.

If you do have any other suggestion how I might recognize ScalarMap (and other types mentioned above) instead of testing them with isinstance and without the need to import from private module, I'd appreciate that.

stinovlas avatar Dec 05 '22 12:12 stinovlas

All of the internal implementation classes inherit from MutableMapping, for example: https://github.com/protocolbuffers/protobuf/blob/9e0e7d5f59ecd2501ae638eeb1b65ccf361f008a/python/google/protobuf/internal/containers.py#L366

The public API for testing isinstance() would be isinstance(container, MutableMapping).

Will that work for you?

haberman avatar Dec 08 '22 22:12 haberman

Hi @haberman, thanks for the reply. I didn't get to this in a while. I'm afraid this works in Python protobuf 3.*, but not in 4.*. Example:

# proto/example.proto
message DictMessage
{
    map<string, string> values = 1;
}
message ListMessage
{
    repeated string values = 1;
}
# test.py
from proto.example_pb2 import DictMessage, ListMessage

d = DictMessage()
print(type(d.values).__mro__)
# outputs: (<class 'google._upb._message.ScalarMapContainer'>, <class 'collections.abc.MutableMapping'>, <class 'collections.abc.Mapping'>, <class 'collections.abc.Collection'>, <class 'collections.abc.Sized'>, <class 'collections.abc.Iterable'>, <class 'collections.abc.Container'>, <class 'object'>)

l = ListMessage()
print(type(l.values).__mro__)
# outputs: (<class 'google._upb._message.RepeatedScalarContainer'>, <class 'object'>)

So, the solution works for ScalarMapContainer which is instance of MutableMapping, but not for RepeatedScalarContainer, which is not instance of MutableSequence.

stinovlas avatar Jan 26 '23 16:01 stinovlas

We have unit tests that verify that our sequences are instances of MutableSequence: https://github.com/protocolbuffers/protobuf/blob/b9eb6bf759cf6edf449291215bd4ee2c572845d2/python/google/protobuf/internal/message_test.py#L706-L710

I tried your example and got similar results, however when I tried printing isinstance(l.values, MutableSequence) it returned true.

So perhaps this is not reflected in __mro__ for some reason, but isinstance() is behaving as expected.

haberman avatar Jan 26 '23 17:01 haberman

Ahh, that's interesting. isinstance(l.values, MutableSequence) returns True although MutableSequence is not listed in mro. Now I'm confused. Why does MutableSequence not show up in mro?

I actually do have gRPC decoder which needs to recognize these types and uses mro to discover decoding function for type that is closest to the type of object submitted to decoder. isinstance won't tell me anything about the distance. I guess I could make some kind of fallback for this, but it would be kind of ugly.

stinovlas avatar Jan 27 '23 07:01 stinovlas

It looks like we call MutableSequence.register(RepeatedScalarContainer) instead of deriving from MutableSequence: https://github.com/protocolbuffers/upb/blob/main/python/repeated.c#L807-L810

This is documented here: https://docs.python.org/3/library/collections.abc.html#module-collections.abc

haberman avatar Jan 27 '23 19:01 haberman