python-betterproto icon indicating copy to clipboard operation
python-betterproto copied to clipboard

to_dict() method converts integer field values to strings

Open ahobsonsayers opened this issue 5 years ago • 10 comments
trafficstars

When using the to_dict method of a message, integer fields are converted to strings. This can be see on this line in the source code.

Why are integers converted to strings, while all other types are left as is? Is this a bug that i can open a PR for, or it deliberate? Any help is much appreciated as this seems like a bit odd behaviour

ahobsonsayers avatar May 07 '20 12:05 ahobsonsayers

That seems odd, any insight? @danielgtaylor

nat-n avatar May 21 '20 16:05 nat-n

For 64 bits int, spec requires it to be a string.

There was a discussion on protobuf repo.

But I do not quite understand the list handling of int64 types. @danielgtaylor

jameslan avatar Jun 12 '20 04:06 jameslan

But I do not quite understand the list handling of int64 types

As far as I can see, the list implementation is equivalent to the single int64 type. It just converts all ints in the list to a string. Or is something else unclear about it?

boukeversteegh avatar Jun 14 '20 10:06 boukeversteegh

Oh that's for repeated. I forgot about it. Then there's no question about the code.

jameslan avatar Jun 14 '20 17:06 jameslan

One thing I know we found confusing when looking into betterproto is that to_dict converts everything to the JSON representation, but that isn't really clearly documented. It makes sense with how to_json just json.dumps the output of to_dict, since without this conversion you'd have no way to ensure it matches the spec, but it's confusing when you're expected to_dict to return pythonic values like datetime or in this case an int and leads this behavior being interpreted as a bug

Maybe adding a separate to_python that would fairly simply convert to pythonic types, and then possibly rename to_dict to to_json_dict or something of the like would be a good feature request to clear up this confusion?

samschlegel avatar Jun 15 '20 23:06 samschlegel

One thing I know we found confusing when looking into betterproto is that to_dict converts everything to the JSON representation, but that isn't really clearly documented. It makes sense with how to_json just json.dumps the output of to_dict, since without this conversion you'd have no way to ensure it matches the spec, but it's confusing when you're expected to_dict to return pythonic values like datetime or in this case an int and leads this behavior being interpreted as a bug

That makes a lot of sense!

I'm not entirely sure what the use-case is of to_dict, other than as a stepping stone for json output, but perhaps people use it for their own serialization methods? any insight into this @ahobsonsayers?

depending on how users use to_dict, it could either be useful to convert everything to low-level values, or actually keep it pythonic.

  • to_dict is a bit unclear with respect to its contract

  • to_json_dict is an isolated case for which we can define a clear contract (serializes object to dictionary directly convertible to a JSON string). It would be a breaking change but I'd suggest to introduce this field, and then see how we re-interpret to_dict

boukeversteegh avatar Jun 16 '20 08:06 boukeversteegh

@boukeversteegh One use case is to convert protobuf messages into domain objects with an object mapper. Most object mappers take dictionaries as arguments.

jhowarth avatar Jun 16 '20 22:06 jhowarth

Any updates for this? Currently on 2.0.0b2.

My particular use case imports YAML, validates each property based on the dataclass field types, does some processing based on the field name/type, and passes the output of to_dict(casing=...) to a third-party module. Right now I have to perform a post-processing step on the output of to_dict() by traversing the tree and converting specific values but I'd rather not have to maintain that code.

v1b1 avatar Dec 29 '20 17:12 v1b1

Would also +1 this issue. There's a related issue on how integers should be parsed as keys for map types in from_dict() - currently all treated as strings #235

jonmather avatar May 26 '21 21:05 jonmather

I know it was mentioned that the spec calls this out, but that either isn't true or was changed. The spec clear says that protobuf int64 values should be int/long in Python.

[4] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

mbrancato avatar Feb 24 '24 22:02 mbrancato