JSON Protocol does not properly support UNION types
Description
The JSON protocol serializer (TJSONProtocolFactory) does not properly handle Thrift union types during serialization. When attempting to serialize structs containing union fields, the serializer either fails with errors or produces incorrect output.
Steps to Reproduce
- Define a Thrift union and struct that uses it:
enum Color {
    RED = 1,
    BLUE = 2,
    GREEN = 3
}

union Choice {
    1: i32 number;
    2: string text;
    3: Color color;
}

struct Container {
    1: required list<Choice> choices;
}
- Deserialize binary data and attempt to serialize to JSON:
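import thriftpy2
from thriftpy2.utils import deserialize, serialize
from thriftpy2.protocol import TBinaryProtocolFactory, TJSONProtocolFactory
# Assumes the IDL above is saved as test.thrift; `data` is a bytes payload
# previously encoded with TBinaryProtocol (not shown)
Container = thriftpy2.load("test.thrift", module_name="test_thrift").Container
deserialized = deserialize(Container(), data, proto_factory=TBinaryProtocolFactory())
serialized = serialize(deserialized, proto_factory=TJSONProtocolFactory())  # Fails or produces incorrect output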
Expected Behavior
Unions should be serialized to JSON objects containing only the single field that is set. For example:
{
    "choices": [
        {"number": 42},
        {"text": "hello"},
        {"color": 1}
    ]
}
This matches the behavior of other Thrift implementations, where a union in JSON is represented as an object with exactly one key-value pair.
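The expected shape can be checked mechanically. Below is a minimal sketch using only the standard library; `check_union_json` is a hypothetical helper (not part of thriftpy2) that assumes each union is encoded as a JSON object with exactly one key, as described above.

```python
import json


def check_union_json(obj):
    """Return the (field_name, value) pair of a JSON-encoded union.

    Hypothetical helper: raises ValueError unless obj is a dict with exactly one key.
    """
    if not isinstance(obj, dict) or len(obj) != 1:
        raise ValueError("a union must be a JSON object with exactly one key, got %r" % (obj,))
    (name, value), = obj.items()
    return name, value


doc = json.loads('{"choices": [{"number": 42}, {"text": "hello"}, {"color": 1}]}')
pairs = [check_union_json(choice) for choice in doc["choices"]]
print(pairs)  # [('number', 42), ('text', 'hello'), ('color', 1)]
```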
Actual Behavior
The struct_to_json function in thriftpy2/protocol/json.py treats unions the same as regular structs, attempting to serialize all fields rather than just the one that's set. This can result in:
- Serialization errors when trying to process union fields
- Incorrect JSON output with multiple fields when only one should be present
- TypeError: cannot unpack non-iterable NoneType object, raised in nested structures
Root Cause
The struct_to_json function doesn't distinguish between regular structs and unions. Unions have special semantics where exactly one field is set at a time, but the current implementation doesn't account for this.
Unions can be identified by checking for the __EMPTY__ attribute on the class, but the JSON serializer doesn't currently check for this.
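To make the "exactly one field is set" rule concrete, here is a minimal sketch that locates the set field of a union-like object. `FakeChoice` and `set_union_field` are hypothetical stand-ins, not thriftpy2 code; the `thrift_spec` mimics a thriftpy2-style layout (field id mapped to a tuple whose first two items are the Thrift type code and the field name, here 8 = i32 and 11 = string), and the sketch does not rely on the `__EMPTY__` attribute.

```python
class FakeChoice:
    # Hypothetical stand-in for a generated union; the enum field is encoded as i32 (8)
    thrift_spec = {
        1: (8, "number", False),
        2: (11, "text", False),
        3: (8, "color", False),
    }

    def __init__(self, number=None, text=None, color=None):
        self.number = number
        self.text = text
        self.color = color


def set_union_field(obj):
    """Return (name, value) for the single non-None field of a union-like object."""
    found = [
        (spec[1], getattr(obj, spec[1]))
        for _, spec in sorted(obj.thrift_spec.items())
        if getattr(obj, spec[1], None) is not None
    ]
    if len(found) != 1:
        raise ValueError("expected exactly one set field, found %d" % len(found))
    return found[0]


print(set_union_field(FakeChoice(text="hello")))  # ('text', 'hello')
```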
Proposed Fix
Modify struct_to_json in thriftpy2/protocol/json.py to detect and handle unions:
def struct_to_json(obj):
    if obj is None:
        return None
    # Check if this is a union (unions have __EMPTY__ attribute)
    is_union = hasattr(obj.__class__, '__EMPTY__')
    outobj = {}
    if hasattr(obj, 'thrift_spec') and obj.thrift_spec:
        for field_id, field_spec in obj.thrift_spec.items():
            if field_spec is None:
                continue
            field_type = field_spec[0]
            field_name = field_spec[1]
            # Base-type specs are (type, name, required); struct/container specs are
            # (type, name, type_spec, required), so index 2 is a type spec only when len > 3
            field_type_spec = field_spec[2] if len(field_spec) > 3 else None
            v = getattr(obj, field_name, None)
            if is_union:
                # For unions, only serialize the one field that's set
                if v is not None:
                    outobj[field_name] = json_value(field_type, v, field_type_spec)
                    break  # Stop after finding the set field
            else:
                # For regular structs, serialize all non-None fields
                if v is not None:
                    outobj[field_name] = json_value(field_type, v, field_type_spec)
    return outobj
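The proposed logic can be exercised without thriftpy2 installed. The sketch below is a simplified, self-contained model, not thriftpy2's actual code: its `json_value` only handles base values, structs and lists; the `_UNION` class attribute is a hypothetical marker standing in for the `__EMPTY__` check proposed above; and the spec tuples are assumed to follow a thriftpy2-style layout, `(type, name, required)` for base types and `(type, name, type_spec, required)` for structs and containers.

```python
import json

I32, STRING, STRUCT, LIST = 8, 11, 12, 15  # Thrift type codes


def json_value(ttype, val, spec=None):
    # Simplified: recurse into structs and lists, pass base values through unchanged
    if ttype == STRUCT:
        return struct_to_json(val)
    if ttype == LIST:
        # A list spec is either a bare element type or an (element_type, element_spec) pair
        elem_type, elem_spec = spec if isinstance(spec, tuple) else (spec, None)
        return [json_value(elem_type, v, elem_spec) for v in val]
    return val


def struct_to_json(obj):
    is_union = getattr(type(obj), "_UNION", False)  # hypothetical union marker
    out = {}
    for _, spec in sorted(obj.thrift_spec.items()):
        ttype, name = spec[0], spec[1]
        # Index 2 holds a type spec only in four-item tuples
        type_spec = spec[2] if len(spec) > 3 else None
        v = getattr(obj, name, None)
        if v is None:
            continue
        out[name] = json_value(ttype, v, type_spec)
        if is_union:
            break  # a union carries exactly one set field
    return out


class Choice:
    _UNION = True
    thrift_spec = {1: (I32, "number", False), 2: (STRING, "text", False), 3: (I32, "color", False)}

    def __init__(self, number=None, text=None, color=None):
        self.number, self.text, self.color = number, text, color


class Container:
    thrift_spec = {1: (LIST, "choices", (STRUCT, Choice), True)}

    def __init__(self, choices=None):
        self.choices = choices


c = Container(choices=[Choice(number=42), Choice(text="hello"), Choice(color=1)])
print(json.dumps(struct_to_json(c)))
# {"choices": [{"number": 42}, {"text": "hello"}, {"color": 1}]}
```

Because `None` values are skipped, a well-formed union already yields a single key; the `break` only changes the result when more than one field is (incorrectly) set, in which case this sketch keeps the lowest field id.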
Environment
- thriftpy2 version: 0.5.4
- Python version: 3.9
As with #332, I still can't reproduce the error. This is the code I used:
import io
import thriftpy2
from thriftpy2.utils import serialize, deserialize
from thriftpy2.protocol import TJSONProtocolFactory
s = """
enum Color {
RED = 1,
BLUE = 2,
GREEN = 3
}
union Choice {
1: i32 number;
2: string text;
3: Color color;
}
struct Container {
1: required list<Choice> choices;
}
"""
test_thrift = thriftpy2.load_fp(io.StringIO(s), 'test_thrift')
Container = test_thrift.Container
Choice = test_thrift.Choice
Color = test_thrift.Color
origin = Container(choices=[Choice(number=42), Choice(text="hello"), Choice(color=Color.RED)])
serialized = serialize(origin, proto_factory=TJSONProtocolFactory())
print(serialized)
deserialized = Container()
print(deserialize(deserialized, serialized, proto_factory=TJSONProtocolFactory()))
So please make sure that:
1. You're not using a fork of thriftpy2 that has modified the JSON protocol;
2. You're not deserializing JSON data produced by another library. If you are, use thriftpy2.protocol.apache_json instead.