
JSON Protocol does not properly support UNION types

Open hannahpersampieri opened this issue 1 month ago • 1 comment

Description

The JSON protocol serializer (TJSONProtocolFactory) does not properly handle Thrift union types during serialization. When attempting to serialize structs containing union fields, the serializer either fails with errors or produces incorrect output.

Steps to Reproduce

  1. Define a Thrift union and struct that uses it:
enum Color {
  RED = 1,
  BLUE = 2,
  GREEN = 3
}

union Choice {
  1: i32 number;
  2: string text;
  3: Color color;
}

struct Container {
  1: required list<Choice> choices;
}
  2. Deserialize binary data and attempt to serialize to JSON:
from thriftpy2.utils import deserialize, serialize
from thriftpy2.protocol import TBinaryProtocolFactory, TJSONProtocolFactory

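# `Container` is the class generated from the IDL above; `data` is a
# Binary-protocol payload (its origin is not shown in this report).
deserialized = deserialize(Container(), data, proto_factory=TBinaryProtocolFactory())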
serialized = serialize(deserialized, proto_factory=TJSONProtocolFactory())  # Fails or produces incorrect output

Expected Behavior

Unions should be serialized to JSON objects containing only the single field that is set. For example:

{
  "choices": [
    {"number": 42},
    {"text": "hello"},
    {"color": 1}
  ]
}

This matches the behavior of other Thrift implementations, where a union in JSON is represented as an object with exactly one key-value pair.
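The "exactly one key-value pair" expectation can be checked mechanically. The following is a minimal sketch using only the standard library; `union_violations` is a hypothetical helper name, not part of thriftpy2, and the sample document is the expected output quoted above.

```python
import json


def union_violations(entries):
    """Return the indices of entries that are not single-key JSON objects."""
    return [
        index
        for index, entry in enumerate(entries)
        if not (isinstance(entry, dict) and len(entry) == 1)
    ]


# The expected output from this report: every union has exactly one key.
expected = json.loads(
    '{"choices": [{"number": 42}, {"text": "hello"}, {"color": 1}]}'
)
assert union_violations(expected["choices"]) == []

# An object carrying two keys is not a valid encoding of a union.
malformed = {"choices": [{"number": 42, "text": "hello"}]}
assert union_violations(malformed["choices"]) == [0]
```

A check like this could be run against the serializer output to tell "wrong shape" apart from "serialization raised an error".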

Actual Behavior

The struct_to_json function in thriftpy2/protocol/json.py treats unions the same as regular structs, attempting to serialize all fields rather than just the one that's set. This can result in:

  • Serialization errors when trying to process union fields
  • Incorrect JSON output with multiple fields when only one should be present
  • TypeError: cannot unpack non-iterable NoneType object in nested structures
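For context on the last item: that message is what Python raises when a tuple-unpacking assignment receives `None`. The report does not say which line in thriftpy2 raises it, so the sketch below only reproduces the message in isolation; `unpack_pair` is a hypothetical helper, not library code.

```python
def unpack_pair(maybe_pair):
    """Unpack a two-item value; raises TypeError when given None."""
    first, second = maybe_pair
    return first, second


try:
    unpack_pair(None)
except TypeError as exc:
    # Same wording as the error quoted in the list above.
    message = str(exc)
    print(message)
```

A full traceback from the failing call would show which unpacking statement is involved.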

Root Cause

The struct_to_json function doesn't distinguish between regular structs and unions. Unions have special semantics where exactly one field is set at a time, but the current implementation doesn't account for this.

Unions can be identified by checking for the __EMPTY__ attribute on the class, but the JSON serializer doesn't currently check for this.

Proposed Fix

Modify struct_to_json in thriftpy2/protocol/json.py to detect and handle unions:

def struct_to_json(obj):
    if obj is None:
        return None
    
    # Check if this is a union (unions have __EMPTY__ attribute)
    is_union = hasattr(obj.__class__, '__EMPTY__')
    
    outobj = {}
    if hasattr(obj, 'thrift_spec') and obj.thrift_spec:
        for field_id, field_spec in obj.thrift_spec.items():
            if field_spec is None:
                continue
            
            field_type = field_spec[0]
            field_name = field_spec[1]
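            # thrift_spec entries are (ttype, name, [type_spec,] required);
            # a nested type spec exists only in the four-item form
            field_type_spec = field_spec[2] if len(field_spec) > 3 else None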
            
            v = getattr(obj, field_name, None)
            
            if is_union:
                # For unions, only serialize the one field that's set
                if v is not None:
                    outobj[field_name] = json_value(field_type, v, field_type_spec)
                    break  # Stop after finding the set field
            else:
                # For regular structs, serialize all non-None fields
                if v is not None:
                    outobj[field_name] = json_value(field_type, v, field_type_spec)
    
    return outobj
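The branching in the proposed fix can be exercised without thriftpy2 installed. This is a rough sketch under stated assumptions: `FakeChoice` and `FakeContainer` are stand-in classes written for this example, the string type tags are placeholders, the `__EMPTY__` marker follows the claim made in this report rather than anything verified here, and `convert` dispatches on Python value types instead of calling the library's `json_value`.

```python
class FakeChoice:
    """Stand-in for a generated union class (not a thriftpy2 class)."""

    __EMPTY__ = True  # union marker, assumed from the report
    thrift_spec = {
        1: ("i32", "number", False),
        2: ("string", "text", False),
        3: ("i32", "color", False),
    }

    def __init__(self, **kwargs):
        # Every declared field defaults to None unless passed in.
        for spec in self.thrift_spec.values():
            setattr(self, spec[1], kwargs.get(spec[1]))


class FakeContainer:
    """Stand-in for the generated Container struct."""

    thrift_spec = {1: ("list", "choices", ("struct", FakeChoice), True)}

    def __init__(self, choices=None):
        self.choices = choices


def convert(value):
    """Simplified replacement for json_value: recurse by Python type."""
    if isinstance(value, list):
        return [convert(item) for item in value]
    if hasattr(value, "thrift_spec"):
        return to_json(value)
    return value


def to_json(obj):
    """Emit all set fields for structs, only the first set field for unions."""
    is_union = hasattr(type(obj), "__EMPTY__")
    out = {}
    for spec in obj.thrift_spec.values():
        name = spec[1]
        value = getattr(obj, name, None)
        if value is None:
            continue
        out[name] = convert(value)
        if is_union:
            break  # a union carries at most one field
    return out


box = FakeContainer(
    [FakeChoice(number=42), FakeChoice(text="hello"), FakeChoice(color=1)]
)
print(to_json(box))
# → {'choices': [{'number': 42}, {'text': 'hello'}, {'color': 1}]}
```

Note that when each union has exactly one field set, the union branch and the regular struct branch produce the same output; they differ only for an object with several fields set, such as `FakeChoice(number=1, text="x")`, which yields `{'number': 1}`.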

Environment

  • thriftpy2 version: 0.5.4
  • Python version: 3.9

hannahpersampieri, Oct 30 '25 00:10

As with #332, I still can't reproduce the error. This is the code I used:

import io
import thriftpy2
from thriftpy2.utils import serialize, deserialize
from thriftpy2.protocol import TJSONProtocolFactory


s = """
enum Color {
  RED = 1,
  BLUE = 2,
  GREEN = 3
}

union Choice {
  1: i32 number;
  2: string text;
  3: Color color;
}

struct Container {
  1: required list<Choice> choices;
}
"""
test_thrift = thriftpy2.load_fp(io.StringIO(s), 'test_thrift')

Container = test_thrift.Container
Choice = test_thrift.Choice
Color = test_thrift.Color

origin = Container(choices=[Choice(number=42), Choice(text="hello"), Choice(color=Color.RED)])
serialized = serialize(origin, proto_factory=TJSONProtocolFactory())
print(serialized)
deserialized = Container()
print(deserialize(deserialized, serialized, proto_factory=TJSONProtocolFactory()))

So please make sure that:

  1. You're not using a fork of thriftpy2 that has modified the JSON protocol.
  2. You're not deserializing JSON data produced by another library. If you are, use thriftpy2.protocol.apache_json instead.

aisk, Nov 03 '25 14:11