
Inconsistent serialisation of simple classes

Open johnduffield opened this issue 8 months ago • 2 comments

I have an odd situation where a simple class containing just a string and a list of strings is serialised differently to other, very similar classes.
In that one case, the first string is not treated as an element unless I explicitly declare it as one. Notice how MyClass2 is serialised differently to the others in the following example:

from dataclasses import dataclass, field

from xsdata.formats.dataclass.serializers.config import SerializerConfig
from xsdata.formats.dataclass.serializers import XmlSerializer

@dataclass
class MyClass1:
    first_string: str
    second_string: str        

@dataclass
class MyClass2:
    first_string: str
    list_of_stuff: list[str] = field(        
        metadata={"wrapper": "stuff_list", "name": "stuff", "type": "Element"}
    )                  

@dataclass
class MyClass3:
    first_string: str
    list_of_stuff: list[str] = field(        
        metadata={"wrapper": "stuff_list", "name": "stuff", "type": "Element"}
    )                  
    another_string: str

@dataclass
class MyClass4:
    first_string: str = field(metadata={"type": "Element"})
    list_of_stuff: list[str] = field(        
        metadata={"wrapper": "stuff_list", "name": "stuff", "type": "Element"}
    )                  

test1 = MyClass1("simple class with two strings","second string")                 
test2 = MyClass2("replacing the second string with a list makes the 'first_string' tag disappear", 
                       ["first string in list", "second string in list"])
test3 = MyClass3("Adding another string to the class fixes it", 
                       ["first string in list", "second string in list"], 
                       "another")
test4 = MyClass4("Explicitly declaring 'first_string' as an Element also works", 
                       ["first string in list", "second string in list"])

print(XmlSerializer(config=SerializerConfig(pretty_print=True)).render(test1))
print(XmlSerializer(config=SerializerConfig(pretty_print=True)).render(test2))
print(XmlSerializer(config=SerializerConfig(pretty_print=True)).render(test3))
print(XmlSerializer(config=SerializerConfig(pretty_print=True)).render(test4))

Output:

<?xml version="1.0" encoding="UTF-8"?>
<MyClass1>
  <first_string>simple class with two strings</first_string>
  <second_string>second string</second_string>
</MyClass1>

<?xml version="1.0" encoding="UTF-8"?>
<MyClass2>replacing the second string with a list makes the 'first_string' tag disappear<stuff_list>
    <stuff>first string in list</stuff>
    <stuff>second string in list</stuff>
  </stuff_list>
</MyClass2>

<?xml version="1.0" encoding="UTF-8"?>
<MyClass3>
  <first_string>Adding another string to the class fixes it</first_string>
  <stuff_list>
    <stuff>first string in list</stuff>
    <stuff>second string in list</stuff>
  </stuff_list>
  <another_string>another</another_string>
</MyClass3>

<?xml version="1.0" encoding="UTF-8"?>
<MyClass4>
  <first_string>Explicitly declaring 'first_string' as an Element also works</first_string>
  <stuff_list>
    <stuff>first string in list</stuff>
    <stuff>second string in list</stuff>
  </stuff_list>
</MyClass4>

May 02 '25 10:05 johnduffield

After more investigation, it looks like this happens whenever a class has metadata on some, but not all, of its members:

from dataclasses import dataclass, field

from xsdata.formats.dataclass.serializers.config import SerializerConfig
from xsdata.formats.dataclass.serializers import XmlSerializer

@dataclass
class MyClass1:
    first_string: str
    second_string: str        

@dataclass
class MyClass2:
    first_string: str
    second_string: str = field(metadata={"type": "Element"})        

@dataclass
class MyClass3:
    first_string: str = field(metadata={"type": "Element"})
    second_string: str         

test1 = MyClass1("simple class with two strings","second string")                 
test2 = MyClass2("Adding metadata to the second string makes the 'first_string' tag disappear", 
                "second string")

test3 = MyClass3("first string",
                 "Adding metadata to the first string makes the 'second_string' tag disappear",)

print(XmlSerializer(config=SerializerConfig(pretty_print=True)).render(test1))
print(XmlSerializer(config=SerializerConfig(pretty_print=True)).render(test2))
print(XmlSerializer(config=SerializerConfig(pretty_print=True)).render(test3))

Output:

<?xml version="1.0" encoding="UTF-8"?>
<MyClass1>
  <first_string>simple class with two strings</first_string>
  <second_string>second string</second_string>
</MyClass1>

<?xml version="1.0" encoding="UTF-8"?>
<MyClass2>Adding metadata to the second string makes the 'first_string' tag disappear<second_string>second string</second_string>
</MyClass2>

<?xml version="1.0" encoding="UTF-8"?>
<MyClass3>
  <first_string>first string</first_string>Adding metadata to the first string makes the 'second_string' tag disappear</MyClass3>

May 02 '25 11:05 johnduffield

Since some fields don't declare a type in their metadata, the library has to decide whether each of them is an Element or a simple Text node.

When exactly one field is undeclared, it defaults to a Text node; when more than one is undeclared, it assumes they are all Elements.

src
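
The rule described above can be sketched in plain Python. This is only an illustration of the behaviour as stated in this thread, not xsdata's actual code: guess_default_types is a hypothetical helper, and the "Text" and "Element" strings simply mirror the metadata values used in the examples.

```python
from dataclasses import dataclass, field, fields


def guess_default_types(cls):
    """Hypothetical helper: map each field name to the XML type the rule would pick."""
    # Collect the explicitly declared type of every field (None if not declared).
    declared = {f.name: f.metadata.get("type") for f in fields(cls)}
    undeclared = [name for name, xml_type in declared.items() if xml_type is None]
    # Exactly one undeclared field falls back to a text node;
    # any other number of undeclared fields falls back to elements.
    fallback = "Text" if len(undeclared) == 1 else "Element"
    return {name: xml_type or fallback for name, xml_type in declared.items()}


@dataclass
class MyClass1:
    first_string: str
    second_string: str


@dataclass
class MyClass2:
    first_string: str
    second_string: str = field(metadata={"type": "Element"})


print(guess_default_types(MyClass1))
print(guess_default_types(MyClass2))
```

Under this reading, MyClass1 has two undeclared fields, so both become elements, while MyClass2 has a single undeclared field, so first_string is written as bare text. Declaring the type on every field, as MyClass4 in the original report does, sidesteps the fallback entirely.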

May 02 '25 12:05 tefra