
Control characters in data

Open nmrtv opened this issue 9 months ago • 1 comments

Hi,

when string fields in model data contain control characters (e.g., \x02), these characters are written to the XML output as-is. However, such characters are not valid in XML documents according to the XML 1.0 specification (https://www.w3.org/TR/xml/#charsets). I'm not sure how best to handle such cases, as different users may have different requirements. Perhaps a parameter could be added to SerializerConfig to control the behavior. For example, the default behavior could be to strip these characters, with an additional option such as "raise" to throw an error.
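
To make the two proposed behaviors concrete, here is a minimal sketch of a pre-serialization filter. It is not part of xsdata: the function name `clean_xml_text` and its `mode` parameter are hypothetical, and the character class is my reading of the `Char` production in the XML 1.0 specification linked above.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_xml_text(value: str, mode: str = "strip") -> str:
    """Hypothetical helper: strip or reject characters not allowed in XML 1.0."""
    if mode == "strip":
        # Silently drop every disallowed character.
        return _INVALID_XML_CHARS.sub("", value)
    if mode == "raise":
        # Fail on the first disallowed character and report its position.
        match = _INVALID_XML_CHARS.search(value)
        if match:
            raise ValueError(
                f"invalid XML character {match.group()!r} at index {match.start()}"
            )
        return value
    raise ValueError(f"unknown mode: {mode!r}")
```

Something like this could be applied to string fields before calling the serializer; tab, line feed and carriage return are kept because XML 1.0 allows them.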

nmrtv, Mar 07 '25 13:03

Hi @nmrtv, you are using the native XmlEventWriter, right? I am not sure why, but XMLGenerator does not seem to reject these characters. If you switch to the LxmlEventWriter, you will get a validation error from lxml:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Under the hood, we rely on lxml or Python's SAX utils for most of this stuff.
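
For reference, the pass-through behavior on the standard-library side can be reproduced without xsdata. This is only a sketch using `xml.sax.saxutils.XMLGenerator` directly; the helper name `write_text_element` is made up for illustration, and it does not show how XmlEventWriter itself drives the generator.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_text_element(tag: str, text: str) -> str:
    """Write <tag>text</tag> with the stdlib XMLGenerator and return the output."""
    buffer = io.StringIO()
    generator = XMLGenerator(buffer, encoding="utf-8")
    generator.startDocument()
    generator.startElement(tag, {})
    # characters() escapes &, < and > but does not check for control characters.
    generator.characters(text)
    generator.endElement(tag)
    generator.endDocument()
    return buffer.getvalue()


output = write_text_element("value", "a\x02b")
print(repr(output))
```

The control character ends up in the output unchanged, which matches what is reported above for the native writer.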

tefra, Mar 10 '25 16:03