XML parsing approximately 200 times slower than parsing to in-built xml.etree.ElementTree.Element

Open DareDevilDenis opened this issue 8 months ago • 4 comments

Using:

  • xsdata 24.6
  • Python 3.12.3

I'd like to ask about the performance of xsdata XML parsing. In my benchmarking, I found it to be approximately 200 times slower than parsing to the in-built xml.etree.ElementTree.Element. I expected xsdata to be somewhat slower, but this difference seems extreme. I tried both XmlEventHandler and LxmlEventHandler and got similar results.

Is this difference expected? If it's expected then I apologise for raising this as an issue.

Test script:

from pathlib import Path
import time
import xml.etree.ElementTree
import my_dataclass
from xsdata.formats.dataclass.parsers import XmlParser
from xsdata.formats.dataclass.parsers.handlers import XmlEventHandler

TEST_ITERATIONS = 2000
my_path = Path(__file__).parent
xml_file = my_path / "input.xml"

def main():
    with xml_file.open() as f:
        file_contents = f.read()

    start_time = time.time()
    using_in_built_element(file_contents, TEST_ITERATIONS)
    end_time = time.time()
    time_using_in_built_element = end_time - start_time
    print("Time using Python xml.etree.ElementTree.Element:", time_using_in_built_element)

    start_time = time.time()
    using_xsdata(file_contents, TEST_ITERATIONS)
    end_time = time.time()
    time_using_xsdata = end_time - start_time
    print("Time using xsdata:", time_using_xsdata)
    print("Ratio:", time_using_xsdata / time_using_in_built_element)


def using_in_built_element(xml_string, iterations):
    for _ in range(iterations):
        xml_root = xml.etree.ElementTree.fromstring(xml_string)

def using_xsdata(xml_string, iterations):
    parser = XmlParser(handler=XmlEventHandler)
    for _ in range(iterations):
        record_as_obj = parser.from_string(xml_string, my_dataclass.LogRecord)

if __name__ == "__main__":
    main()
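
For anyone re-running this comparison, the timing loop could be factored into a small helper so that both parsers are measured the same way. This is only a sketch under some assumptions: the helper names `time_calls` and `best_of` and the `SAMPLE_XML` string are hypothetical and not part of the attached script, and `time.perf_counter` is swapped in for `time.time` because it is a monotonic, high-resolution clock intended for measuring intervals.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical stand-in for input.xml; the real file is in the attached zip.
SAMPLE_XML = "<LogRecord><message>hello</message></LogRecord>"


def time_calls(func, iterations):
    """Return the total seconds spent calling func() `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    return time.perf_counter() - start


def best_of(func, iterations, repeats=3):
    """Return the fastest of `repeats` timed runs, to reduce scheduling noise."""
    return min(time_calls(func, iterations) for _ in range(repeats))


if __name__ == "__main__":
    elapsed = best_of(lambda: ET.fromstring(SAMPLE_XML), 2000)
    print("ElementTree, best of 3 runs of 2000 parses:", elapsed)
```

The xsdata side could presumably be timed with the same helper by passing a lambda that calls parser.from_string(...) as in the script above.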

My results:

Time using Python xml.etree.ElementTree.Element: 0.2820000648498535
Time using xsdata: 55.9834668636322
Ratio: 198.52288648741876
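
To put those totals in per-document terms, the figures above can be divided by the 2000 iterations. The snippet below is just that arithmetic on the numbers already reported; the variable names for the per-parse values are made up for illustration.

```python
TEST_ITERATIONS = 2000

# Totals in seconds, copied from the results above.
time_using_in_built_element = 0.2820000648498535
time_using_xsdata = 55.9834668636322

# Average cost of a single parse, in milliseconds.
ms_per_parse_element = time_using_in_built_element / TEST_ITERATIONS * 1000
ms_per_parse_xsdata = time_using_xsdata / TEST_ITERATIONS * 1000
ratio = time_using_xsdata / time_using_in_built_element

print(f"ElementTree: {ms_per_parse_element:.3f} ms per parse")  # 0.141
print(f"xsdata:      {ms_per_parse_xsdata:.3f} ms per parse")   # 27.992
print(f"Ratio:       {ratio:.1f}")                              # 198.5
```

So each parse takes roughly 0.14 ms with ElementTree and roughly 28 ms with xsdata on this input.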

I've attached this script, "input.xml" and "my_dataclass.py": xsdata_xml_parse_performance.zip

DareDevilDenis, Jun 24 '24 16:06