Is it possible to parse only necessary XML nodes?

Open Irbiss555 opened this issue 2 years ago • 3 comments

I recently started using XSData and ran into a problem.

I have to parse huge XML files (the schema produces almost 400 generated dataclasses), but I only need a few of them. For example, let's say I have this XML:

<?xml version="1.0"?>
<PurchaseOrder PurchaseOrderNumber="99503">
  <Address Type="Shipping">
    <Name>Ellen Adams</Name>
    <Street>123 Maple Street</Street>
    <City>Mill Valley</City>
    <State>CA</State>
    <Zip>10999</Zip>
    <Country>USA</Country>
  </Address>
  <DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>
  <Items>
    <Item PartNumber="872-AA">
      <ProductName>Lawnmower</ProductName>
      <Quantity>1</Quantity>
      <USPrice>148.95</USPrice>
      <Comment>Confirm this is electric</Comment>
    </Item>
    <Item PartNumber="926-AA">
      <ProductName>Baby Monitor</ProductName>
      <Quantity>2</Quantity>
      <USPrice>39.98</USPrice>
      <ShipDate>1999-05-21</ShipDate>
    </Item>
  </Items>
</PurchaseOrder>

And these generated dataclasses:

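from dataclasses import dataclass, field
from typing import List, Optional

from xsdata.models.datatype import XmlDate


@dataclass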
class Address:
    type_value: Optional[str] = field( default=None, metadata={ "name": "Type", "type": "Attribute", "required": True, } )
    name: Optional[str] = field( default=None, metadata={ "name": "Name", "type": "Element", "required": True, } )
    street: Optional[str] = field( default=None, metadata={ "name": "Street", "type": "Element", "required": True, } )
    city: Optional[str] = field( default=None, metadata={ "name": "City", "type": "Element", "required": True, } )
    state: Optional[str] = field( default=None, metadata={ "name": "State", "type": "Element", "required": True, } )
    zip: Optional[int] = field( default=None, metadata={ "name": "Zip", "type": "Element", "required": True, } )
    country: Optional[str] = field( default=None, metadata={ "name": "Country", "type": "Element", "required": True, } )


@dataclass
class Item:
    part_number: Optional[str] = field( default=None, metadata={ "name": "PartNumber", "type": "Attribute", "required": True, } )
    product_name: Optional[str] = field( default=None, metadata={ "name": "ProductName", "type": "Element", "required": True, } )
    quantity: Optional[int] = field( default=None, metadata={ "name": "Quantity", "type": "Element", "required": True, } )
    usprice: Optional[float] = field( default=None, metadata={ "name": "USPrice", "type": "Element", "required": True, } )
    ship_date: Optional[XmlDate] = field( default=None, metadata={ "name": "ShipDate", "type": "Element", } )
    comment: Optional[str] = field( default=None, metadata={ "name": "Comment", "type": "Element", } )


@dataclass
class Items:
    item: List[Item] = field( default_factory=list, metadata={ "name": "Item", "type": "Element", "min_occurs": 1, } )


@dataclass
class PurchaseOrder:
    purchase_order_number: Optional[int] = field( default=None, metadata={ "name": "PurchaseOrderNumber", "type": "Attribute", "required": True, } )
    address: Optional[Address] = field( default=None, metadata={ "name": "Address", "type": "Element", "required": True, } )
    delivery_notes: Optional[str] = field( default=None, metadata={ "name": "DeliveryNotes", "type": "Element", "required": True, } )
    items: Optional[Items] = field( default=None, metadata={ "name": "Items", "type": "Element", "required": True, } )

If I parse like this

from generated.purchase_order import PurchaseOrder
from xsdata.formats.dataclass.parsers import XmlParser


parser = XmlParser()

purchase_order = parser.parse('xml_file', clazz=PurchaseOrder)

it parses every XML node.

How can I make XmlParser parse only the node <DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>? This matters to me because I have to parse thousands of large XML files.

Irbiss555 avatar Nov 14 '23 09:11 Irbiss555

It's not impossible. I don't have an example, but you would have to roll your own XML handler that ignores all other elements.

tefra avatar Nov 19 '23 16:11 tefra

Or use etree manually to pick the elements you want, and then pass them to the xsdata XmlParser.

Something like this:

    import lxml.etree

    parser = XmlParser()  # from xsdata.formats.dataclass.parsers
    tree = lxml.etree.parse("/tmp/Flix_Line_x400.xml")
    for element in tree.iterfind(".//{http://www.netex.org.uk/netex}ServiceJourney"):
        service_journey = parser.parse(element, ServiceJourney)
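
Applied to the PurchaseOrder example from the question, the same idea can be sketched with only the standard library. This is a rough illustration, not xsdata API: it uses xml.etree.ElementTree.iterparse instead of lxml, the helper name find_first_text is made up for this sketch, and the sample document is a shortened copy of the one in the question. It returns the element text directly rather than a bound dataclass.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the PurchaseOrder document from the question.
SAMPLE = b"""<?xml version="1.0"?>
<PurchaseOrder PurchaseOrderNumber="99503">
  <Address Type="Shipping"><Name>Ellen Adams</Name></Address>
  <DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>
  <Items><Item PartNumber="872-AA"><Quantity>1</Quantity></Item></Items>
</PurchaseOrder>"""


def find_first_text(source, tag):
    """Stream `source` and return the text of the first <tag> element, or None.

    Hypothetical helper for this sketch; it is not part of xsdata or lxml.
    """
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            # Returning here stops the parser from reading the rest of the input.
            return element.text
        # Drop the content of elements we do not need to keep memory flat.
        element.clear()
    return None


delivery_notes = find_first_text(io.BytesIO(SAMPLE), "DeliveryNotes")
print(delivery_notes)
```

Because the loop returns on the first match, everything after DeliveryNotes (the Items block here) is never read, which is where the saving would come from on large files.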

tefra avatar Nov 19 '23 16:11 tefra

@tefra thank you for your suggestion, but I would like to keep attribute access on the bound objects, such as purchase_order.delivery_notes.

I found that if I set "type": "Ignore" in a field's metadata, for example metadata={ "name": "Items", "type": "Ignore", "required": True, }, the parser skips binding that element, which is what I need.

So the remaining question is how to customize dataclass generation so that certain fields get "type": "Ignore".

Irbiss555 avatar Dec 01 '23 11:12 Irbiss555

You have to do it manually, or create a custom schema.
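
For the manual route, one possible sketch that avoids hand-editing the generated files is to derive a trimmed copy of a generated class at runtime. Everything below is an assumption layered on the thread: with_ignored_fields is a hypothetical helper built only on the standard dataclasses module, the two-field PurchaseOrder is a stand-in for the generated class, and whether xsdata binds against the derived class as expected has not been verified here. It relies on the "type": "Ignore" behaviour reported earlier in the thread.

```python
from dataclasses import MISSING, dataclass, field, fields, make_dataclass
from typing import Optional


def with_ignored_fields(cls, ignored):
    """Return a copy of dataclass `cls` whose `ignored` fields carry "type": "Ignore".

    Hypothetical helper, not part of xsdata. Methods and dataclass options
    (slots, kw_only, ...) of the original class are not carried over.
    """
    specs = []
    for f in fields(cls):
        # Field metadata is read-only, so work on a copy.
        metadata = dict(f.metadata)
        if f.name in ignored:
            metadata["type"] = "Ignore"
        if f.default_factory is not MISSING:
            new_field = field(default_factory=f.default_factory, metadata=metadata)
        elif f.default is not MISSING:
            new_field = field(default=f.default, metadata=metadata)
        else:
            new_field = field(metadata=metadata)
        specs.append((f.name, f.type, new_field))
    # Generated classes may define a nested Meta class; keep it if present.
    namespace = {"Meta": cls.Meta} if "Meta" in vars(cls) else {}
    return make_dataclass(cls.__name__, specs, namespace=namespace)


# Stand-in for the generated class, reduced to two fields for the sketch.
@dataclass
class PurchaseOrder:
    delivery_notes: Optional[str] = field(
        default=None, metadata={"name": "DeliveryNotes", "type": "Element"}
    )
    items: Optional[str] = field(
        default=None, metadata={"name": "Items", "type": "Element"}
    )


SlimPurchaseOrder = with_ignored_fields(PurchaseOrder, {"items"})
```

The derived class would then be passed as clazz to parser.parse in place of the generated one. The original class is left untouched, so the full binding stays available where it is needed.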

tefra avatar Mar 09 '24 18:03 tefra