Is it possible to parse only necessary XML nodes?
I recently started using XSData and ran into a problem.
I have to parse very large XML files (the schema produces almost 400 generated dataclasses), but I only need a few of the nodes. For example, say I have this XML:
<?xml version="1.0"?>
<PurchaseOrder PurchaseOrderNumber="99503">
<Address Type="Shipping">
<Name>Ellen Adams</Name>
<Street>123 Maple Street</Street>
<City>Mill Valley</City>
<State>CA</State>
<Zip>10999</Zip>
<Country>USA</Country>
</Address>
<DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>
<Items>
<Item PartNumber="872-AA">
<ProductName>Lawnmower</ProductName>
<Quantity>1</Quantity>
<USPrice>148.95</USPrice>
<Comment>Confirm this is electric</Comment>
</Item>
<Item PartNumber="926-AA">
<ProductName>Baby Monitor</ProductName>
<Quantity>2</Quantity>
<USPrice>39.98</USPrice>
<ShipDate>1999-05-21</ShipDate>
</Item>
</Items>
</PurchaseOrder>
And these generated dataclasses:
from dataclasses import dataclass, field
from typing import List, Optional

from xsdata.models.datatype import XmlDate


@dataclass
class Address:
    type_value: Optional[str] = field(default=None, metadata={"name": "Type", "type": "Attribute", "required": True})
    name: Optional[str] = field(default=None, metadata={"name": "Name", "type": "Element", "required": True})
    street: Optional[str] = field(default=None, metadata={"name": "Street", "type": "Element", "required": True})
    city: Optional[str] = field(default=None, metadata={"name": "City", "type": "Element", "required": True})
    state: Optional[str] = field(default=None, metadata={"name": "State", "type": "Element", "required": True})
    zip: Optional[int] = field(default=None, metadata={"name": "Zip", "type": "Element", "required": True})
    country: Optional[str] = field(default=None, metadata={"name": "Country", "type": "Element", "required": True})


@dataclass
class Item:
    part_number: Optional[str] = field(default=None, metadata={"name": "PartNumber", "type": "Attribute", "required": True})
    product_name: Optional[str] = field(default=None, metadata={"name": "ProductName", "type": "Element", "required": True})
    quantity: Optional[int] = field(default=None, metadata={"name": "Quantity", "type": "Element", "required": True})
    usprice: Optional[float] = field(default=None, metadata={"name": "USPrice", "type": "Element", "required": True})
    ship_date: Optional[XmlDate] = field(default=None, metadata={"name": "ShipDate", "type": "Element"})
    comment: Optional[str] = field(default=None, metadata={"name": "Comment", "type": "Element"})


@dataclass
class Items:
    item: List[Item] = field(default_factory=list, metadata={"name": "Item", "type": "Element", "min_occurs": 1})


@dataclass
class PurchaseOrder:
    purchase_order_number: Optional[int] = field(default=None, metadata={"name": "PurchaseOrderNumber", "type": "Attribute", "required": True})
    address: Optional[Address] = field(default=None, metadata={"name": "Address", "type": "Element", "required": True})
    delivery_notes: Optional[str] = field(default=None, metadata={"name": "DeliveryNotes", "type": "Element", "required": True})
    items: Optional[Items] = field(default=None, metadata={"name": "Items", "type": "Element", "required": True})
If I parse it like this:
from generated.purchase_order import PurchaseOrder
from xsdata.formats.dataclass.parsers import XmlParser
parser = XmlParser()
purchase_order = parser.parse('xml_file', clazz=PurchaseOrder)
the parser binds every XML node.
How can I make XmlParser parse only the node <DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>?
This matters to me because I have to parse thousands of large XML files.
It's not impossible. I don't have an example, but you would have to write your own XML handler that ignores all other elements.
Or use etree manually to pick the elements you want and then pass them to the xsdata XmlParser.
Something like this:
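import lxml.etree
from xsdata.formats.dataclass.parsers import XmlParser
parser = XmlParser()
tree = lxml.etree.parse("/tmp/Flix_Line_x400.xml")
for element in tree.iterfind(".//{http://www.netex.org.uk/netex}ServiceJourney"):
    service_journey = parser.parse(element, ServiceJourney)  # ServiceJourney: the generated dataclass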
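If all you need is the text of one element, the same "pick it out with etree first" idea can be sketched with the standard library alone. This is only an illustration, not xsdata code: read_delivery_notes is a made-up helper name, the sample XML is a shortened copy of the purchase order from the question, and it assumes the element has no namespace.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Optional


def read_delivery_notes(source: BinaryIO, tag: str = "DeliveryNotes") -> Optional[str]:
    """Return the text of the first <tag> element, or None if it is absent."""
    # "end" events fire once an element has been read completely,
    # so elem.text is fully populated at that point.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            return elem.text  # stop early; later elements are never iterated
        elem.clear()  # drop the text and children of elements we do not need
    return None


# Shortened copy of the purchase order from the question (hypothetical sample).
SAMPLE = b"""<?xml version="1.0"?>
<PurchaseOrder PurchaseOrderNumber="99503">
  <Address Type="Shipping"><Name>Ellen Adams</Name></Address>
  <DeliveryNotes>Please leave packages in shed by driveway.</DeliveryNotes>
  <Items><Item PartNumber="872-AA"><ProductName>Lawnmower</ProductName></Item></Items>
</PurchaseOrder>"""

notes = read_delivery_notes(io.BytesIO(SAMPLE))
```

For real files you would pass an open binary file instead of the BytesIO object. The trade-off is that you get a plain string back rather than a bound dataclass.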
@tefra thank you for your suggestion, but I would like to keep accessing the data through the objects, e.g. purchase_order.delivery_notes.
I found that setting "type": "Ignore" in a field's metadata, like this: metadata={ "name": "Items", "type": "Ignore", "required": True, }, makes the parser skip binding that element, which is what I need.
So the problem is how to customize dataclass generation so that it sets "type": "Ignore" on certain fields.
You have to do it manually, or create a custom schema.
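One way to do the manual part without editing the generated files might be to patch the field metadata at import time. The sketch below is an assumption-heavy illustration: ignore_fields is a hypothetical helper, the PurchaseOrder class here is a two-field stand-in for the generated one, and it has not been checked against xsdata itself. It presumably only has an effect if it runs before the parser first builds its binding metadata for the class.

```python
from dataclasses import dataclass, field, fields
from types import MappingProxyType
from typing import Iterable, Optional


def ignore_fields(clazz: type, names: Iterable[str]) -> None:
    """Set metadata["type"] = "Ignore" on the named dataclass fields, in place."""
    wanted = set(names)
    for f in fields(clazz):
        if f.name in wanted:
            # Field.metadata is a read-only mapping, so swap in a patched copy.
            f.metadata = MappingProxyType({**f.metadata, "type": "Ignore"})
            wanted.discard(f.name)
    if wanted:
        raise AttributeError(f"{clazz.__name__} has no fields: {sorted(wanted)}")


# Stand-in for a generated class; real code would import it from the generated package.
@dataclass
class PurchaseOrder:
    delivery_notes: Optional[str] = field(
        default=None,
        metadata={"name": "DeliveryNotes", "type": "Element", "required": True},
    )
    items: Optional[str] = field(
        default=None,
        metadata={"name": "Items", "type": "Element", "required": True},
    )


# Assumption: call this before creating the parser that will bind PurchaseOrder.
ignore_fields(PurchaseOrder, ["items"])
```

Because the generated modules stay untouched, regenerating the dataclasses from the schema does not lose the change; the list of ignored field names lives in your own code.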