XML.jl
Read and write XML in pure Julia
Introduction
This package offers fast data structures for reading and writing XML files with a consistent interface:
Node/LazyNode Interface:
nodetype(node) → XML.NodeType (an enum type)
tag(node) → String or Nothing
attributes(node) → OrderedDict{String, String} or Nothing
value(node) → String or Nothing
children(node) → Vector{typeof(node)}
is_simple(node) → Bool (whether node is simple, e.g. <tag>item</tag>)
simple_value(node) → e.g. "item" from <tag>item</tag>
Extended Interface for LazyNode
depth(node) → Int
next(node) → typeof(node)
prev(node) → typeof(node)
parent(node) → typeof(node)
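As a rough illustration of the Node interface listed above, the sketch below parses a small inline document and calls each accessor. The XML string and variable names are made up for this example; the comments describe what each call is expected to return under that assumption.

```julia
using XML

# Hypothetical sample document, kept on one line so only meaningful nodes are present.
xml = """<?xml version="1.0"?><catalog><book id="bk101"><title>XML Guide</title></book></catalog>"""

doc   = parse(Node, xml)
root  = doc[end]              # the root element is the last child of the document
book  = children(root)[1]
title = children(book)[1]

nodetype(root)                # the Element node type
tag(book)                     # "book"
attributes(book)["id"]        # "bk101"
is_simple(title)              # true: a single text child and no attributes
simple_value(title)           # "XML Guide"
value(children(title)[1])     # "XML Guide", read from the Text node itself
```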
Quickstart
using XML
filename = joinpath(dirname(pathof(XML)), "..", "test", "data", "books.xml")
doc = read(filename, Node)
children(doc)
# 2-Element Vector{Node}:
# Node Declaration <?xml version="1.0"?>
# Node Element <catalog> (12 children)
doc[end] # The root node
# Node Element <catalog> (12 children)
doc[end][2] # Second child of root
# Node Element <book id="bk102"> (6 children)
Data Structures that Represent XML Nodes
Preliminary: NodeType
- Each item in an XML DOM is classified by its NodeType.
- Every XML.jl struct defines a nodetype(x) method that returns its NodeType.
| NodeType | XML Representation | Node Constructor |
|---|---|---|
| Document | An entire document | Document(children...) |
| DTD | <!DOCTYPE ...> | DTD(...) |
| Declaration | <?xml attributes... ?> | Declaration(; attrs...) |
| ProcessingInstruction | <?tag attributes... ?> | ProcessingInstruction(tag; attrs...) |
| Comment | <!-- text --> | Comment(text) |
| CData | <![CDATA[text]]> | CData(text) |
| Element | <tag attributes... > children... </tag> | Element(tag, children...; attrs...) |
| Text | the text part of <tag>text</tag> | Text(text) |
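To make the constructor column more concrete, here is a minimal sketch that assembles a small document by hand. It assumes the constructors are called with the XML. prefix, as in the XML.Element and XML.Text calls used elsewhere in this README; all tags, attributes, and text are illustrative.

```julia
using XML

# Build a tiny document bottom-up; every name and value here is hypothetical.
book = XML.Element("book", XML.Element("title", XML.Text("XML Guide")); id = "bk101")
root = XML.Element("catalog", XML.Comment(" a hypothetical catalog "), book)
doc  = XML.Document(XML.Declaration(version = "1.0"), root)

nodetype(doc)         # the Document node type
nodetype(doc[1])      # the Declaration node type
tag(doc[end])         # "catalog"
book["id"]            # "bk101"

print(XML.write(doc)) # serialize the whole tree to inspect it
```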
Node: Probably What You're Looking For
- read-ing a Node loads the entire XML DOM in memory.
- See the table above for convenience constructors.
Nodes have some additional methods that aid in construction/mutation:
# Add a child:
push!(parent::Node, child::Node)
# Replace a child:
parent[2] = child
# Add/change an attribute:
node["key"] = value
node["key"]
Node is an immutable type. However, you can easily create a copy with one or more field values changed by using the Node(::Node; kw...) constructor, where kw are the fields you want to change. For example:
node = XML.Element("tag", XML.Text("child"))
simple_value(node)
# "child"
node2 = Node(node, children=XML.Text("changed"))
simple_value(node2)
# "changed"
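The construction/mutation helpers listed earlier in this section can be exercised on a concrete tree. The following is a small sketch rather than a reference: the element names and attribute values are invented, and each node is created with at least one child and one attribute so that the children vector and attribute dictionary already exist before they are modified.

```julia
using XML

# Hypothetical starting tree with a single <book> child.
catalog = XML.Element("catalog", XML.Element("book", XML.Text("first"); id = "bk101"))

# Add a child.
push!(catalog, XML.Element("book", XML.Text("second"); id = "bk102"))

# Replace the second child.
catalog[2] = XML.Element("book", XML.Text("replacement"); id = "bk103")

# Add an attribute to the first child, then read it back.
catalog[1]["lang"] = "en"
catalog[1]["lang"]            # "en"

length(children(catalog))     # 2
```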
Writing Element Nodes with XML.h
Similar to Cobweb.jl, XML.h enables you to write elements with a simpler syntax:
using XML: XML, h  # bring both the module name and `h` into scope
node = h.parent(
    h.child("first child content", id="id1"),
    h.child("second child content", id="id2")
)
# Node Element <parent> (2 children)
print(XML.write(node))
# <parent>
#   <child id="id1">first child content</child>
#   <child id="id2">second child content</child>
# </parent>
XML.LazyNode: For Fast Iteration through an XML File
A lazy data structure that just keeps track of the position in the raw data (Vector{UInt8}) to read from.
- You can iterate over a LazyNode to "read" through an XML file:
doc = read(filename, LazyNode)
foreach(println, doc)
# LazyNode Declaration <?xml version="1.0"?>
# LazyNode Element <catalog>
# LazyNode Element <book id="bk101">
# LazyNode Element <author>
# LazyNode Text "Gambardella, Matthew"
# LazyNode Element <title>
# ⋮
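Iteration can also be used to pull out just the pieces you need without building the full DOM. The sketch below collects every element tag from a small inline document; the XML string is made up for illustration, and the loop relies only on the nodetype and tag functions from the shared interface.

```julia
using XML

# Hypothetical sample document.
xml = """<?xml version="1.0"?><catalog><book id="bk101"><title>XML Guide</title></book></catalog>"""
doc = parse(LazyNode, xml)

# Walk the document lazily and record the tag of every element node.
tags = String[]
for node in doc
    if nodetype(node) == XML.Element
        push!(tags, tag(node))
    end
end

tags   # element tags in document order
```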
Reading
# Reading from file:
read(filename, Node)
read(filename, LazyNode)
# Parsing from string:
parse(Node, str)
parse(LazyNode, str)
Writing
XML.write(filename::String, node) # write to file
XML.write(io::IO, node) # write to stream
XML.write(node) # String
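Putting the reading and writing calls together, a round trip might look like the sketch below. The document content and the temporary file path are illustrative only.

```julia
using XML

# Hypothetical sample document.
xml = """<?xml version="1.0"?><catalog><book id="bk101"><title>XML Guide</title></book></catalog>"""
doc = parse(Node, xml)

# Serialize to a String and to an in-memory stream.
str = XML.write(doc)
io = IOBuffer()
XML.write(io, doc)

# Write to a temporary file, then read it back into a new Node.
path = tempname() * ".xml"
XML.write(path, doc)
doc2 = read(path, Node)

tag(doc2[end])                  # "catalog"
simple_value(doc2[end][1][1])   # "XML Guide" (root → book → title)
```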
Performance
- XML.jl performs comparably to EzXML.jl, which wraps the C library libxml2.
- See benchmarks/suite.jl for the code to produce these results.
- The following output was generated in a Julia session with the following versioninfo:
julia> versioninfo()
Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 10 × Apple M1 Pro
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 8 on 8 virtual cores
Reading an XML File
XML.LazyNode 0.009583
XML.Node ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1071.32
EzXML.readxml ■■■■■■■■■ 284.346
XMLDict.xml_dict ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1231.47
Writing an XML File
Write: XML ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 289.638
Write: EzXML ■■■■■■■■■■■■■ 93.4631
Lazily Iterating over Each Node
LazyNode ■■■■■■■■■ 51.752
EzXML.StreamReader ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 226.271
Collecting All Names/Tags in an XML File
XML.LazyNode ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 210.482
EzXML.StreamReader ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 276.238
EzXML.readxml ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 263.269
Possible Gotchas
- XML.jl doesn't automatically escape special characters (<, >, &, ", and ') for you. However, we provide utility functions for doing the conversions back and forth:
  - XML.escape(::String) and XML.unescape(::String)
  - XML.escape!(::Node) and XML.unescape!(::Node)
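As a brief sketch of how the string helpers might be used, the example below escapes some text before placing it in a node and then converts it back. The input string is hypothetical, and the exact escaped form is left for you to inspect rather than asserted here.

```julia
using XML

raw = "Tom & Jerry <cartoon>"            # hypothetical text containing special characters
escaped = XML.escape(raw)                # entity-encoded form, suitable for embedding in XML
node = XML.Element("title", XML.Text(escaped))

print(XML.write(node))                   # inspect the serialized element
XML.unescape(escaped) == raw             # true: the conversion round-trips
```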