XML.jl
parse dtd/entity
Not sure if this is within the scope of this package, but currently the DTD does not seem to be parsed correctly, entity declarations in particular. For example, with this file as test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
<!ENTITY nbsp " ">
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<footer>&writer;&nbsp;&copyright;</footer>
</note>
Using EzXML.jl or a browser, the footer is parsed as "Writer: Donald Duck. Copyright: W3Schools."
using EzXML
doc = readxml("test.xml")
doc.root |> eachelement |> collect |> last |> nodecontent |> println
doc.node.owner = TextNode("") # skip gc
But with XML.jl, the entity references are left as the verbatim string &writer;&nbsp;&copyright;
using XML
doc2 = read("test.xml", Node)
doc2[end][end][1] |> x -> x.value |> println
In addition, glancing over doc2, it appears the DTD itself is not read correctly, e.g. doc2[2] is
Node DTD <!DOCTYPE note [
<!ENTITY nbsp " ">
i.e. it matches the next ">" instead of the closing ">" for "<!DOCTYPE"
https://github.com/JuliaComputing/XML.jl/blob/53d7ed347cc115fc8c1dfe34814c577360fb997f/src/raw.jl#L262
Thanks!
Thanks for the report. Parsing DTD is within scope of this package. For now, I was trying to dump everything into the Node's value and figure out parsing later. As you pointed out, that doesn't quite work because it matches the wrong ending tag. I'll work on a fix.
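For illustration, one way to find the right closing ">" is to track the "[" ... "]" depth of the internal subset and to skip over quoted entity values. This is only a rough sketch under those assumptions, not the fix that went into XML.jl; `doctype_end` is a hypothetical name, and comments or processing instructions inside the DTD are not handled.

```julia
# Hypothetical helper, not part of XML.jl: return the index of the '>' that
# closes a DOCTYPE declaration starting at `start`, or `nothing` if there is none.
function doctype_end(s::AbstractString, start::Integer = firstindex(s))
    depth = 0             # nesting level of the internal subset brackets [ ... ]
    quote_char = nothing  # the quote character we are currently inside, if any
    i = start
    while i <= lastindex(s)
        c = s[i]
        if quote_char !== nothing
            # Inside a quoted literal: only the matching quote ends it.
            c == quote_char && (quote_char = nothing)
        elseif c == '"' || c == '\''
            quote_char = c
        elseif c == '['
            depth += 1
        elseif c == ']'
            depth -= 1
        elseif c == '>' && depth == 0
            return i      # a '>' outside brackets and quotes closes the DOCTYPE
        end
        i = nextind(s, i)
    end
    return nothing
end

s = """<!DOCTYPE note [
<!ENTITY writer "Writer: Donald Duck.">
]>
<note/>"""

stop = doctype_end(s)
println(s[firstindex(s):stop])
```

The ">" that ends the ENTITY declaration is ignored because the depth is still 1 at that point, so the returned index is the one right after "]".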
Quick fix is done for reading the DTD:
julia> parse(s, Node)[2]
# Node DTD <!DOCTYPE note [
# <!ENTITY nbsp " ">
# <!ENTITY writer "Writer: Donald Duck.">
# <!ENTITY copyright "Copyright: W3Schools.">
# ]>
> using EzXML.jl or in browser, the footer part is parsed as "Writer: Donald Duck. Copyright: W3Schools."
I'd argue that the Text Node's value ought to stay "&writer;&nbsp;&copyright;" to keep the separation of concerns (https://en.wikipedia.org/wiki/Separation_of_content_and_presentation).
That being said I see a use for a fill_entities!(::Node) function.
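A rough idea of what such a function could do, sketched on plain strings rather than on Node so that it makes no assumptions about XML.jl internals. The names `entity_table` and `expand_entities` are hypothetical. The sketch only covers internal general entities written as <!ENTITY name "value">, leaves unknown references untouched, and does not expand entities nested inside other entity values.

```julia
# Hypothetical helpers, not part of XML.jl.

# Collect entities declared as <!ENTITY name "value"> (or with single quotes).
function entity_table(dtd::AbstractString)
    table = Dict{String,String}()
    pattern = r"<!ENTITY\s+([^\s%\"']+)\s+(?:\"([^\"]*)\"|'([^']*)')\s*>"
    for m in eachmatch(pattern, dtd)
        # Exactly one of the two value groups matches; take whichever did.
        table[String(m[1])] = String(something(m[2], m[3]))
    end
    return table
end

# Replace each &name; whose name is in `table`; leave everything else as-is.
function expand_entities(text::AbstractString, table::AbstractDict)
    # `ref` is the full match, e.g. "&writer;"; strip the leading '&' and trailing ';'.
    lookup(ref) = get(table, String(chop(ref; head = 1, tail = 1)), String(ref))
    return replace(text, r"&[A-Za-z_:][\w.:-]*;" => lookup)
end

dtd = """<!DOCTYPE note [
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
]>"""

table = entity_table(dtd)
println(expand_entities("&writer; &copyright;", table))
# prints: Writer: Donald Duck. Copyright: W3Schools.
```

A fill_entities!(::Node) could then walk the tree and apply something like `expand_entities` to each text value, while the default read keeps the raw references.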