deno-xml-parser
Can't handle DTDs
Repro
deno run --allow-all main.ts
where main.ts:
import parse from "https://denopkg.com/nekobato/deno-xml-parser/index.ts"
import * as log from "https://deno.land/std/log/mod.ts";
const infile = "./test.xml"
const input = await Deno.readTextFile(infile)
const test = parse(input.replaceAll("\n", ""))
console.log(test)
where test.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE kotus-sanalista SYSTEM "kotus-sanalista.dtd">
<kotus-sanalista>
<st><s>aakkonen</s><t><tn>38</tn></t></st>
<st><s>aakkosellinen</s><t><tn>38</tn></t></st>
<st><s>aakkosellisesti</s><t><tn>99</tn></t></st>
</kotus-sanalista>
where kotus-sanalista.dtd:
<!ELEMENT kotus-sanalista (st*) >
<!ELEMENT st (s, hn?, t*) >
<!ELEMENT s (#PCDATA) >
<!ELEMENT hn (#PCDATA) >
<!ELEMENT t (tn, av?)* >
<!ATTLIST t taivutus CDATA #IMPLIED>
<!ELEMENT tn (#PCDATA) >
<!ELEMENT av (#PCDATA) >
<!ATTLIST av astevaihtelu CDATA #IMPLIED>
Expected
{
declaration: { attributes: { version: "1.0", encoding: "utf-8" } },
root: {
name: "kotus-sanalista",
attributes: {},
children: [
{ name: "st", attributes: [Object], children: [Array], content: "" },
{ name: "st", attributes: [Object], children: [Array], content: "" },
{ name: "st", attributes: [Object], children: [Array], content: "" }
],
content: ""
}
}
Actual
{
declaration: { attributes: { version: "1.0", encoding: "utf-8" } },
root: undefined
}
Notes
The "expected" output is what I get after removing the offending DOCTYPE line from the XML file. I'm not sure it's exactly what I want, but at the very least I'd expect the parser to ignore the DOCTYPE declaration and give me the contents. Even better would be if it actually used the DTD somehow when parsing...
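If the goal is only to have the DOCTYPE ignored, one possible workaround is to strip it before calling parse. The helper below, stripDoctype, is hypothetical (it is not part of deno-xml-parser) and is only a sketch: its regular expression covers an external SYSTEM/PUBLIC reference and a simple internal subset in [...], but it is not a DTD parser and a ">" or "]" inside quoted literals or comments can fool it.

```typescript
// Hypothetical helper (not part of deno-xml-parser): removes <!DOCTYPE ...>
// declarations so that only the XML declaration and the elements are left.
// Covers an external SYSTEM/PUBLIC reference and a simple internal subset
// in [...]; a ">" or "]" inside quoted literals or comments can fool it.
function stripDoctype(xml: string): string {
  return xml.replace(/<!DOCTYPE[^>\[]*(\[[\s\S]*?\])?\s*>/g, "");
}

// Same shape as test.xml after replaceAll("\n", "").
const demo =
  '<?xml version="1.0" encoding="utf-8"?>' +
  '<!DOCTYPE kotus-sanalista SYSTEM "kotus-sanalista.dtd">' +
  "<kotus-sanalista><st><s>aakkonen</s><t><tn>38</tn></t></st></kotus-sanalista>";

// prints: <?xml version="1.0" encoding="utf-8"?><kotus-sanalista><st><s>aakkonen</s><t><tn>38</tn></t></st></kotus-sanalista>
console.log(stripDoctype(demo));
```

In main.ts the call would become parse(stripDoctype(input.replaceAll("\n", ""))), which hands the parser the same string as the version of test.xml with the DOCTYPE line deleted.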
If I read the code right, the parser tries to match a pattern at the start of the string (the first line) and, if it finds a match, removes that match from the string.
In your case the first match was the XML declaration, which got removed. The DOCTYPE line that is now at the start matches nothing, so root comes back as undefined.
This seems to be a similar case to #8.
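The match-then-strip behaviour described in the reply can be illustrated with a toy model. toyParse, parseAttributes and ToyResult are hypothetical and far simpler than the real deno-xml-parser code; they only mimic the idea: match the XML declaration at the start of the string, strip it, then expect an element tag at the start of what remains. A leading <!DOCTYPE ...> fails that second match, which mirrors the root: undefined in the Actual output.

```typescript
// Hypothetical, heavily simplified model of a "match at the start, then
// strip the match" parser. NOT the library's real code; it only shows why
// a DOCTYPE right after the XML declaration leaves root undefined.
interface ToyResult {
  declaration?: { attributes: Record<string, string> };
  root?: { name: string };
}

// Collects name="value" pairs, e.g. version="1.0" encoding="utf-8".
function parseAttributes(src: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  for (const m of src.matchAll(/([\w:-]+)\s*=\s*"([^"]*)"/g)) {
    attrs[m[1]] = m[2];
  }
  return attrs;
}

function toyParse(xml: string): ToyResult {
  let rest = xml.trim();
  const result: ToyResult = {};

  // Step 1: match the XML declaration at the start and strip it.
  const decl = rest.match(/^<\?xml\s+([^?]*)\?>/);
  if (decl) {
    result.declaration = { attributes: parseAttributes(decl[1]) };
    rest = rest.slice(decl[0].length).trim();
  }

  // Step 2: expect an element tag at the start of the remainder.
  // "<!DOCTYPE ..." starts with "<!", which this pattern rejects.
  const tag = rest.match(/^<([\w-]+)[^>]*>/);
  result.root = tag ? { name: tag[1] } : undefined;
  return result;
}

const xmlDecl = '<?xml version="1.0" encoding="utf-8"?>';
const body = "<kotus-sanalista></kotus-sanalista>";
const doctype = '<!DOCTYPE kotus-sanalista SYSTEM "kotus-sanalista.dtd">';

console.log(toyParse(xmlDecl + doctype + body).root?.name); // prints: undefined
console.log(toyParse(xmlDecl + body).root?.name); // prints: kotus-sanalista
```

In this model the fix is the same either way: the second step has to recognise and skip a <!DOCTYPE ...> (and, more generally, comments and processing instructions) before looking for the root element.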