deno-xml-parser

Can't handle DTDs

Open • sigs opened this issue 4 years ago • 1 comment

Repro

deno run --allow-all main.ts

where main.ts:

import parse from "https://denopkg.com/nekobato/deno-xml-parser/index.ts";

// Read the XML file and strip newlines before handing it to the parser.
const infile = "./test.xml";
const input = await Deno.readTextFile(infile);
const test = parse(input.replaceAll("\n", ""));
// Prints the parsed document; with the DOCTYPE line present, root is undefined.
console.log(test);

where test.xml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE kotus-sanalista SYSTEM "kotus-sanalista.dtd">
<kotus-sanalista>
<st><s>aakkonen</s><t><tn>38</tn></t></st>
<st><s>aakkosellinen</s><t><tn>38</tn></t></st>
<st><s>aakkosellisesti</s><t><tn>99</tn></t></st>
</kotus-sanalista>

where kotus-sanalista.dtd:

<!ELEMENT  kotus-sanalista  (st*) >

<!ELEMENT  st  (s, hn?, t*) >
                          
<!ELEMENT  s  (#PCDATA) >
            
<!ELEMENT  hn  (#PCDATA) >
             
<!ELEMENT  t  (tn, av?)* >
<!ATTLIST  t  taivutus CDATA #IMPLIED>

<!ELEMENT  tn  (#PCDATA) >

<!ELEMENT  av  (#PCDATA) >
<!ATTLIST  av  astevaihtelu CDATA #IMPLIED>

Expected

{
  declaration: { attributes: { version: "1.0", encoding: "utf-8" } },
  root: {
    name: "kotus-sanalista",
    attributes: {},
    children: [
      { name: "st", attributes: [Object], children: [Array], content: "" },
      { name: "st", attributes: [Object], children: [Array], content: "" },
      { name: "st", attributes: [Object], children: [Array], content: "" }
    ],
    content: ""
  }
}

Actual

{
  declaration: { attributes: { version: "1.0", encoding: "utf-8" } },
  root: undefined
}

Notes

The "expected" output is what I get after removing the offending DOCTYPE line from the XML file. I'm not sure it is really what I should expect, but at the very least I'd expect the parser to ignore the DOCTYPE declaration and give me the contents. Even better if it would actually parse them somehow according to the DTD...
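
Since removing the DOCTYPE line by hand gives the expected output, one possible stopgap is to strip it in code before calling the parser. The sketch below assumes a hypothetical helper named `stripDoctype` (it is not part of deno-xml-parser) and a simplified regular expression that does not cover every legal DOCTYPE form.

```typescript
// Hypothetical helper, not part of deno-xml-parser: removes a DOCTYPE
// declaration so the root element directly follows the XML declaration.
// Simplification: an optional internal subset in [...] is handled, but
// comments or quoted strings that themselves contain "]" or ">" are not.
const DOCTYPE_RE = /<!DOCTYPE[^>\[]*(?:\[[\s\S]*?\])?\s*>/;

function stripDoctype(xml: string): string {
  return xml.replace(DOCTYPE_RE, "");
}

// A shortened copy of test.xml from the repro, already joined without newlines.
const sample = [
  '<?xml version="1.0" encoding="utf-8"?>',
  '<!DOCTYPE kotus-sanalista SYSTEM "kotus-sanalista.dtd">',
  "<kotus-sanalista>",
  "<st><s>aakkonen</s><t><tn>38</tn></t></st>",
  "</kotus-sanalista>",
].join("");

const cleaned = stripDoctype(sample);
console.log(cleaned);
```

In the repro, this would mean calling `parse(stripDoctype(input.replaceAll("\n", "")))` instead of passing the raw text. The DTD itself is still ignored, so nothing is validated against it.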

sigs • Oct 10 '20 20:10

If I read the code correctly, the parser tries to match a pattern at the start of the string and, when a match is found, removes the matched text from the string.
In your case the first match was the XML declaration, which got removed. The DOCTYPE line that follows then matches nothing, so the root comes back as undefined.
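
To illustrate that reading, here is a toy model of a "match at the start, then consume" parser. It is an assumption-laden sketch, not the actual deno-xml-parser source: `toyParse`, `ToyDocument`, `consume`, and both regular expressions are made up for the example.

```typescript
// Toy model only: shows how a parser that always matches at the start of
// the remaining input can fall over on an unexpected DOCTYPE.
interface ToyDocument {
  declaration: string | undefined;
  rootName: string | undefined;
}

function toyParse(xml: string): ToyDocument {
  let rest = xml.trim();

  // Try a pattern anchored at the start of the remaining input; on success,
  // cut the matched text (and following whitespace) off the front.
  function consume(re: RegExp): RegExpMatchArray | null {
    const m = rest.match(re);
    if (m) rest = rest.slice(m[0].length).replace(/^\s+/, "");
    return m;
  }

  const decl = consume(/^<\?xml[\s\S]*?\?>/);
  // An element name must start right after "<". "<!DOCTYPE" does not qualify,
  // so this match fails when a DOCTYPE sits between declaration and root.
  const openTag = consume(/^<([\w:.-]+)/);

  return {
    declaration: decl ? decl[0] : undefined,
    rootName: openTag ? openTag[1] : undefined,
  };
}

console.log(toyParse('<?xml version="1.0"?><!DOCTYPE a SYSTEM "a.dtd"><a/>'));
console.log(toyParse('<?xml version="1.0"?><a/>'));
```

In this toy model, the first call yields a declaration but an undefined `rootName`, mirroring the "Actual" output above, while the second call finds the root `a`. A fix along these lines would add a step that consumes a DOCTYPE between the declaration and the root.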

This seems to be a similar case to #8.

danielwentland • Dec 15 '20 14:12