Markup declarations in DOCTYPEs are parsed backwards‑incompatibly

Open ExE-Boss opened this issue 6 years ago • 5 comments

Right now, an XML 1.0 parser parses:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ELEMENT other (#PCDATA)>
]>
<greeting>Hello, world!</greeting>

as

├ xml: version="1.0" encoding="UTF-8"
├ DOCTYPE: greeting
│ │ // Note that these are ignored by non‑validating parsers, e.g. browsers:
│ ├ ELEMENT: greeting (#PCDATA)
│ └ ELEMENT: other (#PCDATA)
└ greeting
  └ #text: Hello, world!

whereas XML5 parses it as:

├ xml: version="1.0" encoding="UTF-8"
├ DOCTYPE: greeting
├ #comment: ELEMENT other (#PCDATA)
├ #text: ]>
└ greeting
  └ #text: Hello, world!

Since the XML5 parser seems to be intended to parse current XML while ignoring DTDs, this seems like it should parse as:

├ xml: version="1.0" encoding="UTF-8"
├ DOCTYPE: greeting
└ greeting
  └ #text: Hello, world!

(The <!ELEMENT greeting (#PCDATA)> and <!ELEMENT other (#PCDATA)> entries are ignored)

ExE-Boss avatar Apr 02 '19 11:04 ExE-Boss
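
To illustrate the XML 1.0 behaviour described above, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which is a non-validating parser. It is only a stand-in for "an XML 1.0 parser" and says nothing about how xml5ever is implemented; the variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

# The document from the issue, as bytes so the encoding declaration is honoured.
DOC = b"""<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ELEMENT other (#PCDATA)>
]>
<greeting>Hello, world!</greeting>
"""

# A non-validating parser reads the internal subset but does not
# surface the ELEMENT declarations in the resulting tree.
root = ET.fromstring(DOC)
print(root.tag, repr(root.text), len(root))
```

The resulting tree contains only the `greeting` element and its text, with no stray comment or `]>` text node.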

Hm, that is troubling.

I'll look into updating the xml5draft and xml5ever, probably over the weekend.

@ExE-Boss A quick question: do you need DOCTYPE processing? And if so, do you need entity references?

Ygg01 avatar Apr 02 '19 14:04 Ygg01

I think entity references should be supported, given that they are used extensively within the Firefox browser’s XUL-based UI.

ExE-Boss avatar Apr 02 '19 19:04 ExE-Boss

Is XUL still a thing? I thought it was replaced by browser.html.

Ygg01 avatar Apr 02 '19 19:04 Ygg01

Yes, XUL is still a thing and will be for at least a few more years.

On the plus side, XBL is almost dead.

ExE-Boss avatar Apr 02 '19 19:04 ExE-Boss

We could start by ignoring entities and just making sure that an opening [ in a DOCTYPE is matched by a closing ]>.

ExE-Boss avatar Apr 22 '19 00:04 ExE-Boss
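
The last suggestion, skipping the internal subset by matching the opening [ with its closing ]>, could be sketched roughly as follows. This is a hypothetical Python illustration, not code from xml5ever or the XML5 draft: `skip_doctype` is a name invented here, and the sketch assumes that only quoted literals and comments can hide a stray `]` or `>` (processing instructions and parameter entities are not handled).

```python
def skip_doctype(src: str, start: int) -> int:
    """Return the index just past the DOCTYPE that begins at src[start]."""
    if not src.startswith("<!DOCTYPE", start):
        raise ValueError("no DOCTYPE at this position")
    i = start + len("<!DOCTYPE")
    in_subset = False  # True between '[' and the matching ']'
    quote = None       # the open quote character while inside a literal
    while i < len(src):
        ch = src[i]
        if quote is not None:
            # Inside a quoted literal: only the same quote character ends it.
            if ch == quote:
                quote = None
        elif in_subset and src.startswith("<!--", i):
            # Skip a whole comment so apostrophes in it are not read as quotes.
            end = src.find("-->", i + 4)
            if end == -1:
                break
            i = end + 3
            continue
        elif ch in "\"'":
            quote = ch
        elif ch == "[" and not in_subset:
            in_subset = True
        elif ch == "]" and in_subset:
            in_subset = False
        elif ch == ">" and not in_subset:
            # The '>' after the closing ']' (or a plain DOCTYPE) ends it.
            return i + 1
        i += 1
    raise ValueError("unterminated DOCTYPE")


SAMPLE = (
    "<!DOCTYPE greeting [\n"
    "  <!ELEMENT greeting (#PCDATA)>\n"
    "  <!ELEMENT other (#PCDATA)>\n"
    "]>\n"
    "<greeting>Hello, world!</greeting>"
)
rest = SAMPLE[skip_doctype(SAMPLE, 0):]
print(repr(rest))
```

With this approach the markup declarations never reach the tree builder, so neither the `#comment` node nor the `]>` text node from the XML5 output above would be produced.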