Missing DOCTYPE support

Open msva opened this issue 5 years ago • 0 comments

Hi there! I've run into an issue where slaxdom fails to build a DOM for a document that has a <!DOCTYPE>. The W3C XML spec says it is valid: https://www.w3.org/TR/xml/#NT-doctypedecl

Minimal test-case:

sd=require"slaxdom"
z=sd:dom("<!DOCTYPE><a></a>")

This produces the following error:

/usr/share/lua/5.1/slaxdom.lua:34: Document has non-whitespace text at root: '<!DOCTYPE>'
Stack traceback:
  At =[C]:-1 (in global error)
  At @/usr/share/lua/5.1/slaxdom.lua:34 (in field text)
    0031:               end,
    0032:               text = function(value,cdata)
    0033:                       -- documents may only have text node children that are whitespace: https://www.w3.org/TR/xml/#NT-Misc
    0034:                       if current.type=='document' and not value:find('^%s+$') then error(("Document has non-whitespace text at root: '%s'"):format(value:gsub('[\r\n\t]',{['\r']='\\r', ['\n']='\\n', ['\t']='\\t'}))) end
    0035:                       push(current.kids,{type='text',name='#text',cdata=cdata and true or nil,value=value,parent=rich and current or nil})
    0036:               end,
    0037:               comment = function(value)
  At @/usr/share/lua/5.1/slaxml.lua:87 (in upvalue finishText)
    0084:                               text = gsub(text,'%s+$','')
    0085:                               if #text==0 then text=nil end
    0086:                       end
    0087:                       if text then self._call.text(unescape(text),false) end
    0088:               end
    0089:       end
    0090:
  At @/usr/share/lua/5.1/slaxml.lua:125 (in local startElement)
    0122:               if first then
    0123:                       currentElement[2] = nil -- reset the nsURI, since this table is re-used
    0124:                       currentElement[3] = nil -- reset the nsPrefix, since this table is re-used
    0125:                       finishText()
    0126:                       pos = last+1
    0127:                       first,last,match2 = find(xml, '^:([%a_][%w_.-]*)', pos )
    0128:                       if first then
  At @/usr/share/lua/5.1/slaxml.lua:239 (in method parse)
    0236:       while pos<#xml do
    0237:               if state=="text" then
    0238:                       if not (findPI() or findComment() or findCDATA() or findElementClose()) then
    0239:                               if startElement() then
    0240:                                       state = "attributes"
    0241:                               else
    0242:                                       first, last = find( xml, '^[^<]+', pos )
  At @/usr/share/lua/5.1/slaxdom.lua:44 (in method dom)
    0041:                       push(current.kids,{type='pi',name=name,value=value,parent=rich and current or nil})
    0042:               end
    0043:       }
    0044:       builder:parse(xml,opts)
    0045:       return doc
    0046: end
    0047:
  At stdin#22:1 (in  ?)
    0001: z=sd:dom("<!DOCTYPE><a></a>")

For now, I am working around it by converting the DOCTYPE to a comment and "uncommenting" it again right before serialization, but it would be nice if it worked out of the box :)
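
For anyone who needs the same workaround, here is a rough sketch of what I mean. The helper names (hideDoctype, restoreDoctype) and the marker string are made up for this example and are not part of SLAXML. It assumes a simple DOCTYPE with no internal subset ([...]) and no "--" inside the declaration:

```lua
-- Hypothetical helpers, not part of SLAXML: hide a DOCTYPE from the parser
-- by wrapping it in a comment, then restore it in the serialized output.
-- Assumption: the DOCTYPE has no internal subset ([...]) and contains no "--".
local MARK = "slaxml-doctype:"  -- arbitrary marker chosen for this sketch

local function hideDoctype(xml)
  -- %1 is the captured declaration; only "%" is special in a replacement string
  return (xml:gsub("(<!DOCTYPE[^>]*>)", "<!--" .. MARK .. "%1-->", 1))
end

local function restoreDoctype(xml)
  -- escape punctuation in the marker ("-" and ":") so it is matched literally
  local mark = MARK:gsub("%p", "%%%0")
  return (xml:gsub("<!%-%-" .. mark .. "(<!DOCTYPE[^>]*>)%-%->", "%1", 1))
end
```

The idea is to call sd:dom(hideDoctype(source)) when parsing and to pass the serialized string through restoreDoctype afterwards. The marker is there so that ordinary comments in the document are left alone.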

msva, Dec 29 '19 14:12