SLAXML
Missing DOCTYPE support
Hi there!
I've run into an issue where slaxdom fails to build a DOM for a document that contains a <!DOCTYPE> declaration.
The W3C XML specification says a doctype declaration is valid: https://www.w3.org/TR/xml/#NT-doctypedecl
Minimal test-case:
sd=require"slaxdom"
z=sd:dom("<!DOCTYPE><a></a>")
The error is:
/usr/share/lua/5.1/slaxdom.lua:34: Document has non-whitespace text at root: '<!DOCTYPE>'
Stack traceback:
At =[C]:-1 (in global error)
At @/usr/share/lua/5.1/slaxdom.lua:34 (in field text)
0031: end,
0032: text = function(value,cdata)
0033: -- documents may only have text node children that are whitespace: https://www.w3.org/TR/xml/#NT-Misc
0034: if current.type=='document' and not value:find('^%s+$') then error(("Document has non-whitespace text at root: '%s'"):format(value:gsub('[\r\n\t]',{['\r']='\\r', ['\n']='\\n', ['\t']='\\t'}))) end
0035: push(current.kids,{type='text',name='#text',cdata=cdata and true or nil,value=value,parent=rich and current or nil})
0036: end,
0037: comment = function(value)
At @/usr/share/lua/5.1/slaxml.lua:87 (in upvalue finishText)
0084: text = gsub(text,'%s+$','')
0085: if #text==0 then text=nil end
0086: end
0087: if text then self._call.text(unescape(text),false) end
0088: end
0089: end
0090:
At @/usr/share/lua/5.1/slaxml.lua:125 (in local startElement)
0122: if first then
0123: currentElement[2] = nil -- reset the nsURI, since this table is re-used
0124: currentElement[3] = nil -- reset the nsPrefix, since this table is re-used
0125: finishText()
0126: pos = last+1
0127: first,last,match2 = find(xml, '^:([%a_][%w_.-]*)', pos )
0128: if first then
At @/usr/share/lua/5.1/slaxml.lua:239 (in method parse)
0236: while pos<#xml do
0237: if state=="text" then
0238: if not (findPI() or findComment() or findCDATA() or findElementClose()) then
0239: if startElement() then
0240: state = "attributes"
0241: else
0242: first, last = find( xml, '^[^<]+', pos )
At @/usr/share/lua/5.1/slaxdom.lua:44 (in method dom)
0041: push(current.kids,{type='pi',name=name,value=value,parent=rich and current or nil})
0042: end
0043: }
0044: builder:parse(xml,opts)
0045: return doc
0046: end
0047:
At stdin#22:1 (in ?)
0001: z=sd:dom("<!DOCTYPE><a></a>")
For now, I'm working around this by converting the doctype to a comment and "uncommenting" it again right before serialization, but it would be nice if it worked out of the box :)
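In case it helps anyone else, here is a rough sketch of that workaround. The helper names `hideDoctype` and `restoreDoctype` and the `<!--DOCTYPE ...-->` comment form are my own illustration, not part of SLAXML. It assumes a single doctype with no internal subset (no `[...]` part) and no `--` inside the declaration, since that is not allowed in a comment.

```lua
-- Hypothetical helpers, not part of SLAXML.
-- Assumes at most one DOCTYPE, without an internal subset ([...]).

-- Turn <!DOCTYPE ...> into <!--DOCTYPE ...--> so the parser sees a comment.
local function hideDoctype(xml)
  -- [^>%[]* stops at the closing '>' and refuses declarations containing '['
  return (xml:gsub("<!DOCTYPE([^>%[]*)>", "<!--DOCTYPE%1-->", 1))
end

-- Turn the placeholder comment back into a real doctype declaration.
local function restoreDoctype(xml)
  -- '-' is a magic character in Lua patterns, so it is escaped as %-
  return (xml:gsub("<!%-%-DOCTYPE(.-)%-%->", "<!DOCTYPE%1>", 1))
end

local hidden = hideDoctype("<!DOCTYPE html><a></a>")
print(hidden)
print(restoreDoctype(hidden))
```

The idea is to pass `hideDoctype(xml)` to `sd:dom(...)` and then run `restoreDoctype` over the serialized string afterwards. A document that already contains a comment starting with `DOCTYPE` would be converted by mistake, so this is only a stopgap.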