dom4j
QName validation from 2.1.1 fails for namespaced attributes
The QName validation added for issue #48 seems to have introduced a regression when an attribute has a namespace prefix. Parsing this XML fails:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sites xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <site>
    <id>1</id>
    <name>Default</name>
    <url_namespace></url_namespace>
    <user_quota xsi:nil="true"/>
    <content_admin_mode>2</content_admin_mode>
    <storage_quota xsi:nil="true"/>
    <sheet_image_enabled>true</sheet_image_enabled>
    <extract_encryption_mode>disabled</extract_encryption_mode>
    <materialized_views_mode>enable_selective</materialized_views_mode>
    <use_default_time_zone>true</use_default_time_zone>
  </site>
  <site>
    <id>4</id>
    <name>testsite_4432</name>
    <url_namespace>testsite_4432_url</url_namespace>
    <user_quota xsi:nil="true"/>
    <content_admin_mode>2</content_admin_mode>
    <storage_quota xsi:nil="true"/>
    <sheet_image_enabled>true</sheet_image_enabled>
    <extract_encryption_mode>disabled</extract_encryption_mode>
    <materialized_views_mode>enable_selective</materialized_views_mode>
    <use_default_time_zone>true</use_default_time_zone>
  </site>
</sites>
with this exception:
Caused by: java.lang.IllegalArgumentException: Illegal character in local name: 'xsi:nil'.
at org.dom4j.QName.validateNCName(QName.java:346)
at org.dom4j.QName.<init>(QName.java:153)
at org.dom4j.tree.QNameCache.createQName(QNameCache.java:245)
at org.dom4j.tree.QNameCache.get(QNameCache.java:115)
at org.dom4j.DocumentFactory.createQName(DocumentFactory.java:191)
at org.dom4j.tree.NamespaceStack.createQName(NamespaceStack.java:392)
at org.dom4j.tree.NamespaceStack.pushQName(NamespaceStack.java:374)
at org.dom4j.tree.NamespaceStack.getAttributeQName(NamespaceStack.java:257)
at org.dom4j.tree.AbstractElement.setAttributes(AbstractElement.java:454)
at org.dom4j.io.SAXContentHandler.addAttributes(SAXContentHandler.java:899)
at org.dom4j.io.SAXContentHandler.startElement(SAXContentHandler.java:241)
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:510)
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:183)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1377)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2710)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at org.dom4j.io.SAXReader.read(SAXReader.java:494)
... 17 more
The failure did not occur with 2.1.0.
(sorry for the multiple edits ... for some reason I am unable to get version numbers correct on the first couple of tries :-) )
I don't think it matters, but we set these features on the SAXParserFactory:
saxParserFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
saxParserFactory.setFeature("http://xml.org/sax/features/external-general-entities", false);
saxParserFactory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
I believe we have fixed this on our side by adding one more parser option: saxParserFactory.setNamespaceAware(true);
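For anyone who wants to see the difference, here is a minimal sketch that uses only the JDK SAX API (no dom4j) and records what the parser reports for the xsi:nil attribute with and without namespace awareness. The class name NamespaceAwareDemo and the method describeAttribute are made up for this illustration. My assumption, based on the stack trace above, is that a parser that is not namespace aware hands over only the raw name 'xsi:nil', which dom4j then rejects as a local name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of dom4j.
public class NamespaceAwareDemo {

    // Returns {namespace URI, local name, qualified name} as reported by the
    // SAX parser for the first attribute of the first <user_quota> element.
    static String[] describeAttribute(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Same hardening features as in the report above.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // The option under discussion; JAXP factories default to false.
        factory.setNamespaceAware(namespaceAware);

        SAXParser parser = factory.newSAXParser();
        final String[] result = new String[3];
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                boolean isTarget = "user_quota".equals(qName) || "user_quota".equals(localName);
                if (isTarget && atts.getLength() > 0 && result[2] == null) {
                    result[0] = atts.getURI(0);
                    result[1] = atts.getLocalName(0);
                    result[2] = atts.getQName(0);
                }
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
                + "<sites xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">"
                + "<site><user_quota xsi:nil=\"true\"/></site></sites>";
        for (boolean aware : new boolean[] {false, true}) {
            String[] a = describeAttribute(xml, aware);
            System.out.println("namespaceAware=" + aware
                    + " uri='" + a[0] + "' localName='" + a[1] + "' qName='" + a[2] + "'");
        }
    }
}
```

With namespace awareness on, the parser splits the name into the XMLSchema-instance URI and the local name nil, so the local name contains no colon and should pass the NCName check.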
The question is whether dom4j should leave this as it is, change the error to something more meaningful, or add a feature that allows building documents without proper namespace information when the underlying reader is not namespace aware.