GenericMsvValidator.getAttributeType(int) always returns null, causing a NullPointerException in com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM

Open ndru83 opened this issue 6 years ago • 11 comments

When reading a DOM document from a StAXSource backed by a validating XMLStreamReader2, com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM throws a NullPointerException when it tries to process attributes. This seems to be caused by GenericMsvValidator.getAttributeType(int) always returning null for the attribute type, which SAX2DOM is not prepared to handle.

The exception stack trace:

java.lang.NullPointerException
	at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.startElement(SAX2DOM.java:204)
	at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.closeStartTag(ToXMLSAXHandler.java:208)
	at com.sun.org.apache.xml.internal.serializer.ToSAXHandler.flushPending(ToSAXHandler.java:281)
	at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.startElement(ToXMLSAXHandler.java:650)
	at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleStartElement(StAXStream2SAX.java:319)
	at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:145)
	at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:688)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:737)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:351)

Tested with:

JRE 1.8.0_141-b15 (x64)
com.fasterxml.woodstox:woodstox-core:5.0.3
net.java.dev.msv:msv-core:2013.6.1

Code to reproduce the error:

File xmlFile = new File("Test.xml");
File schemaFile = new File("Test.xsd");

validatAgainst(new File(xmlFile.toURI()), new File(schemaFile.toURI()));

XMLInputFactory2 xmlInputFactory = (XMLInputFactory2) XMLInputFactory2.newFactory();
xmlInputFactory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
xmlInputFactory.setProperty(XMLInputFactory.IS_VALIDATING, true);

XMLValidationSchema xmlValidationSchema = XMLValidationSchemaFactory
		.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA).createSchema(schemaFile);

XMLStreamReader2 xmlStreamReader = (XMLStreamReader2) xmlInputFactory.createXMLStreamReader(xmlFile);
xmlStreamReader.validateAgainst(xmlValidationSchema);

Transformer transformer = TransformerFactory.newInstance().newTransformer();

while (xmlStreamReader.hasNext()) {
	xmlStreamReader.next();
	if (xmlStreamReader.getEventType() == XMLStreamConstants.START_ELEMENT) {
		transformer.reset();
		DOMResult result = new DOMResult();
		transformer.transform(new StAXSource(xmlStreamReader), result);
	}
}

Test.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.example.org/Test"
	xmlns:tns="http://www.example.org/Test" elementFormDefault="qualified">
	<element name="test">
		<complexType>
			<attribute name="attr" type="string" />
		</complexType>		
	</element>
</schema>

Test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<t:test xmlns:t="http://www.example.org/Test" attr="value" />

edit: Fixed broken link.

ndru83, Jul 30 '17 20:07

OK, that is possible, but as per the XMLValidator interface's Javadoc:

    /**
     * Method for getting schema-specified type of an attribute, if
     * information is available. If not, validators can return
     * null to explicitly indicate no information was available.
     */
    public abstract String getAttributeType(int index);

so null is a valid value to return, and it would seem that the caller needs to handle it properly. So I am not sure what Woodstox could do here.

cowtowncoder, Mar 28 '18 05:03

SAX seems to operate under the assumption that "CDATA" is reported as the type for attributes where the parser provides none.

source
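To illustrate that convention, here is a minimal sketch of a StAX-to-SAX attribute copy that falls back to "CDATA" when the reader reports no type. The class name StaxToSaxAttributes and the method copy are hypothetical, chosen for this example only; this is not the JDK's bridge code and not part of Woodstox.

```java
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper (not JDK or Woodstox code): copies the attributes of the
// current START_ELEMENT into a SAX AttributesImpl, tolerating null types.
public final class StaxToSaxAttributes {
    private StaxToSaxAttributes() {}

    public static AttributesImpl copy(XMLStreamReader reader) {
        AttributesImpl attrs = new AttributesImpl();
        for (int i = 0; i < reader.getAttributeCount(); i++) {
            String uri = reader.getAttributeNamespace(i);
            String local = reader.getAttributeLocalName(i);
            String prefix = reader.getAttributePrefix(i);
            String qName = (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
            String type = reader.getAttributeType(i);
            // A null type means "no information"; SAX consumers expect "CDATA" here.
            attrs.addAttribute(uri == null ? "" : uri, local, qName,
                    type == null ? "CDATA" : type, reader.getAttributeValue(i));
        }
        return attrs;
    }
}
```

With this fallback, a SAX consumer that calls getType(i).equals("ID") no longer sees a null type.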

ndru83, Mar 28 '18 13:03

@ndru83 I can accept that with respect to a SAX parser implementation. But the sample code above specifically refers to XMLValidator, for which null is specified as a valid value to return.

So: I would be happy to change the return value for the Woodstox SAX reader implementation, but the reproduction as-is unfortunately does not exercise that code path.

cowtowncoder, Mar 28 '18 14:03

I'm closing this issue as XMLStreamReader.getAttributeType(int) also doesn't seem to explicitly prohibit implementations from returning a null value for unknown attribute types.

I've filed a bug report on the JDK side instead to address the missing null check there. (JDK-8202426)

ndru83, May 03 '18 19:05

Thank you for filing the JDK bug!

cowtowncoder, May 03 '18 19:05

Hi @ndru83. I have no way to post anything on bugs.openjdk.java.net. I see that the fix is included only in Java 11. I tested it on a new Java 11 build and it works.

Are you able to ask whether the fix could be backported to older Java versions?

mkozioro, Aug 02 '18 15:08

Hi @cowtowncoder

How bad would it be to just return "CDATA".intern() in getAttributeType(int index) of com.ctc.wstx.msv.GenericMsvValidator?

I'm also having problems because of that bug in Java. I see it was already fixed in https://bugs.openjdk.java.net/browse/JDK-8202426, but there is no backport to Java < 11. It was fixed by treating nulls as CDATA.
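Until a backport exists, one possible caller-side workaround is to wrap the reader before handing it to StAXSource. The sketch below uses the standard javax.xml.stream.util.StreamReaderDelegate; the class name CdataDefaultingReader is made up for this example, and it has not been verified here against Woodstox with MSV on Java 8.

```java
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Hypothetical wrapper: reports "CDATA" wherever the wrapped reader
// (for example a validating XMLStreamReader2) returns null for an attribute type.
public class CdataDefaultingReader extends StreamReaderDelegate {

    public CdataDefaultingReader(XMLStreamReader reader) {
        super(reader);
    }

    @Override
    public String getAttributeType(int index) {
        String type = super.getAttributeType(index);
        return (type == null) ? "CDATA" : type;
    }
}
```

In the reproduction code it would presumably be used as transformer.transform(new StAXSource(new CdataDefaultingReader(xmlStreamReader)), result); all other calls are passed through to the wrapped reader unchanged.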

mkozioro, Aug 02 '18 16:08

@mkozioro I filed a request for enhancement to have the fixes for JDK-8202426 and JDK-8201138 backported to Java 8. I wouldn't hold my breath though: public updates for Java 8 will end some time this September with the release of Java 11 LTS, so I wouldn't be surprised if they decided against it. :(

ndru83, Aug 03 '18 10:08

@ndru83 Thanks for all your help. We will see. Maybe we will be lucky :)

mkozioro, Aug 03 '18 10:08

@ndru83 In theory, Java 8 will still be updated. https://blogs.oracle.com/java-platform-group/extension-of-oracle-java-se-8-public-updates-and-java-web-start-support

mkozioro, Aug 10 '18 09:08

@mkozioro I would not be against this as long as it is controlled by a new property, so as not to change existing behavior. Filing a PR would be great, as I am swamped with other work right now, but I am hoping to get a new Woodstox release out relatively soon for other fixes.
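As a rough sketch of what such an opt-in could look like: the class UnknownAttributeTypePolicy and its flag below are hypothetical and do not correspond to any existing Woodstox property or class. The default (false) keeps today's null result, and only an explicit opt-in switches to "CDATA".

```java
// Hypothetical illustration of an opt-in switch; not an existing Woodstox API.
public final class UnknownAttributeTypePolicy {
    private final boolean reportCdata;

    // reportCdata == false preserves the current behavior (null for unknown types).
    public UnknownAttributeTypePolicy(boolean reportCdata) {
        this.reportCdata = reportCdata;
    }

    // schemaType is whatever the validator determined, or null if nothing is known.
    public String resolve(String schemaType) {
        if (schemaType != null) {
            return schemaType;
        }
        return reportCdata ? "CDATA" : null;
    }
}
```

A validator's getAttributeType(int) could delegate to resolve(...) so that existing callers see no change unless the flag is set.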

cowtowncoder, Sep 07 '18 03:09