woodstox
W3C Schema Validation does not cater for xs:unique constraints
Consider the following XSD (called idc2.xsd):
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="idc2.xsd"
xmlns:idc="idc2.xsd"
targetNamespace="idc2.xsd"
elementFormDefault="qualified"
version="1.0"
>
<xsd:element name="itemList">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="item" maxOccurs="unbounded" type="xsd:decimal" />
</xsd:sequence>
</xsd:complexType>
<xsd:unique name="itemAttr">
<xsd:selector xpath="idc:item"/>
<xsd:field xpath="."/>
</xsd:unique>
</xsd:element>
</xsd:schema>
And the corresponding (invalid) XML (called idc2.xml):
<?xml version="1.0"?>
<itemList xmlns="idc2.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="idc2.xsd idc2.xsd">
<item>1</item>
<item>1</item>
<item>2</item>
</itemList>
I used the following pom.xml dependencies:
<dependency>
<groupId>com.fasterxml.woodstox</groupId>
<artifactId>woodstox-core</artifactId>
<version>5.0.3</version>
</dependency>
<dependency>
<groupId>msv</groupId>
<artifactId>msv</artifactId>
<version>20050913</version>
</dependency>
<dependency>
<groupId>relaxngDatatype</groupId>
<artifactId>relaxngDatatype</artifactId>
<version>20020414</version>
</dependency>
<dependency>
<groupId>com.sun.msv.datatype.xsd</groupId>
<artifactId>xsdlib</artifactId>
<version>2013.2</version>
</dependency>
And the following example program (TestXSD.java):
package jdi.test.xsdvalidation;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import org.codehaus.stax2.XMLInputFactory2;
import org.codehaus.stax2.XMLStreamReader2;
import org.codehaus.stax2.validation.XMLValidationException;
import org.codehaus.stax2.validation.XMLValidationSchema;
import org.codehaus.stax2.validation.XMLValidationSchemaFactory;
public class TestXSD {
public static void main(final String[] args) throws Exception {
final String xmlFileName = "idc2.xml";
final String xsdFileName = "idc2.xsd";
//load Schema
final XMLValidationSchemaFactory xmlValidationSchemaFactory = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
final InputStream schemaInputStream = TestXSD.class.getResourceAsStream(xsdFileName);
final XMLValidationSchema xmlValidationSchema = xmlValidationSchemaFactory.createSchema(schemaInputStream);
//load (invalid) XML file
final InputStream xmlInputStream = TestXSD.class.getResourceAsStream(xmlFileName);
final XMLInputFactory2 xmlInputFactory2 = (XMLInputFactory2)XMLInputFactory.newInstance();
final XMLStreamReader2 xmlStreamReader =(XMLStreamReader2) xmlInputFactory2.createXMLStreamReader(xmlInputStream);
try{
//validate the XML file
xmlStreamReader.validateAgainst(xmlValidationSchema);
//traverse the streaming document
while(xmlStreamReader.hasNext()){
xmlStreamReader.next();
}
} catch(final XMLValidationException e){
//catch validation exception
System.err.println("XML file: " + xmlFileName + " failed to validate against: " + xsdFileName);
return ;
}
System.out.println("XML file: " + xmlFileName + " successfully validated against: " + xsdFileName);
}
}
This produces the unexpected output:
XML file: idc2.xml successfully validated against: idc2.xsd
Note:
- Xerces-J 2.11 and Eclipse validate this file correctly.
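For reference, here is a minimal sketch of how such a cross-check could be run through the standard JAXP javax.xml.validation API, whose default implementation in the JDK is derived from Xerces. The class name JaxpUniqueCheck and the inlined schema and document strings are illustrative assumptions, and the xsi:schemaLocation hint is left out because the schema is supplied explicitly. I would expect the duplicated item to be rejected here:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class JaxpUniqueCheck {
    // Same schema as idc2.xsd, inlined so the sketch is self-contained.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:idc='idc2.xsd'"
        + " targetNamespace='idc2.xsd' elementFormDefault='qualified'>"
        + "<xsd:element name='itemList'>"
        + "<xsd:complexType><xsd:sequence>"
        + "<xsd:element name='item' maxOccurs='unbounded' type='xsd:decimal'/>"
        + "</xsd:sequence></xsd:complexType>"
        + "<xsd:unique name='itemAttr'>"
        + "<xsd:selector xpath='idc:item'/><xsd:field xpath='.'/>"
        + "</xsd:unique></xsd:element></xsd:schema>";

    // Same content as idc2.xml: the value 1 appears twice.
    static final String XML =
        "<itemList xmlns='idc2.xsd'><item>1</item><item>1</item><item>2</item></itemList>";

    /** Returns true if the document validates, false if the validator reports an error. */
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no custom ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("valid: " + isValid(XSD, XML));
    }
}
```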
I suspect that the class GenericMsvValidator does not perform any Unique, Key, or KeyRef validation. Only ID and IDREF seem to be supported (I am not sure whether these coincide with Key and KeyRef respectively). I have not found out whether this is intended or a bug. I also found that the field mVGM.grammer.topLevel.element.identityConstraints did contain the unique constraint. The obvious suggestion to use Xerces-J instead does not solve my problem either: according to https://issues.apache.org/jira/browse/XERCESJ-1276, Xerces-J performs unique constraint validation in O(n^2) rather than the expected O(n log(n)).
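To illustrate the complexity I have in mind, here is a rough sketch of a hand-written uniqueness check for this one schema, using only the standard StAX API and a HashSet, which gives expected linear time over the number of items. It is not how Woodstox, MSV, or Xerces-J implement identity constraints. The class name StreamingUniqueCheck and the method firstDuplicate are hypothetical, and the sketch assumes that itemList is the root element so that the selector idc:item matches elements at depth 2:

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamingUniqueCheck {
    static final String NS = "idc2.xsd";

    /** Returns the first duplicated item value, or null if all item values are distinct. */
    static BigDecimal firstDuplicate(String xml) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Set<BigDecimal> seen = new HashSet<>();
        int depth = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    // Assumption: itemList is the root, so its item children sit at depth 2.
                    if (depth == 2 && NS.equals(reader.getNamespaceURI())
                            && "item".equals(reader.getLocalName())) {
                        // Normalize so that 1 and 1.0 compare equal as decimal values.
                        BigDecimal value =
                            new BigDecimal(reader.getElementText().trim()).stripTrailingZeros();
                        depth--; // getElementText() leaves the reader on the END_ELEMENT
                        if (!seen.add(value)) {
                            return value;
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<itemList xmlns='idc2.xsd'>"
            + "<item>1</item><item>1</item><item>2</item></itemList>";
        System.out.println("first duplicate: " + firstDuplicate(xml));
    }
}
```

A sorted structure such as a TreeSet would give the O(n log(n)) bound instead. Either way the check needs only the values seen so far, not the whole document tree.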
Thank you for reporting this. I do not know for sure whether this is supportable: the actual validation logic is provided by MSV (Multi-Schema Validator), so MSV's behavior is the ultimate source of truth. If MSV supports this, it should be supportable by Woodstox.
But I suspect the problem is that the use of XPath expressions in the constraint definition might require access to a full tree model (DOM), and if so, it would not be supportable with a streaming(-only) parser.
For what it is worth, Woodstox is using the latest release of MSV, 2013.6.1:
http://mvnrepository.com/artifact/net.java.dev.msv/msv-core
Otherwise, I only found this:
https://java.net/jira/browse/MSV-24
which could suggest that this validation is not supported. On the other hand, the limitations spelled out in:
https://github.com/kohsuke/msv/blob/master/msv/doc/commandline.html
do not mention that this is missing, unless I misread it.
So I am not sure whether MSV provides this. Even if it does, it is quite possible that Woodstox is not feeding it all the information it needs.