woodstox

W3C Schema Validation does not cater for xs:unique constraints

Open · Jens-Dittrich opened this issue 7 years ago · 2 comments

Consider the following XSD (called idc2.xsd):

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="idc2.xsd"
            xmlns:idc="idc2.xsd"
            targetNamespace="idc2.xsd"
            elementFormDefault="qualified"
            version="1.0"
            >
  <xsd:element name="itemList">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="item" maxOccurs="unbounded" type="xsd:decimal" />
      </xsd:sequence>
    </xsd:complexType>
    <xsd:unique name="itemAttr">
      <xsd:selector xpath="idc:item"/>
      <xsd:field    xpath="."/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>

And the corresponding XML (called idc2.xml), which is invalid because the value 1 appears in two item elements:

<?xml version="1.0"?>
<itemList xmlns="idc2.xsd"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="idc2.xsd idc2.xsd">
   <item>1</item>
   <item>1</item>
   <item>2</item>
</itemList>

I used the following pom.xml dependencies:

<dependency>
    <groupId>com.fasterxml.woodstox</groupId>
    <artifactId>woodstox-core</artifactId>
    <version>5.0.3</version>
</dependency>
<dependency>
    <groupId>msv</groupId>
    <artifactId>msv</artifactId>
    <version>20050913</version>
</dependency>
<dependency>
    <groupId>relaxngDatatype</groupId>
    <artifactId>relaxngDatatype</artifactId>
    <version>20020414</version>
</dependency>
<dependency>
    <groupId>com.sun.msv.datatype.xsd</groupId>
    <artifactId>xsdlib</artifactId>
    <version>2013.2</version>
</dependency>

And the following example program (TestXSD.java):

package jdi.test.xsdvalidation;

import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;

import org.codehaus.stax2.XMLInputFactory2;
import org.codehaus.stax2.XMLStreamReader2;
import org.codehaus.stax2.validation.XMLValidationException;
import org.codehaus.stax2.validation.XMLValidationSchema;
import org.codehaus.stax2.validation.XMLValidationSchemaFactory;

public class TestXSD {
	
	public static void main(final String[] args) throws Exception {
		final String xmlFileName = "idc2.xml";
		final String xsdFileName = "idc2.xsd";
		
		// load the W3C XML Schema (the stream is closed once the schema is built)
		final XMLValidationSchemaFactory xmlValidationSchemaFactory = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
		final XMLValidationSchema xmlValidationSchema;
		try (InputStream schemaInputStream = TestXSD.class.getResourceAsStream(xsdFileName)) {
			xmlValidationSchema = xmlValidationSchemaFactory.createSchema(schemaInputStream);
		}
		
		// load the (invalid) XML file
		final XMLInputFactory2 xmlInputFactory2 = (XMLInputFactory2) XMLInputFactory.newInstance();
		try (InputStream xmlInputStream = TestXSD.class.getResourceAsStream(xmlFileName)) {
			final XMLStreamReader2 xmlStreamReader = (XMLStreamReader2) xmlInputFactory2.createXMLStreamReader(xmlInputStream);
			try {
				// attach the validator before reading any content
				xmlStreamReader.validateAgainst(xmlValidationSchema);
				// traverse the streaming document; validation happens as events are read
				while (xmlStreamReader.hasNext()) {
					xmlStreamReader.next();
				}
			} catch (final XMLValidationException e) {
				// validation problems are reported as exceptions
				System.err.println("XML file: " + xmlFileName + " failed to validate against: " + xsdFileName + " (" + e.getMessage() + ")");
				return;
			} finally {
				xmlStreamReader.close();
			}
		}
		System.out.println("XML file: " + xmlFileName + " successfully validated against: " + xsdFileName);
	}

}

produces the following unexpected output:

XML file: idc2.xml successfully validated against: idc2.xsd

Note:

  • Xerces-J 2.11 and Eclipse correctly report this file as invalid.
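For comparison, the same check can be reproduced without Eclipse through the standard JAXP javax.xml.validation API. This is a minimal sketch that assumes the JDK's built-in validator (an internal fork of Xerces-J) enforces xs:unique the same way standalone Xerces-J 2.11 does; the class name JaxpUniqueCheck and the helper isValid are hypothetical, and the schema and documents are inlined versions of idc2.xsd and idc2.xml so the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class JaxpUniqueCheck {

	// Same content as idc2.xsd, inlined so the example needs no files.
	static final String XSD =
		"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
		+ " xmlns:idc='idc2.xsd' targetNamespace='idc2.xsd' elementFormDefault='qualified'>"
		+ "<xsd:element name='itemList'><xsd:complexType><xsd:sequence>"
		+ "<xsd:element name='item' maxOccurs='unbounded' type='xsd:decimal'/>"
		+ "</xsd:sequence></xsd:complexType>"
		+ "<xsd:unique name='itemAttr'><xsd:selector xpath='idc:item'/><xsd:field xpath='.'/></xsd:unique>"
		+ "</xsd:element></xsd:schema>";

	// Hypothetical helper: true if the document validates, false if the validator reports an error.
	// With no ErrorHandler set, a JAXP Validator throws on the first error.
	static boolean isValid(String xsd, String xml) throws Exception {
		SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
		Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
		Validator validator = schema.newValidator();
		try {
			validator.validate(new StreamSource(new StringReader(xml)));
			return true;
		} catch (SAXException e) {
			System.err.println("Rejected: " + e.getMessage());
			return false;
		}
	}

	public static void main(String[] args) throws Exception {
		String duplicate = "<itemList xmlns='idc2.xsd'><item>1</item><item>1</item><item>2</item></itemList>";
		String distinct = "<itemList xmlns='idc2.xsd'><item>1</item><item>2</item></itemList>";
		System.out.println("duplicate values valid? " + isValid(XSD, duplicate));
		System.out.println("distinct values valid? " + isValid(XSD, distinct));
	}
}
```

If the assumption holds, the first document is rejected with an identity-constraint error and the second one is accepted, which is the behavior expected from Woodstox as well.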

I suspect that the class GenericMsvValidator does not perform any Unique, Key or KeyRef validation. Only ID and IDREF seem to be supported (I am not sure whether these correspond to Key and KeyRef, respectively). I could not determine whether this is intended or a bug. I did find that the field mVGM.grammer.topLevel.element.identityConstraints contains the unique constraint, so the constraint is apparently parsed but not enforced.

The obvious suggestion to use Xerces-J instead does not solve my problem either, because Xerces-J performs unique constraint validation in O(n^2) instead of the expected O(n log(n)); see https://issues.apache.org/jira/browse/XERCESJ-1276.
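To make the complexity point concrete: for a simple selector such as idc:item with the field ".", duplicate detection needs only one streaming pass plus a hash set, giving expected O(n) time overall (a TreeSet would give a guaranteed O(n log n)). The sketch below uses the standard StAX API (javax.xml.stream), which Woodstox implements. It is not how Woodstox, MSV or Xerces-J work internally; the helper findDuplicateItems is hypothetical and hard-codes the namespace and selector depth of idc2.xsd, and it assumes the item content is already a valid xsd:decimal.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingUniqueCheck {

	// Hard-coded for idc2.xsd (assumption: the root element is itemList).
	static final String NS = "idc2.xsd";

	// Hypothetical helper: returns every item value that repeats an earlier one.
	static List<BigDecimal> findDuplicateItems(String xml) throws XMLStreamException {
		XMLStreamReader reader = XMLInputFactory.newInstance()
				.createXMLStreamReader(new StringReader(xml));
		Set<BigDecimal> seen = new HashSet<>();
		List<BigDecimal> duplicates = new ArrayList<>();
		int depth = 0;
		try {
			while (reader.hasNext()) {
				int event = reader.next();
				if (event == XMLStreamConstants.START_ELEMENT) {
					depth++;
					// Selector "idc:item": item children of the root, i.e. depth 2.
					if (depth == 2 && NS.equals(reader.getNamespaceURI())
							&& "item".equals(reader.getLocalName())) {
						// xsd:decimal compares by value, so "1" and "1.0" must collide:
						// stripTrailingZeros() normalizes the scale before hashing.
						BigDecimal value = new BigDecimal(reader.getElementText().trim()).stripTrailingZeros();
						// getElementText() already consumed the matching END_ELEMENT.
						depth--;
						if (!seen.add(value)) {
							duplicates.add(value);
						}
					}
				} else if (event == XMLStreamConstants.END_ELEMENT) {
					depth--;
				}
			}
		} finally {
			reader.close();
		}
		return duplicates;
	}

	public static void main(String[] args) throws XMLStreamException {
		String xml = "<itemList xmlns='idc2.xsd'><item>1</item><item>1</item><item>2</item></itemList>";
		List<BigDecimal> duplicates = findDuplicateItems(xml);
		if (duplicates.isEmpty()) {
			System.out.println("xs:unique satisfied");
		} else {
			System.out.println("Duplicate unique values: " + duplicates);
		}
	}
}
```

A general validator would have to evaluate arbitrary selector and field paths, but each value is still checked against the set as soon as it is read, so no pairwise comparison is needed.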

Jens-Dittrich, Dec 30 '16 21:12

Thank you for reporting this. I do not know for sure whether this is supportable. The actual validation logic is provided by MSV (Multi-Schema Validator), so MSV is the ultimate authority: if MSV supports this, Woodstox should be able to support it too.

But I suspect the problem is that using XPath expressions in the constraint definition might require access to the full tree model (DOM); if so, it could not be supported by a streaming-only parser.

cowtowncoder, Jan 06 '17 22:01

For what it is worth, Woodstox is using the latest release of MSV, 2013.6.1:

http://mvnrepository.com/artifact/net.java.dev.msv/msv-core

Otherwise, I only found this:

https://java.net/jira/browse/MSV-24

which could suggest that this validation is not supported. On the other hand, the limitations listed in:

https://github.com/kohsuke/msv/blob/master/msv/doc/commandline.html

do not mention that this is missing, unless I misread it.

So I am not sure whether MSV supports it. But even if it does, it is quite possible that Woodstox is not passing MSV all the information it needs.

cowtowncoder, Jan 06 '17 22:01