jaxb-ri
Slow performance of Unmarshaller when reading xsd:any element
I had already reported this issue to Oracle, but I don't think it got any attention (and maybe I was reporting it to the wrong people anyway). On the [email protected] mailing list, Iaroslav guided me here, so below is a copy/paste of my bug report. In short: unmarshalling XML files with a lot of 'any' elements takes an unacceptably long time in JDK 1.7u4.
Date Created: Thu Mar 28 08:28:09 MST 2013
Type: bug
Customer Name: Pieter Buzing
Customer Email: Buzing@...
SDN ID:
status: Waiting
Category: jaxb-xsd
Subcategory: runtime
Company: Riscure BV
release: 1.0.4
hardware: x64
OSversion: windows_7
priority: 3
Synopsis: Performance degradation between java 7u3 and 7u4 for xml parsing of "xsd:any"
Description:
FULL PRODUCT VERSION :
java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b22)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
ADDITIONAL OS VERSION INFORMATION : Microsoft Windows [Version 6.1.7601]
A DESCRIPTION OF THE PROBLEM : We observe a performance degradation between java 7u3 and 7u4 with regard to the xml parsing of an "xsd:any" element with JAXB2. Also later java updates have the same poor performance.
Given:
- an xsd schema which contains an "xsd:any" element and which is compiled with xjc (version 2.2.4)
- an xml file that conforms to the above xsd schema and contains (a lot of) xsd:any elements
- a java class that unmarshals the above xml file
- a Windows 7 computer with both jdk 7u3 and a later update installed
When we run the java unmarshal code with jdk1.7.0_04\bin\java.exe (7u4) we observe that the required time is at least twice the runtime needed by jdk1.7.0_03\bin\java.exe (7u3).
REGRESSION. Last worked in version 1.0.4
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Prepare an xsd file with an xs:any element (like in the comments in the java code), compiling it with xjc.
2. Prepare a matching xml file that contains (a lot of) xs:any elements (see also the comments in the java file).
3. Compile and run java code that unmarshals the xml data (like the supplied example code).
4. Observe that running the code with jdk7u3 is at least twice as fast compared to later updates.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED - The expected behavior would be an equal runtime for both java 7u3 and 7u4.
ACTUAL - We observed a performance difference between different java runtime environments:
java version = 1.7.0_03 .file size = 1613 bytes, read took 36 ms
java version = 1.7.0_04 .file size = 1613 bytes, read took 75 ms
When we increase the size of the xml file (or read multiple xml files) the relative performance difference grows. Also for later updates like 7u17 we see the same performance degradation compared to 7u3.
REPRODUCIBILITY : This bug can be reproduced always.
---------- BEGIN SOURCE ----------
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;

import anyexample.Root;
import anyexample.Root.Anyproperties;

/**
 * This code demonstrates a performance bug in java 7u4. Steps to replicate this bug:
 *
 * 1. Save and compile this xsd schema:
 *
 * <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 *   <xs:element name='Root'>
 *     <xs:complexType>
 *       <xs:sequence>
 *         <xs:element minOccurs="1" maxOccurs="unbounded" name="anyproperties">
 *           <xs:complexType>
 *             <xs:sequence>
 *               <xs:any minOccurs="0" maxOccurs="unbounded" />
 *             </xs:sequence>
 *           </xs:complexType>
 *         </xs:element>
 *       </xs:sequence>
 *     </xs:complexType>
 *   </xs:element>
 * </xs:schema>
 *
 * The code assumes that the resulting class files are stored in a directory called anyexample.
 *
 * 2. Store the following xml file as 'anyprops.xml':
 *
 * 3. Run the code below with both jdk7u3 and jdk7u4 (or later) and observe a significant performance difference.
 */
public class AnyPropertiesTest {

    private JAXBContext anyContext;
    private Unmarshaller anyUnmarshaller;

    public static final String PATH = "anyprops.xml";

    public AnyPropertiesTest() {
        System.out.println("java version = " + System.getProperty("java.version"));
        try {
            // Create the context and unmarshaller once, outside the timed section.
            if (anyContext == null) {
                anyContext = JAXBContext.newInstance(Root.class);
                anyUnmarshaller = anyContext.createUnmarshaller();
            }
        } catch (JAXBException e) {
            e.printStackTrace();
        }
    }

    public void testAny() {
        long startTime = System.nanoTime();
        File inputFile = new File(PATH);
        List<Anyproperties> propertiesList = getProperties(inputFile);
        long duration = (System.nanoTime() - startTime) / 1_000_000;
        String s = String.format("read %d properties in %d ms", propertiesList.size(), duration);
        System.out.println(s);
    }

    private List<Anyproperties> getProperties(File path) {
        List<Anyproperties> properties;
        try {
            System.out.printf("file size = %d bytes, ", path.length());
            properties = read(new FileInputStream(path));
        } catch (JAXBException e) {
            System.out.println(String.format(
                    "File '%s' could not be read by JAXB: might not be a valid XML file",
                    path.getAbsolutePath()));
            // Return an empty list rather than null so the caller can still call size().
            properties = new ArrayList<>();
        } catch (FileNotFoundException e) {
            // Impossible as path was obtained from listFiles
            String errorMessage = String.format(
                    "File '%s' obtained by File.listFiles() does not exist",
                    path.getAbsolutePath());
            throw new RuntimeException(errorMessage, e);
        }
        return properties;
    }

    private synchronized List<Anyproperties> read(InputStream stream) throws JAXBException {
        List<Anyproperties> templates = null;
        long startTime = System.nanoTime();
        // It is the unmarshal method that takes a considerable amount of time on java 7u4, compared to 7u3
        Root root = (Root) anyUnmarshaller.unmarshal(stream);
        long duration = (System.nanoTime() - startTime) / 1_000_000;
        System.out.printf("read took %d ms\n", duration);
        templates = (List<Anyproperties>) root.getAnyproperties();
        return templates;
    }

    public static void main(String[] argv) {
        AnyPropertiesTest test = new AnyPropertiesTest();
        test.testAny();
    }
}
---------- END SOURCE ----------
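For step 2 of the reproduction, a test document of any size can be generated rather than written by hand. The following is only a minimal sketch, assuming the no-namespace Root/anyproperties schema from the source comment. The class name AnyPropsGenerator and the propN element names are made up for illustration and are not part of the original report.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: writes a test document for the Root/anyproperties schema. */
final class AnyPropsGenerator {

    /**
     * Builds a document with `groups` anyproperties elements,
     * each holding `perGroup` wildcard (xs:any) children.
     */
    static String generate(int groups, int perGroup) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Root>\n");
        for (int g = 0; g < groups; g++) {
            xml.append("  <anyproperties>\n");
            for (int p = 0; p < perGroup; p++) {
                // Assumed element names. An Unmarshaller does not validate unless a
                // Schema is set on it, so undeclared names should be fine for timing runs.
                xml.append("    <prop").append(p).append(">value").append(p)
                   .append("</prop").append(p).append(">\n");
            }
            xml.append("  </anyproperties>\n");
        }
        return xml.append("</Root>\n").toString();
    }

    public static void main(String[] args) throws IOException {
        int groups = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        int perGroup = args.length > 1 ? Integer.parseInt(args[1]) : 100;
        // Same file name as the PATH constant in AnyPropertiesTest.
        Path out = Paths.get("anyprops.xml");
        Files.write(out, generate(groups, perGroup).getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

Running it with larger arguments (for example 100 and 1000) gives a bigger file, which is the case where the report says the relative difference between 7u3 and later updates grows.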
CUSTOMER SUBMITTED WORKAROUND : In order to deal with this issue we ship our product with java 7u3. Upgrading to update 4 or later has a considerable performance impact (we use large xml files). The obvious solution is to avoid xsd:any elements, but this is not always feasible.
Environment
Windows 7, standard JDK 1.7.0_04
- Issue Imported From: https://github.com/javaee/jaxb-v2/issues/996
- Original Issue Raised By: @glassfishrobot
- Original Issue Assigned To: @glassfishrobot
@glassfishrobot Commented Reported by pcb
@glassfishrobot Commented ilyaz said: This issue has not yet been addressed but is still very important to us (I'm from the same company as the reporter of this issue). It is now blocking our migration from Java7 to Java8, because in a critical feature of our product some of the options shown to our users in the user interface are defined in XML. JDK 7u4+ makes the reading (and therefore the perceived rendering) too slow for our users.
Furthermore, there are some bug fixes in JDK 7u4+ (such as the file explorer crashing in Windows when users sort an open file dialog by date) that we cannot provide to our users because of this issue.
Is any work on this issue expected?
@glassfishrobot Commented elena-lorena said: Could you please update us if there is any planning to fix this issue in the coming release(s)? We have started to migrate our projects to Java 8 and this issue is still blocking for us... (I'm from the same company as pcb and Ilyaz) Your feedback would be much appreciated. Thank you!
@glassfishrobot Commented Was assigned to yaroska
@glassfishrobot Commented This issue was imported from java.net JIRA JAXB-996
This issue is still active. We have 30 any-types and a 180 MB jar classpath (metaspace about 1700 MB) and are seeing a performance degradation of 20x or more with the latest Spring Boot and OpenJDK 11.
- Workaround 1, if you only read or only write and never touch unknown elements:
- Replace the any-types with explicitly declared known types in the XML schema, then comment out the any-types. This is generally more tidy too.
- Workaround 2, if interesting nodes do not contain any-types:
- Parse into DOM documents, search for the interesting nodes, then deserialize only those nodes.
- Workaround 3, for small documents:
- Work with DOM documents
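
The DOM part of workarounds 2 and 3 could look roughly like the sketch below. It uses only the JDK's built-in DOM parser. The class name DomWorkaroundSketch, the inline sample document, and the SomeKnownType name in the comment are assumptions for illustration. For workaround 2, each located node would then be passed to JAXB, for instance via Unmarshaller.unmarshal(Node, Class), which is left as a comment here so the example runs without JAXB on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch: locate nodes with DOM instead of unmarshalling everything. */
final class DomWorkaroundSketch {

    /** Parses the document into a DOM tree and returns every element with the given tag name. */
    static List<Element> findElements(byte[] xml, String tagName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is needed if the nodes are later handed to JAXB.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new ByteArrayInputStream(xml));
            NodeList nodes = document.getElementsByTagName(tagName);
            List<Element> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add((Element) nodes.item(i));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input could not be parsed as XML", e);
        }
    }

    /** Counts the direct child elements of a node, skipping text and comments. */
    static int countChildElements(Element parent) {
        int count = 0;
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<Root><anyproperties><a>1</a><b>2</b></anyproperties>"
                   + "<anyproperties><c>3</c></anyproperties></Root>";
        for (Element e : findElements(xml.getBytes(StandardCharsets.UTF_8), "anyproperties")) {
            // Workaround 2 would unmarshal an any-free node here, e.g.
            // unmarshaller.unmarshal(e, SomeKnownType.class);
            // Workaround 3 simply keeps reading values from the DOM element.
            System.out.println(e.getTagName() + " has " + countChildElements(e) + " child elements");
        }
    }
}
```

Whether this is actually faster than unmarshalling the whole document depends on the document and the JAXB version, so it is worth timing both on real data.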