jsonschema2pojo

java.lang.OutOfMemoryError: Java heap space

Open isaacwu666 opened this issue 2 years ago • 2 comments

file: https://developer.walmart.com/image/asdp/us/mp/fulfillment/WFS_Convert_Schema_v4.5.json

cmd:java -jar -Xms8m -Xmx8G -XX:PermSize=8M -XX:MaxPermSize=8G "%~dp0/../lib/jsonschema2pojo-cli-1.1.2.jar" %*

Enum name ColorCategory already used; trying to replace it with ColorCategory________
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.fasterxml.jackson.databind.node.ObjectNode.<init>(ObjectNode.java:30)
    at com.fasterxml.jackson.databind.node.JsonNodeFactory.objectNode(JsonNodeFactory.java:338)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:267)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:277)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:69)
    at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:16)
    at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
    at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4635)
    at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3091)
    at org.jsonschema2pojo.ContentResolver.resolve(ContentResolver.java:76)
    at org.jsonschema2pojo.SchemaStore.create(SchemaStore.java:61)
    at org.jsonschema2pojo.SchemaStore.create(SchemaStore.java:140)
    at org.jsonschema2pojo.rules.PropertyRule.apply(PropertyRule.java:78)
    at org.jsonschema2pojo.rules.PropertyRule.apply(PropertyRule.java:41)
    at org.jsonschema2pojo.rules.PropertiesRule.apply(PropertiesRule.java:70)
    at org.jsonschema2pojo.rules.PropertiesRule.apply(PropertiesRule.java:38)
    at org.jsonschema2pojo.rules.ObjectRule.apply(ObjectRule.java:121)
    at org.jsonschema2pojo.rules.ObjectRule.apply(ObjectRule.java:66)
    at org.jsonschema2pojo.rules.TypeRule.apply(TypeRule.java:83)
    at org.jsonschema2pojo.rules.TypeRule.apply(TypeRule.java:38)
    at org.jsonschema2pojo.rules.SchemaRule.apply(SchemaRule.java:83)
    at org.jsonschema2pojo.rules.SchemaRule.apply(SchemaRule.java:38)
    at org.jsonschema2pojo.rules.PropertyRule.apply(PropertyRule.java:79)
    at org.jsonschema2pojo.rules.PropertyRule.apply(PropertyRule.java:41)

isaacwu666 Aug 03 '22 07:08

This is a very interesting use case. At 373 KB, this is probably the largest schema I have ever seen. I don't see any reason it should exhaust an 8 GB heap though, so we are likely creating and retaining an excessive number of objects somewhere.

It would be good to reduce the heap size (say, to 1 GB) and create a heap dump. You can do this by setting:

-Xmx1g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
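When the JVM is started through a wrapper script, it can be worth confirming that the reduced -Xmx value actually reached the process before waiting for the dump. A minimal sketch, assuming a small throwaway class (HeapLimitCheck is a hypothetical name, not part of jsonschema2pojo) that is run with the same flags:

```java
/** Hypothetical helper: reports the heap ceiling the running JVM was given. */
final class HeapLimitCheck {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // With -Xmx1g this should report a value close to 1024.
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

The reported value is usually slightly below the configured limit, because some collectors reserve part of the heap.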

joelittlejohn Aug 06 '22 11:08

I think the problem is that, for each object defined in the JSON schema, SchemaStore caches the whole JSON schema definition (as baseSchema) as a new instance, which significantly increases heap utilization. The ContentResolver creates a new JsonNode instance each time it is called, even when the same URI is passed multiple times. Introducing a cache in the ContentResolver would let the SchemaStore cache reuse the same instance and therefore reduce heap utilization. I think it should not have side effects, but I am not the expert here.
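The caching idea can be sketched roughly as follows. This is an illustration only, not the project's code: CachingResolver is a hypothetical wrapper, the delegate function stands in for whatever actually fetches and parses the document, and a plain Object stands in for the parsed JsonNode tree.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical wrapper (not part of jsonschema2pojo): memoizes parsed content per URI. */
final class CachingResolver<T> {
    private final Function<URI, T> delegate;
    private final Map<URI, T> cache = new ConcurrentHashMap<>();

    CachingResolver(Function<URI, T> delegate) {
        this.delegate = delegate;
    }

    T resolve(URI uri) {
        // Normalize first so equivalent URIs share a single cache entry,
        // then parse only if this URI has not been seen before.
        return cache.computeIfAbsent(uri.normalize(), delegate);
    }
}

final class CachingResolverDemo {
    public static void main(String[] args) {
        int[] parses = {0};
        CachingResolver<Object> resolver = new CachingResolver<>(uri -> {
            parses[0]++;          // counts how often the expensive parse runs
            return new Object();  // stand-in for a parsed JsonNode tree
        });

        URI schema = URI.create("file:/schemas/example.json");
        Object first = resolver.resolve(schema);
        Object second = resolver.resolve(schema);

        System.out.println("same instance: " + (first == second));
        System.out.println("parse count: " + parses[0]);
    }
}
```

Sharing one parsed tree is only safe if every consumer treats it as read-only. Jackson's ObjectNode is mutable, so any code path that modifies the tree would affect all holders of the cached instance.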

krystianekb Sep 11 '22 19:09

Running on Java 17: -Xmx had to be set to ~ 426m to reproduce. Heap class histogram:

Class Name                                                                              |    Objects | Shallow Heap |  Retained Heap
-------------------------------------------------------------------------------------------------------------------------------------
java.util.LinkedHashMap$Entry                                                           |  3,100,216 |  124,008,640 | >= 438,750,232
byte[]                                                                                  |  2,331,985 |   81,777,168 |  >= 81,777,168
java.util.HashMap$Node[]                                                                |    707,996 |   57,110,816 | >= 437,324,728
java.lang.String                                                                        |  2,331,486 |   55,955,664 | >= 137,231,568
java.util.LinkedHashMap                                                                 |    707,815 |   39,637,640 | >= 439,328,256
com.fasterxml.jackson.databind.node.TextNode                                            |  2,309,079 |   36,945,264 | >= 172,478,832
com.fasterxml.jackson.databind.node.ObjectNode                                          |    705,071 |   16,921,704 | >= 438,585,072
java.lang.Object[]                                                                      |    220,387 |   13,968,536 | >= 114,421,624
java.util.ArrayList                                                                     |    219,781 |    5,274,744 | >= 117,785,896
com.fasterxml.jackson.databind.node.IntNode                                             |    321,997 |    5,151,952 |   >= 5,152,032
com.fasterxml.jackson.databind.node.ArrayNode                                           |    205,115 |    4,922,760 | >= 118,099,168
java.math.BigDecimal                                                                    |     16,631 |      665,240 |     >= 671,552
com.fasterxml.jackson.databind.node.DecimalNode                                         |     16,595 |      265,520 |     >= 929,464
java.util.HashMap                                                                       |      5,479 |      262,992 | >= 436,983,552
com.sun.codemodel.JInvocation                                                           |      5,801 |      232,040 |     >= 853,640
java.util.HashMap$Node                                                                  |      5,308 |      169,856 | >= 436,682,056
com.sun.codemodel.JOp$BinaryOp                                                          |      6,251 |      150,024 |     >= 490,408
java.util.concurrent.ConcurrentHashMap$Node                                             |      4,659 |      149,088 |     >= 328,456
com.sun.codemodel.JFieldRef                                                             |      4,186 |      133,952 |     >= 134,056
com.sun.codemodel.JMethod                                                               |      1,993 |      127,552 |   >= 2,704,416
java.net.URI                                                                            |      1,573 |      125,840 |     >= 677,064
int[]                                                                                   |        765 |      103,744 |     >= 103,744
com.sun.codemodel.JDocComment                                                           |      1,728 |       82,944 |     >= 547,280
sun.util.locale.LocaleObjectCache$CacheEntry                                            |      1,986 |       79,440 |      >= 79,440
com.sun.codemodel.JStringLiteral                                                        |      4,792 |       76,672 |      >= 76,672
com.sun.codemodel.JMods                                                                 |      4,501 |       72,016 |      >= 72,280
com.sun.codemodel.JAnnotationUse                                                        |      2,939 |       70,536 |     >= 581,432
com.sun.codemodel.JBlock                                                                |      2,809 |       67,416 |   >= 1,560,504
char[]                                                                                  |        248 |       66,408 |      >= 66,408
com.sun.codemodel.JAtom                                                                 |      3,321 |       53,136 |     >= 212,304
org.jsonschema2pojo.Schema                                                              |      1,508 |       48,256 | >= 438,301,584
...
org.jsonschema2pojo.SchemaStore                                                         |          1 |           24 | >= 436,404,448
org.jsonschema2pojo.SchemaMapper                                                        |          1 |           24 |       >= 4,448
...

It confirms @krystianekb's hypothesis:

SchemaStore caches the whole json schema definition (as baseSchema) as new instance which significantly increases the heap utilization

However, rather than introducing a cache in the content resolver, it should be possible to check whether schemas already contains the resolved baseId (the URI without its fragment) and, if it does, take the references to baseContent and baseSchema from the cached schema instead of resolving them again, e.g. replace: https://github.com/joelittlejohn/jsonschema2pojo/blob/9315b7b69417899f0addb1d795938ac802763566/jsonschema2pojo-core/src/main/java/org/jsonschema2pojo/SchemaStore.java#L60-L65 with

            URI baseId = removeFragment(id).normalize();
            final JsonNode baseContent;
            final Schema baseSchema;
            if (schemas.containsKey(baseId)) {
                baseContent = schemas.get(baseId).getContent();
                baseSchema = schemas.get(baseId).getParent();
            } else {
                baseContent = contentResolver.resolve(baseId);
                baseSchema = new Schema(baseId, baseContent, null);
            }

            if (normalizedId.toString().contains("#")) {

With the solution above, the output could be generated using -Xmx9m (47x less memory) without getting an OOM.
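To illustrate why keying on the fragment-less URI helps, here is a simplified, self-contained model of the lookup. It is not the real SchemaStore: BaseDocumentIndex, Doc, and the parse counter are hypothetical stand-ins, and the counter takes the place of the call to contentResolver.resolve(baseId).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the idea, not the real SchemaStore: one parsed document per base URI. */
final class BaseDocumentIndex {

    /** Stand-in for a parsed schema document (the real code holds a Jackson JsonNode). */
    static final class Doc { }

    private final Map<URI, Doc> byBaseId = new HashMap<>();
    private int parseCount = 0;

    /** Drops the "#..." fragment so every pointer into a file maps to the same key. */
    static URI removeFragment(URI id) {
        try {
            return new URI(id.getScheme(), id.getSchemeSpecificPart(), null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    Doc baseFor(URI id) {
        URI baseId = removeFragment(id).normalize();
        Doc cached = byBaseId.get(baseId);
        if (cached != null) {
            return cached;        // reuse: the file is not parsed a second time
        }
        parseCount++;             // stands in for contentResolver.resolve(baseId)
        Doc parsed = new Doc();
        byBaseId.put(baseId, parsed);
        return parsed;
    }

    int parseCount() {
        return parseCount;
    }
}

final class BaseDocumentIndexDemo {
    public static void main(String[] args) {
        BaseDocumentIndex index = new BaseDocumentIndex();
        index.baseFor(URI.create("file:/schemas/example.json#/definitions/A"));
        index.baseFor(URI.create("file:/schemas/example.json#/definitions/B"));
        System.out.println("parses: " + index.parseCount());
    }
}
```

In this model every fragment that points into the same file resolves to one shared document, which is the effect the proposed change aims for in SchemaStore.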

unkish Jan 28 '23 12:01